CreditScope β Hybrid Risk Decision Engine
Three-layer runtime safety classifier wired into the CreditScope chat pipeline.
Layers
| Layer | File | What it does |
|---|---|---|
| Feature | circuit_integration.py β _classify_safety() |
LogReg on 4K SAE features (layer 39) |
| Intent | backend/agent/risk/pattern_rules.py |
Rule-based evasion / roleplay / exploit-tail / pure-financial detectors |
| Response | backend/agent/risk/response_guard.py |
Pattern banks for bypass, concealment-fraud, procedural steps, secrets, injection |
| Blend | backend/agent/risk/risk_engine.py |
Weighted sum + four rule overrides β 0-1 score + categorical verdict |
Verdict categories
CLEAN Β· BORDERLINE Β· SUSPICIOUS Β· HIGH_RISK
Wiring
circuit_integration.py::_run_fast_analysis() calls _classify_safety() (LogReg),
then immediately calls compute_final_risk() to produce the hybrid verdict.
The result replaces safety.verdict and safety.adversarial_probability in the
response the frontend reads β no frontend changes needed.
Latency
< 1 ms overhead (pure stdlib regex + weighted arithmetic).
Model inference performance is unaffected β the hook runs post-inference on CPU.
Related repos
sarel/credit-cyber-4k-featuresβ SAE checkpoints + trained LogReg classifiersarel/creditscope-circuit-modelsβ circuit tracer models
Inference Providers NEW
This model isn't deployed by any Inference Provider. π Ask for provider support