CreditScope — Hybrid Risk Decision Engine

Three-layer runtime safety classifier wired into the CreditScope chat pipeline.

Layers

Layer	File	What it does
Feature	`circuit_integration.py` → `_classify_safety()`	LogReg on 4K SAE features (layer 39)
Intent	`backend/agent/risk/pattern_rules.py`	Rule-based evasion / roleplay / exploit-tail / pure-financial detectors
Response	`backend/agent/risk/response_guard.py`	Pattern banks for bypass, concealment-fraud, procedural steps, secrets, injection
Blend	`backend/agent/risk/risk_engine.py`	Weighted sum + four rule overrides → 0-1 score + categorical verdict

Verdict categories

CLEAN · BORDERLINE · SUSPICIOUS · HIGH_RISK

Wiring

circuit_integration.py::_run_fast_analysis() calls _classify_safety() (LogReg), then immediately calls compute_final_risk() to produce the hybrid verdict. The result replaces safety.verdict and safety.adversarial_probability in the response the frontend reads — no frontend changes needed.

Latency

< 1 ms overhead (pure stdlib regex + weighted arithmetic).
Model inference performance is unaffected — the hook runs post-inference on CPU.

Related repos

sarel/credit-cyber-4k-features — SAE checkpoints + trained LogReg classifier
sarel/creditscope-circuit-models — circuit tracer models

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support