pyrrho-nano-g2

pyrrho-nano-g2 is a small RAG governance co-processor for anti-hallucination pipelines. It reads a user question plus retrieved source passages and returns an evidence-state decision a RAG application can use before answering: ABSTAIN, DISPUTED, or TRUSTWORTHY.

It is not an answer generator and not an open-world fact checker. It sits between retrieval and generation, or beside a generator as a fast guardrail, to reduce cases where unsupported or contradictory retrieved evidence gets treated as safe to answer from.

Labels

Label	Meaning
`ABSTAIN`	The retrieved sources do not contain enough evidence to answer the question.
`DISPUTED`	The retrieved sources conflict on the answer.
`TRUSTWORTHY`	The retrieved sources consistently support answering the question.

Outputs

The raw Hugging Face model output is a three-class logits vector:

Raw field	Meaning
`logits[ABSTAIN]`	Unnormalized score for insufficient evidence.
`logits[DISPUTED]`	Unnormalized score for conflicting evidence.
`logits[TRUSTWORTHY]`	Unnormalized score for consistently supported evidence.

Most integrations should expose a structured decision object derived from those logits:

Field	Meaning
`label`	Final calibrated label: `ABSTAIN`, `DISPUTED`, or `TRUSTWORTHY`.
`raw_label`	Highest-probability label before threshold calibration.
`logits`	Raw score for each label, keyed by label name.
`probabilities`	Softmax probability distribution over the three labels.
`confidence`	Probability assigned to the final calibrated label.
`trustworthy_probability`	`P(TRUSTWORTHY)`, used by the calibrated decision rule.
`threshold`	Threshold on `P(TRUSTWORTHY)` applied by the calibrated decision rule.
`used_threshold_fallback`	Whether a low-confidence `TRUSTWORTHY` argmax was changed to `ABSTAIN` or `DISPUTED`.

Example normalized JSON output:

{
  "label": "DISPUTED",
  "raw_label": "DISPUTED",
  "logits": {
    "ABSTAIN": -1.42,
    "DISPUTED": 2.31,
    "TRUSTWORTHY": 0.18
  },
  "probabilities": {
    "ABSTAIN": 0.02,
    "DISPUTED": 0.88,
    "TRUSTWORTHY": 0.10
  },
  "confidence": 0.88,
  "trustworthy_probability": 0.10,
  "threshold": 0.60,
  "used_threshold_fallback": false
}
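
The field definitions above imply a few relationships that an integration can check before trusting a decision object. The following is a minimal sketch of such a check, not part of the published model or repository: `validate_decision`, the tolerance `tol`, and the sample values are hypothetical, and the tolerance only exists to absorb rounding in serialized probabilities.

```python
import math

LABELS = ("ABSTAIN", "DISPUTED", "TRUSTWORTHY")


def validate_decision(decision: dict, tol: float = 0.02) -> list[str]:
    """Return a list of consistency problems; an empty list means none were found."""
    problems = []
    label, raw_label = decision["label"], decision["raw_label"]
    probs = decision["probabilities"]

    if label not in LABELS or raw_label not in LABELS:
        return ["unknown label"]
    if set(probs) != set(LABELS) or set(decision["logits"]) != set(LABELS):
        return ["logits and probabilities must be keyed by the three labels"]

    if not math.isclose(sum(probs.values()), 1.0, abs_tol=tol):
        problems.append("probabilities do not sum to 1")
    if not math.isclose(decision["confidence"], probs[label], abs_tol=tol):
        problems.append("confidence is not the probability of the final label")
    if not math.isclose(
        decision["trustworthy_probability"], probs["TRUSTWORTHY"], abs_tol=tol
    ):
        problems.append("trustworthy_probability does not match probabilities")

    # The fallback only ever rewrites a TRUSTWORTHY argmax, so the final label
    # differs from the raw label exactly when the fallback flag is set.
    fallback = decision["used_threshold_fallback"]
    if fallback != (label != raw_label):
        problems.append("used_threshold_fallback disagrees with label/raw_label")
    if fallback and raw_label != "TRUSTWORTHY":
        problems.append("fallback is only defined for a TRUSTWORTHY raw_label")
    return problems


# Hypothetical decision object showing a threshold fallback.
sample = {
    "label": "ABSTAIN",
    "raw_label": "TRUSTWORTHY",
    "logits": {"ABSTAIN": -1.05, "DISPUTED": -1.61, "TRUSTWORTHY": -0.80},
    "probabilities": {"ABSTAIN": 0.35, "DISPUTED": 0.20, "TRUSTWORTHY": 0.45},
    "confidence": 0.35,
    "trustworthy_probability": 0.45,
    "threshold": 0.60,
    "used_threshold_fallback": True,
}
print(validate_decision(sample))  # []
```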

The encoder does not output generated answers, explanations, citations, source spans, retrieval results, taxonomy/category tags, route IDs, scalar diagnostics, or experimental multitask-head fields. Taxonomy/category fields that appear in evaluation reports are benchmark metadata used for breakdowns; route, taxonomy, and scalar heads are part of the experimental MoE track, not the published nano encoder output contract.

Intended Use

Use this model when a RAG system needs a fast decision about whether retrieved evidence is good enough to answer. Typical uses include pre-generation answer gating, retrieval retry or escalation triggers, abstention decisions, dispute detection, and logging evidence-quality signals for later review.
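
As an illustration of pre-generation gating, the sketch below maps each label to an application-level action. The action names, the `choose_action` helper, and the retry budget are assumptions made for this example; the model itself only returns the label, and what to do with it is an application decision.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"                      # hand the sources to the generator
    RETRY_RETRIEVAL = "retry_retrieval"    # re-query or widen the search
    SURFACE_CONFLICT = "surface_conflict"  # tell the user the sources disagree
    DECLINE = "decline"                    # respond that no supported answer exists


def choose_action(label: str, retries_left: int) -> Action:
    """Map an evidence-state label to a hypothetical pipeline action."""
    if label == "TRUSTWORTHY":
        return Action.ANSWER
    if label == "DISPUTED":
        return Action.SURFACE_CONFLICT
    if label == "ABSTAIN":
        # Insufficient evidence: try retrieval again while budget remains.
        return Action.RETRY_RETRIEVAL if retries_left > 0 else Action.DECLINE
    raise ValueError(f"unexpected label: {label!r}")


print(choose_action("ABSTAIN", retries_left=1))  # Action.RETRY_RETRIEVAL
print(choose_action("ABSTAIN", retries_left=0))  # Action.DECLINE
```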

This model is not intended to write answers, verify facts outside the provided sources, localize hallucinated spans, or replace human review in high-stakes settings.

Quick Start

Transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

MODEL_ID = "yafitzdev/pyrrho-nano-g2"
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
TAU = 0.50

query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]

text = "Question: " + query + "\n\nSources:\n" + "\n".join(
    f"[{i}] {context}" for i, context in enumerate(contexts, start=1)
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

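# Tokenize the formatted prompt; inputs longer than 4096 tokens are truncated.
inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0]
probs = torch.softmax(logits, dim=-1)
probs_np = probs.detach().numpy()

# Calibrated decision rule: if the argmax is TRUSTWORTHY (index 2) but
# P(TRUSTWORTHY) is below TAU, fall back to the more probable of
# ABSTAIN (index 0) and DISPUTED (index 1).
raw_pred = int(probs_np.argmax())
final_pred = raw_pred
used_threshold_fallback = False
if raw_pred == 2 and probs_np[2] < TAU:
    final_pred = int(probs_np[:2].argmax())
    used_threshold_fallback = True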

decision = {
    "label": LABELS[final_pred],
    "raw_label": LABELS[raw_pred],
    "logits": dict(zip(LABELS, logits.detach().numpy().tolist(), strict=True)),
    "probabilities": dict(zip(LABELS, probs_np.tolist(), strict=True)),
    "confidence": float(probs_np[final_pred]),
    "trustworthy_probability": float(probs_np[2]),
    "threshold": TAU,
    "used_threshold_fallback": used_threshold_fallback,
}
print(decision)

CPU ONNX

The repository includes an INT8 ONNX export for CPU inference. Download the full repository so any external ONNX data files stay next to the .onnx file.

from pathlib import Path

from huggingface_hub import snapshot_download
from transformers import AutoTokenizer
import numpy as np
import onnxruntime as ort

MODEL_ID = "yafitzdev/pyrrho-nano-g2"
query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]
text = "Question: " + query + "\n\nSources:\n" + "\n".join(
    f"[{i}] {context}" for i, context in enumerate(contexts, start=1)
)

model_dir = Path(snapshot_download(MODEL_ID))

tokenizer = AutoTokenizer.from_pretrained(model_dir)
session = ort.InferenceSession(
    str(model_dir / "model_quantized.onnx"),
    providers=["CPUExecutionProvider"],
)

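inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="np")
# Run the graph; the first [0] selects the logits output, the second the single batch row.
logits = session.run(
    ["logits"],
    {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
)[0][0]
# Numerically stable softmax: subtract the max logit before exponentiating.
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()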

Calibrated Decision Rule

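The reported metrics use a validation-selected threshold on P(TRUSTWORTHY). If the model's top class is TRUSTWORTHY but its probability is below the threshold, the decision falls back to whichever of ABSTAIN and DISPUTED has the higher probability. ABSTAIN and DISPUTED predictions are never changed by the threshold.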

LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
TAU = 0.50
probs_np = probs.detach().cpu().numpy() if hasattr(probs, "detach") else probs

raw_pred = int(probs_np.argmax())
final_pred = raw_pred
used_threshold_fallback = False
if raw_pred == 2 and probs_np[2] < TAU:
    final_pred = int(probs_np[:2].argmax())
    used_threshold_fallback = True

decision = {
    "label": LABELS[final_pred],
    "raw_label": LABELS[raw_pred],
    "probabilities": dict(zip(LABELS, probs_np.tolist(), strict=True)),
    "confidence": float(probs_np[final_pred]),
    "trustworthy_probability": float(probs_np[2]),
    "threshold": TAU,
    "used_threshold_fallback": used_threshold_fallback,
}
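
To make the rule's behaviour concrete, here is the same logic as a dependency-free function with a few worked inputs. The `apply_threshold` name and the probability vectors are made up for illustration; only the rule itself comes from the snippet above.

```python
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
TRUSTWORTHY = LABELS.index("TRUSTWORTHY")


def apply_threshold(probs, tau):
    """Return (final_label, raw_label, used_threshold_fallback)."""
    # First maximum wins on ties, matching numpy's argmax.
    raw = max(range(len(LABELS)), key=lambda i: probs[i])
    final = raw
    if raw == TRUSTWORTHY and probs[TRUSTWORTHY] < tau:
        # Fall back to the more probable of ABSTAIN (0) and DISPUTED (1).
        final = 0 if probs[0] >= probs[1] else 1
    return LABELS[final], LABELS[raw], final != raw


# Weak TRUSTWORTHY argmax: demoted to ABSTAIN.
print(apply_threshold([0.30, 0.25, 0.45], tau=0.50))  # ('ABSTAIN', 'TRUSTWORTHY', True)
# Confident TRUSTWORTHY: kept.
print(apply_threshold([0.10, 0.15, 0.75], tau=0.50))  # ('TRUSTWORTHY', 'TRUSTWORTHY', False)
# Non-TRUSTWORTHY argmax: the threshold is not consulted.
print(apply_threshold([0.02, 0.86, 0.12], tau=0.50))  # ('DISPUTED', 'DISPUTED', False)
```

Because the largest of three probabilities is always at least 1/3, a threshold at or below 1/3 can never trigger the fallback; the rule then reduces to a plain argmax.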

Results

Reported on the fitz-gov V7.0.1 held-out test split: 1,050 examples, 3 seeds. Values are the mean ± standard deviation across seeds. Checkpoints and TRUSTWORTHY thresholds were selected on a separate 1,050-example validation split.

Decision	Recall	Precision	False-rate
`OVERALL`	95.24 ± 0.48%	95.24 ± 0.48%	4.76 ± 0.48%
`ABSTAIN`	95.25 ± 0.00%	96.37 ± 0.64%	1.54 ± 0.28%
`DISPUTED`	97.00 ± 1.17%	95.53 ± 0.71%	2.22 ± 0.36%
`TRUSTWORTHY`	93.66 ± 0.30%	94.06 ± 0.66%	3.48 ± 0.40%

For OVERALL, recall and precision are micro-averages; in single-label three-class classification they both equal accuracy, and the OVERALL false-rate is the error rate (100% minus accuracy). For label rows, false-rate is the share of cases that were not that label but were incorrectly predicted as that label. The TRUSTWORTHY false-rate is the main safety metric: it measures cases where the model says TRUSTWORTHY even though the sources do not support that decision.

F1 is not shown in the headline table. It is the harmonic mean of precision and recall (2 * precision * recall / (precision + recall)), useful as a compact balance score but less direct than the operating metrics above.
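
The definitions above can be sketched from a confusion matrix as follows. The matrix is hypothetical and is not the model's actual test confusion matrix; `per_label_metrics` is an illustrative helper that assumes every label occurs at least once as both a true and a predicted class.

```python
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]

# Hypothetical counts: rows are true labels, columns are predicted labels.
CONFUSION = [
    [90, 5, 5],
    [4, 92, 4],
    [6, 3, 91],
]


def per_label_metrics(cm, k):
    """Recall, precision, false-rate, and F1 for label index k."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # true k, predicted something else
    fp = sum(row[k] for row in cm) - tp  # not k, but predicted k
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    false_rate = fp / (fp + tn)          # share of non-k cases predicted as k
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "false_rate": false_rate, "f1": f1}


accuracy = sum(CONFUSION[i][i] for i in range(3)) / sum(sum(r) for r in CONFUSION)
print(f"accuracy: {accuracy:.4f}")
for k, name in enumerate(LABELS):
    print(name, {m: round(v, 4) for m, v in per_label_metrics(CONFUSION, k).items()})
```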

Per-seed held-out test results:

Seed	TRUSTWORTHY threshold	Accuracy	False-trustworthy rate
42	0.57	95.71%	3.03%
1337	0.56	94.76%	3.78%
7	0.34	95.24%	3.63%
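
The per-seed rows can be aggregated with a few lines of standard-library Python. Using the sample standard deviation, the result matches the OVERALL accuracy and TRUSTWORTHY false-rate entries in the headline table; the data structure and `summarize` helper are just for illustration.

```python
from statistics import mean, stdev

# Per-seed held-out test results copied from the table above (percent).
per_seed = {
    42: {"accuracy": 95.71, "false_trustworthy": 3.03},
    1337: {"accuracy": 94.76, "false_trustworthy": 3.78},
    7: {"accuracy": 95.24, "false_trustworthy": 3.63},
}


def summarize(metric):
    """Mean and sample standard deviation of one metric across seeds."""
    values = [row[metric] for row in per_seed.values()]
    return mean(values), stdev(values)


for metric in ("accuracy", "false_trustworthy"):
    m, s = summarize(metric)
    print(f"{metric}: {m:.2f} ± {s:.2f}%")
# accuracy: 95.24 ± 0.48%
# false_trustworthy: 3.48 ± 0.40%
```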

Training Data

Trained and evaluated on fitz-gov V7.0.1, an English benchmark of 10,500 RAG evidence-governance examples with query-grouped train, validation, and test splits. Split sizes: 8,400 train, 1,050 validation, 1,050 held-out test.
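
"Query-grouped" means that all examples sharing a query land in the same split, so the test set contains no query seen during training. The sketch below shows one common way to get that property, by hashing the query text. It is an assumption for illustration and not the procedure used to build fitz-gov; hashing also yields only approximately 80/10/10 proportions rather than the exact split sizes listed above.

```python
import hashlib


def assign_split(query: str, val_frac: float = 0.10, test_frac: float = 0.10) -> str:
    """Deterministically assign a query, and thus all of its examples, to one split."""
    # Normalize so trivial variations of the same query hash identically.
    key = query.strip().lower().encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "validation"
    return "train"


examples = [
    {"query": "Has the company achieved profitability?", "sources": ["..."]},
    {"query": "has the company achieved profitability?  ", "sources": ["..."]},
]
# Both examples share a (normalized) query, so they receive the same split.
print({assign_split(ex["query"]) for ex in examples})
```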

The validation split was used for checkpoint and threshold selection. The held-out test split was used only for final reporting.
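
Threshold selection on a validation split can be sketched as a sweep over candidate values. The scoring function here, accuracy minus a penalty times the false-trustworthy rate, is an assumption: the text does not state which criterion was used for threshold selection, and the `penalty` weight, the candidate grid, and the toy rows are all hypothetical.

```python
T = 2  # index of TRUSTWORTHY in [ABSTAIN, DISPUTED, TRUSTWORTHY]


def predict(probs, tau):
    """Argmax with the TRUSTWORTHY threshold fallback."""
    pred = max(range(3), key=lambda i: probs[i])
    if pred == T and probs[T] < tau:
        pred = 0 if probs[0] >= probs[1] else 1
    return pred


def score(rows, tau, penalty=2.0):
    """Accuracy minus penalty * false-trustworthy rate (assumed criterion)."""
    correct = false_t = non_t = 0
    for probs, truth in rows:
        pred = predict(probs, tau)
        correct += pred == truth
        if truth != T:
            non_t += 1
            false_t += pred == T
    accuracy = correct / len(rows)
    false_trustworthy_rate = false_t / non_t if non_t else 0.0
    return accuracy - penalty * false_trustworthy_rate


def select_threshold(rows, candidates):
    # max() keeps the first candidate on ties, i.e. the lowest tied threshold.
    return max(candidates, key=lambda tau: score(rows, tau))


# Toy validation rows: (probabilities, true label index).
rows = [
    ((0.10, 0.10, 0.80), 2),
    ((0.20, 0.15, 0.65), 2),
    ((0.40, 0.15, 0.45), 0),  # weak TRUSTWORTHY argmax on an ABSTAIN case
    ((0.70, 0.20, 0.10), 0),
    ((0.15, 0.75, 0.10), 1),
]
print(select_threshold(rows, [0.34, 0.40, 0.50, 0.60, 0.70]))  # 0.5
```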

Training Recipe

Item	Value
Base model	`answerdotai/ModernBERT-base`
Architecture	Encoder with sequence-classification head
Max sequence length	4096 tokens
Labels	`ABSTAIN`, `DISPUTED`, `TRUSTWORTHY`
Epochs	5 with early stopping
Batch size	16
Learning rate	5e-5
Scheduler	Cosine with 10% warmup
Weight decay	0.01
Loss	Weighted cross-entropy with label smoothing
Class weights	`[2.3, 2.3, 1.0]`
Label smoothing	0.15
Selection metric	Accuracy with an explicit penalty for false-trustworthy errors
Seeds	42, 1337, 7
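
To show how the class weights and label smoothing from the table interact, here is a single-example loss in plain Python. It follows one common formulation of weighted cross-entropy with label smoothing and is only a sketch: the actual training code is not shown in this card, and batch-level normalization is omitted.

```python
import math

CLASS_WEIGHTS = [2.3, 2.3, 1.0]  # ABSTAIN, DISPUTED, TRUSTWORTHY (from the table)
SMOOTHING = 0.15


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def smoothed_weighted_ce(logits, target, weights=CLASS_WEIGHTS, eps=SMOOTHING):
    """Weighted cross-entropy with label smoothing for one example."""
    logp = log_softmax(logits)
    n_classes = len(logits)
    # Hard-label term, scaled by the weight of the true class.
    hard = -weights[target] * logp[target]
    # Smoothing term: weighted negative log-probability averaged over all classes.
    uniform = -sum(w * lp for w, lp in zip(weights, logp)) / n_classes
    return (1 - eps) * hard + eps * uniform


logits = [0.2, -0.5, 1.5]  # hypothetical model output favouring TRUSTWORTHY
for target, name in enumerate(["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]):
    print(name, round(smoothed_weighted_ce(logits, target), 4))
```

With these weights, predicting TRUSTWORTHY on a true ABSTAIN or DISPUTED example costs more than the reverse mistake, which lines up with the emphasis on a low false-trustworthy rate.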

Limitations

English-only training and evaluation data.
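The model judges only the provided sources and does not verify them against outside knowledge. Irrelevant or incomplete retrieval therefore surfaces as ABSTAIN or DISPUTED decisions, which the application has to handle (for example by retrying retrieval or declining to answer).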
Small per-category slices should not be treated as standalone product guarantees; use aggregate metrics and run domain-specific checks for deployment.
The decision threshold is tuned for low false-trustworthy rate, so some answerable cases may be classified as ABSTAIN or DISPUTED.

Citation

@misc{pyrrho_nano_g2_2026,
  title  = {pyrrho-nano-g2},
  author = {Yan Fitzner},
  year   = {2026},
  url    = {https://huggingface.co/yafitzdev/pyrrho-nano-g2},
}

License

CC BY-NC 4.0. Free for research, evaluation, and personal use; commercial use requires a separate license.
