pyrrho-nano-g2

pyrrho-nano-g2 is a small RAG governance co-processor for anti-hallucination pipelines. It reads a user question plus retrieved source passages and returns an evidence-state decision a RAG application can use before answering: ABSTAIN, DISPUTED, or TRUSTWORTHY.

It is not an answer generator and not an open-world fact checker. It sits between retrieval and generation, or beside a generator as a fast guardrail, to reduce cases where unsupported or contradictory retrieved evidence gets treated as safe to answer from.

Labels

| Label | Meaning |
|---|---|
| ABSTAIN | The retrieved sources do not contain enough evidence to answer the question. |
| DISPUTED | The retrieved sources conflict on the answer. |
| TRUSTWORTHY | The retrieved sources consistently support answering the question. |

Outputs

The raw Hugging Face model output is a three-class logits vector:

| Raw field | Meaning |
|---|---|
| logits[ABSTAIN] | Unnormalized score for insufficient evidence. |
| logits[DISPUTED] | Unnormalized score for conflicting evidence. |
| logits[TRUSTWORTHY] | Unnormalized score for consistently supported evidence. |

Most integrations should expose a structured decision object derived from those logits:

| Field | Meaning |
|---|---|
| label | Final calibrated label: ABSTAIN, DISPUTED, or TRUSTWORTHY. |
| raw_label | Highest-probability label before threshold calibration. |
| logits | Raw score for each label, keyed by label name. |
| probabilities | Softmax probability distribution over the three labels. |
| confidence | Probability assigned to the final calibrated label. |
| trustworthy_probability | P(TRUSTWORTHY), used by the calibrated decision rule. |
| threshold | TRUSTWORTHY probability threshold used for calibrated reporting. |
| used_threshold_fallback | Whether a low-confidence TRUSTWORTHY argmax was changed to ABSTAIN or DISPUTED. |

Example normalized JSON output:

{
  "label": "DISPUTED",
  "raw_label": "DISPUTED",
  "logits": {
    "ABSTAIN": -1.42,
    "DISPUTED": 2.31,
    "TRUSTWORTHY": 0.18
  },
  "probabilities": {
    "ABSTAIN": 0.02,
    "DISPUTED": 0.88,
    "TRUSTWORTHY": 0.10
  },
  "confidence": 0.88,
  "trustworthy_probability": 0.10,
  "threshold": 0.60,
  "used_threshold_fallback": false
}
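A consumer of this object may want to confirm that its fields agree with each other before acting on it. The sketch below assumes the field names listed above; the `check_decision` helper, its tolerance, and the sample values are illustrative and not part of the model's published interface.

```python
LABELS = ("ABSTAIN", "DISPUTED", "TRUSTWORTHY")


def check_decision(decision: dict, tol: float = 0.02) -> list[str]:
    """Return a list of consistency problems; an empty list means none found.

    Hypothetical helper: `tol` absorbs rounding in serialized probabilities.
    """
    problems = []
    probs = decision["probabilities"]
    if set(probs) != set(LABELS):
        return ["probabilities must contain exactly the three labels"]
    if decision["label"] not in LABELS or decision["raw_label"] not in LABELS:
        return ["label and raw_label must be one of the three labels"]
    if abs(sum(probs.values()) - 1.0) > tol:
        problems.append("probabilities do not sum to 1")
    raw = max(probs, key=probs.get)
    if decision["raw_label"] != raw:
        problems.append("raw_label is not the highest-probability label")
    if abs(decision["confidence"] - probs[decision["label"]]) > tol:
        problems.append("confidence does not match the final label's probability")
    if abs(decision["trustworthy_probability"] - probs["TRUSTWORTHY"]) > tol:
        problems.append("trustworthy_probability does not match P(TRUSTWORTHY)")
    # The fallback flag should be set only when TRUSTWORTHY was the argmax
    # but fell below the threshold.
    expected_fallback = raw == "TRUSTWORTHY" and probs["TRUSTWORTHY"] < decision["threshold"]
    if decision["used_threshold_fallback"] != expected_fallback:
        problems.append("used_threshold_fallback disagrees with the threshold rule")
    return problems


# Illustrative values, not model output.
sample = {
    "label": "DISPUTED",
    "raw_label": "DISPUTED",
    "probabilities": {"ABSTAIN": 0.05, "DISPUTED": 0.80, "TRUSTWORTHY": 0.15},
    "confidence": 0.80,
    "trustworthy_probability": 0.15,
    "threshold": 0.60,
    "used_threshold_fallback": False,
}
print(check_decision(sample))
```

A check like this is cheap to run at the boundary between the guardrail and the rest of the pipeline, where a mislabeled field would otherwise go unnoticed.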

The encoder does not output generated answers, explanations, citations, source spans, retrieval results, taxonomy/category tags, route IDs, scalar diagnostics, or experimental multitask-head fields. Taxonomy/category fields that appear in evaluation reports are benchmark metadata used for breakdowns; route, taxonomy, and scalar heads are part of the experimental MoE track, not the published nano encoder output contract.

Intended Use

Use this model when a RAG system needs a fast decision about whether retrieved evidence is good enough to answer. Typical uses include pre-generation answer gating, retrieval retry or escalation triggers, abstention decisions, dispute detection, and logging evidence-quality signals for later review.
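As a rough illustration of pre-generation gating, the sketch below maps a decision to a pipeline action. The action names (`generate`, `retry_retrieval`, `decline`, `surface_conflict`) and the `retries_left` parameter are hypothetical application choices, not something the model defines.

```python
def route(decision: dict, retries_left: int = 1) -> str:
    """Pick a pipeline action from an evidence-state decision.

    Hypothetical policy: answer only on TRUSTWORTHY, retry retrieval on
    ABSTAIN while retries remain, and surface conflicts on DISPUTED.
    """
    label = decision["label"]
    if label == "TRUSTWORTHY":
        return "generate"
    if label == "ABSTAIN":
        return "retry_retrieval" if retries_left > 0 else "decline"
    if label == "DISPUTED":
        return "surface_conflict"
    raise ValueError(f"unexpected label: {label!r}")


print(route({"label": "ABSTAIN"}, retries_left=0))
```

Keeping the policy in one small function makes it easy to log each decision alongside the action taken, which supports the later-review use case mentioned above.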

This model is not intended to write answers, verify facts outside the provided sources, localize hallucinated spans, or replace human review in high-stakes settings.

Quick Start

Transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

MODEL_ID = "yafitzdev/pyrrho-nano-g2"
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
TAU = 0.50

query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]

text = "Question: " + query + "\n\nSources:\n" + "\n".join(
    f"[{i}] {context}" for i, context in enumerate(contexts, start=1)
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

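# Tokenize the full prompt; anything beyond 4096 tokens is truncated.
inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0]  # three scores, in LABELS order
probs = torch.softmax(logits, dim=-1)
probs_np = probs.detach().numpy()

# Calibrated decision: a TRUSTWORTHY argmax below TAU falls back to the
# stronger of ABSTAIN and DISPUTED.
raw_pred = int(probs_np.argmax())
final_pred = raw_pred
used_threshold_fallback = False
if raw_pred == 2 and probs_np[2] < TAU:
    final_pred = int(probs_np[:2].argmax())
    used_threshold_fallback = True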

decision = {
    "label": LABELS[final_pred],
    "raw_label": LABELS[raw_pred],
    "logits": dict(zip(LABELS, logits.detach().numpy().tolist(), strict=True)),
    "probabilities": dict(zip(LABELS, probs_np.tolist(), strict=True)),
    "confidence": float(probs_np[final_pred]),
    "trustworthy_probability": float(probs_np[2]),
    "threshold": TAU,
    "used_threshold_fallback": used_threshold_fallback,
}
print(decision)
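The quick-start snippets build the model input the same way: a "Question:" line, a blank line, a "Sources:" line, and then one numbered source per line. A small helper can keep that format in one place. The name `build_input` is hypothetical; the string layout is copied from the snippets in this section.

```python
def build_input(query: str, contexts: list[str]) -> str:
    """Format a question and its retrieved passages as one prompt string."""
    sources = "\n".join(f"[{i}] {context}" for i, context in enumerate(contexts, start=1))
    return "Question: " + query + "\n\nSources:\n" + sources


print(build_input("Has the company achieved profitability?", ["First source.", "Second source."]))
```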

CPU ONNX

The repository includes an INT8 ONNX export for CPU inference. Download the full repository so any external ONNX data files stay next to the .onnx file.

from pathlib import Path

from huggingface_hub import snapshot_download
from transformers import AutoTokenizer
import numpy as np
import onnxruntime as ort

MODEL_ID = "yafitzdev/pyrrho-nano-g2"
query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]
text = "Question: " + query + "\n\nSources:\n" + "\n".join(
    f"[{i}] {context}" for i, context in enumerate(contexts, start=1)
)

model_dir = Path(snapshot_download(MODEL_ID))

tokenizer = AutoTokenizer.from_pretrained(model_dir)
session = ort.InferenceSession(
    str(model_dir / "model_quantized.onnx"),
    providers=["CPUExecutionProvider"],
)

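# Tokenize to NumPy arrays, which ONNX Runtime accepts directly.
inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="np")
logits = session.run(
    ["logits"],
    {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
)[0][0]
# Numerically stable softmax: subtract the max logit before exponentiating.
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()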

Calibrated Decision Rule

The reported metrics use a validation-selected threshold on P(TRUSTWORTHY). If the model's top class is TRUSTWORTHY but its probability is below the threshold, fall back to the stronger of ABSTAIN and DISPUTED.

LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
TAU = 0.50
probs_np = probs.detach().cpu().numpy() if hasattr(probs, "detach") else probs

raw_pred = int(probs_np.argmax())
final_pred = raw_pred
used_threshold_fallback = False
if raw_pred == 2 and probs_np[2] < TAU:
    final_pred = int(probs_np[:2].argmax())
    used_threshold_fallback = True

decision = {
    "label": LABELS[final_pred],
    "raw_label": LABELS[raw_pred],
    "probabilities": dict(zip(LABELS, probs_np.tolist(), strict=True)),
    "confidence": float(probs_np[final_pred]),
    "trustworthy_probability": float(probs_np[2]),
    "threshold": TAU,
    "used_threshold_fallback": used_threshold_fallback,
}
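To make the fallback concrete, here is a dependency-free restatement of the same rule applied to a few hand-picked probability vectors. The function name `apply_threshold` and the input values are illustrative only.

```python
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]


def apply_threshold(probs, tau):
    """Return (final_label, raw_label, used_fallback) for one probability vector."""
    raw = max(range(3), key=lambda i: probs[i])  # argmax; first index wins ties
    final = raw
    used_fallback = False
    if raw == 2 and probs[2] < tau:
        # TRUSTWORTHY won but is not confident enough: take the stronger
        # of ABSTAIN (index 0) and DISPUTED (index 1).
        final = 0 if probs[0] >= probs[1] else 1
        used_fallback = True
    return LABELS[final], LABELS[raw], used_fallback


print(apply_threshold([0.30, 0.25, 0.45], tau=0.50))  # ('ABSTAIN', 'TRUSTWORTHY', True)
print(apply_threshold([0.05, 0.15, 0.80], tau=0.50))  # ('TRUSTWORTHY', 'TRUSTWORTHY', False)
print(apply_threshold([0.02, 0.86, 0.12], tau=0.50))  # ('DISPUTED', 'DISPUTED', False)
```

The rule only ever changes a TRUSTWORTHY prediction. Because the top class among three always has probability of at least 1/3, a threshold at or below that value never triggers the fallback.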

Results

Reported on the fitz-gov V7.0.1 held-out test split: 1,050 examples, 3 seeds. Checkpoints and TRUSTWORTHY thresholds were selected on a separate 1,050-example validation split.

| Decision | Recall | Precision | False-rate |
|---|---|---|---|
| OVERALL | 95.24 ± 0.48% | 95.24 ± 0.48% | 4.76 ± 0.48% |
| ABSTAIN | 95.25 ± 0.00% | 96.37 ± 0.64% | 1.54 ± 0.28% |
| DISPUTED | 97.00 ± 1.17% | 95.53 ± 0.71% | 2.22 ± 0.36% |
| TRUSTWORTHY | 93.66 ± 0.30% | 94.06 ± 0.66% | 3.48 ± 0.40% |

For OVERALL, recall and precision are micro-averages; in single-label three-class classification they both equal accuracy. For label rows, false-rate is the share of cases that were not that label but were incorrectly predicted as that label. The TRUSTWORTHY false-rate is the main safety metric: it measures cases where the model says TRUSTWORTHY even though the sources do not support that decision.
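The definitions above can be sketched from a confusion matrix. The matrix below is a toy example with made-up counts, not the model's results, and `per_label_metrics` is a hypothetical helper name.

```python
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]


def per_label_metrics(confusion):
    """confusion[t][p] is the number of examples with true label t predicted as p."""
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, name in enumerate(LABELS):
        tp = confusion[k][k]
        actual = sum(confusion[k])                    # TP + FN
        predicted = sum(row[k] for row in confusion)  # TP + FP
        fp = predicted - tp
        negatives = total - actual                    # FP + TN
        metrics[name] = {
            "recall": tp / actual,
            "precision": tp / predicted,
            # Share of not-this-label cases that were predicted as this label.
            "false_rate": fp / negatives,
        }
    return metrics


def accuracy(confusion):
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


# Toy counts for illustration only.
toy = [
    [90, 5, 5],   # true ABSTAIN
    [4, 92, 4],   # true DISPUTED
    [3, 2, 95],   # true TRUSTWORTHY
]
m = per_label_metrics(toy)
print(m["TRUSTWORTHY"]["false_rate"])  # 9 of the 200 non-TRUSTWORTHY cases: 0.045
print(accuracy(toy))
```

In this single-label setting every example contributes exactly one prediction, so summing true positives over all labels and dividing by the total number of predictions (micro-precision) or the total number of examples (micro-recall) gives the same value as accuracy.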

F1 is not shown in the headline table. It is the harmonic mean of precision and recall (2 * precision * recall / (precision + recall)), useful as a compact balance score but less direct than the operating metrics above.
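For readers who want the number anyway, the formula is a one-liner. The example applies it to the rounded TRUSTWORTHY precision and recall from the table; note that an F1 computed from seed-averaged precision and recall is only an approximation and is not necessarily the mean of per-seed F1 scores, which the card does not report.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# TRUSTWORTHY row of the table, in percent.
print(round(f1(94.06, 93.66), 2))  # 93.86
```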

Per-seed held-out test results:

| Seed | TRUSTWORTHY threshold | Accuracy | False-trustworthy rate |
|---|---|---|---|
| 42 | 0.57 | 95.71% | 3.03% |
| 1337 | 0.56 | 94.76% | 3.78% |
| 7 | 0.34 | 95.24% | 3.63% |
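The per-seed rows can be checked against the headline table. The sketch below recomputes the mean and spread with Python's `statistics` module. The card does not state how the ± values were computed; the assumption here is that they are the sample standard deviation across the three seeds, which matches the reported figures.

```python
from statistics import mean, stdev

# Per-seed values from the table above, in percent.
accuracy = [95.71, 94.76, 95.24]
false_trustworthy = [3.03, 3.78, 3.63]

# stdev() is the sample standard deviation (n - 1 in the denominator).
print(f"accuracy: {mean(accuracy):.2f} ± {stdev(accuracy):.2f}")
# accuracy: 95.24 ± 0.48
print(f"false-trustworthy rate: {mean(false_trustworthy):.2f} ± {stdev(false_trustworthy):.2f}")
# false-trustworthy rate: 3.48 ± 0.40
```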

Training Data

Trained and evaluated on fitz-gov V7.0.1, an English benchmark of 10,500 RAG evidence-governance examples with query-grouped train, validation, and test splits. Split sizes: 8,400 train, 1,050 validation, 1,050 held-out test.
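"Query-grouped" means that all examples sharing a query land in the same split, so the test set contains no query seen in training. The card does not describe how the split was produced; the sketch below shows one simple way to do it, assuming each example is a dict with a `query` key (a hypothetical schema) and using 80/10/10 fractions that match the reported split sizes.

```python
import random
from collections import defaultdict


def grouped_split(examples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split examples into train/validation/test without sharing queries."""
    groups = defaultdict(list)
    for example in examples:
        groups[example["query"]].append(example)

    # Shuffle the distinct queries, not the individual examples.
    queries = sorted(groups)
    random.Random(seed).shuffle(queries)

    cut1 = int(len(queries) * fractions[0])
    cut2 = cut1 + int(len(queries) * fractions[1])
    parts = (queries[:cut1], queries[cut1:cut2], queries[cut2:])
    return tuple([ex for q in part for ex in groups[q]] for part in parts)


# Toy data: 10 queries with 2 examples each.
toy = [{"query": f"q{i}", "variant": v} for i in range(10) for v in range(2)]
train, val, test = grouped_split(toy)
print(len(train), len(val), len(test))
```

Because the fractions apply to queries rather than examples, split sizes are only approximately proportional when queries have different numbers of examples.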

The validation split was used for checkpoint and threshold selection. The held-out test split was used only for final reporting.
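Threshold selection on validation data could look roughly like the sweep below. The card says only that the threshold is validation-selected and that the selection metric is accuracy with a penalty for false-trustworthy errors; the exact scoring formula, the `penalty` weight, the search grid, and the toy data here are all assumptions made for illustration.

```python
TRUSTWORTHY = 2  # index in [ABSTAIN, DISPUTED, TRUSTWORTHY]


def predict(probs, tau):
    """Apply the calibrated decision rule and return a class index."""
    raw = max(range(3), key=lambda i: probs[i])
    if raw == TRUSTWORTHY and probs[TRUSTWORTHY] < tau:
        return 0 if probs[0] >= probs[1] else 1
    return raw


def select_threshold(val_probs, val_labels, penalty=2.0):
    """Sweep tau over 0.34..0.99 and keep the best-scoring value (hypothetical score)."""
    negatives = [i for i, y in enumerate(val_labels) if y != TRUSTWORTHY]
    best = None
    for step in range(34, 100):
        tau = step / 100
        preds = [predict(p, tau) for p in val_probs]
        acc = sum(p == y for p, y in zip(preds, val_labels)) / len(val_labels)
        false_tw = (
            sum(preds[i] == TRUSTWORTHY for i in negatives) / len(negatives)
            if negatives else 0.0
        )
        score = acc - penalty * false_tw  # assumed form of the penalized metric
        if best is None or score > best["score"]:
            best = {"tau": tau, "accuracy": acc, "false_trustworthy_rate": false_tw, "score": score}
    return best


# Toy validation set: the third example is a weak false-TRUSTWORTHY at 0.55.
val_probs = [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.35, 0.1, 0.55], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
val_labels = [2, 2, 0, 0, 1]
best = select_threshold(val_probs, val_labels)
print(best["tau"])  # 0.56, the first threshold that removes the false-TRUSTWORTHY case
```

Raising the threshold further would start converting correct TRUSTWORTHY predictions into ABSTAIN or DISPUTED, which is the trade-off the penalty weight controls.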

Training Recipe

| Item | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | Encoder with sequence-classification head |
| Max sequence length | 4096 tokens |
| Labels | ABSTAIN, DISPUTED, TRUSTWORTHY |
| Epochs | 5 with early stopping |
| Batch size | 16 |
| Learning rate | 5e-5 |
| Scheduler | Cosine with 10% warmup |
| Weight decay | 0.01 |
| Loss | Weighted cross-entropy with label smoothing |
| Class weights | [2.3, 2.3, 1.0] |
| Label smoothing | 0.15 |
| Selection metric | Accuracy with an explicit penalty for false-trustworthy errors |
| Seeds | 42, 1337, 7 |
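To show how class weights and label smoothing interact, the sketch below computes a per-example loss in plain Python. It uses one common formulation (smooth the one-hot target uniformly, then weight each class term); the card does not say which framework loss or reduction was used in training, so treat the function as an assumption rather than the training code.

```python
import math

CLASS_WEIGHTS = [2.3, 2.3, 1.0]  # ABSTAIN, DISPUTED, TRUSTWORTHY (from the recipe)
SMOOTHING = 0.15


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def weighted_smoothed_ce(logits, target, weights=CLASS_WEIGHTS, eps=SMOOTHING):
    """Per-example weighted cross-entropy against a label-smoothed target."""
    k = len(logits)
    logp = log_softmax(logits)
    # Smoothed target: eps / k on every class, plus (1 - eps) on the true class.
    q = [eps / k + (1.0 - eps if c == target else 0.0) for c in range(k)]
    return -sum(weights[c] * q[c] * logp[c] for c in range(k))


uniform = [0.0, 0.0, 0.0]
print(weighted_smoothed_ce(uniform, target=0))  # ABSTAIN example
print(weighted_smoothed_ce(uniform, target=2))  # TRUSTWORTHY example
```

With these weights, an uninformative prediction costs more on an ABSTAIN or DISPUTED example than on a TRUSTWORTHY one, which plausibly pushes training away from false-trustworthy errors.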

Limitations

  • English-only training and evaluation data.
  • The model judges only the provided sources; weak or conflicting retrieval results can still lead to ABSTAIN or DISPUTED decisions, which the application has to handle (for example, by retrying retrieval or escalating).
  • Small per-category slices should not be treated as standalone product guarantees; use aggregate metrics and run domain-specific checks for deployment.
  • The decision threshold is tuned for low false-trustworthy rate, so some answerable cases may be classified as ABSTAIN or DISPUTED.

Citation

@misc{pyrrho_nano_g2_2026,
  title  = {pyrrho-nano-g2},
  author = {Yan Fitzner},
  year   = {2026},
  url    = {https://huggingface.co/yafitzdev/pyrrho-nano-g2},
}

License

CC BY-NC 4.0. Free for research, evaluation, and personal use; commercial use requires a separate license.
