⚠️ IMPORTANT WARNING — Model Effectiveness
These bootstrap models are part of a degeneracy chain. The ICI-DC bootstrap S1 was trained on synthetic data generated by a model (S2-coeff1.5) that had itself underfit during training. The bootstrap S2 was then fine-tuned on that degraded S1. Each generation of fine-tuning further degraded the base model's innate mutation-discrimination capability: base Omni-DNA-20M achieves 0.951 AUC on raw DNA, whereas these models achieve ~0.30 AUC.
The SAD coefficient reported for the bootstrap S2 (~12.03 LR-adjusted) is a mathematical artifact of the training configuration, not an indicator of genuine training convergence.
These models are preserved for historical and reproducibility purposes only.
Omni-DNA ICI-DC Bootstrap Checkpoint
Omni-DNA-20M fine-tuned on 8,112 bootstrap synthetic mutation pairs via ICI-DC pre-training.
Training Details
- Base model: zehui127/Omni-DNA-20M (20M params, OLMo causal LM, BPE tokenizer)
- Training data: 8,112 synthetic mutation pairs generated by the SAD coeff1.5 checkpoint via ICI-DC
- Best checkpoint: Step 1250 / epoch 4.92, eval loss 1.663
- Hyperparameters: LR=5e-5, epochs=10, batch_size=16, grad_accum=2, cosine schedule
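The step and epoch figures above can be cross-checked with a little arithmetic. The sketch below is only a consistency check, not the training script: it assumes a single device, that all 8,112 pairs were used for training, that the partial final batch was kept, and that the cosine schedule had no warmup. None of these is stated in the card, and `cosine_lr` is a hypothetical helper written for this illustration.

```python
import math

# Values taken from the training details above.
num_pairs = 8112
batch_size = 16
grad_accum = 2
epochs = 10
peak_lr = 5e-5
best_step = 1250

# Assumption: single device, all pairs used for training,
# and the partial final batch of each epoch is kept (hence ceil).
effective_batch = batch_size * grad_accum
steps_per_epoch = math.ceil(num_pairs / effective_batch)
total_steps = steps_per_epoch * epochs
best_epoch = best_step / steps_per_epoch


def cosine_lr(step, total, peak):
    """Cosine decay from peak to zero. Assumes no warmup (hypothetical)."""
    return peak * 0.5 * (1.0 + math.cos(math.pi * step / total))


# Learning rate at the best checkpoint under the assumptions above.
lr_at_best = cosine_lr(best_step, total_steps, peak_lr)

print(effective_batch, steps_per_epoch, total_steps, round(best_epoch, 2))
```

Under these assumptions the effective batch size is 32, one epoch is 254 optimizer steps, and step 1250 corresponds to epoch 4.92, which matches the reported best checkpoint. The best checkpoint therefore falls just before the midpoint of the 10-epoch schedule, where a warmup-free cosine schedule would have decayed the learning rate to roughly half its peak.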
Benchmark (826 test pairs)
| Axis | Metric | Score |
|---|---|---|
| Mutation Detection | F1 | 0.667 |
| Embedding Distance | Seq AUC | 0.543 |
| Masked Prediction | Surprise Δ | −1.00 |
| Discriminative | AUC | 0.306 |
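When reading the Discriminative AUC, note that 0.5 corresponds to chance-level ranking, so a value of 0.306 means the scores order the two classes the wrong way round more often than not. The card does not say how the AUC was computed or which class was treated as positive. The sketch below uses the standard pairwise definition on made-up toy scores, purely to show that negating the scores turns an AUC of a into 1 - a (here, 0.306 would become 0.694).

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. This is the standard pairwise (Mann-Whitney)
    definition of ROC AUC, not necessarily the benchmark's own code.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data for illustration only (not benchmark outputs).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.4, 0.7, 0.3, 0.6, 0.9]

auc = roc_auc(labels, scores)
flipped = roc_auc(labels, [-s for s in scores])
print(round(auc, 3), round(flipped, 3))
```

A below-chance AUC still carries ranking signal, just with inverted sign. It is not evidence that the model is usable, and the warning at the top of this card still applies.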
Related Models
- Nhoodie/omni-dna-ici-dc — Original ICI-DC checkpoint (same training, different synthetic data)
- Nhoodie/omni-dna-sad-mutation-bootstrap — This model after SAD attenuation
Model tree for Nhoodie/omni-dna-ici-dc-bootstrap
Base model
zehui127/Omni-DNA-20M