We’ve been working on continual learning for LLM fine-tuning — training one model sequentially across
multiple domains without catastrophic forgetting. After six months of R&D and 50+ failed experiments (EWC,
replay, knowledge distillation, gradient projection), we have a method that works.
4 independent benchmarks on Mistral-7B:
- Research benchmark (5 domains, 3 seeds) — -0.17% drift vs +43% forgetting with naive LoRA
- Walmart enterprise (4 domains) — BERTScores 0.82–0.94 across all domains retained
- Salesforce enterprise (5 domains) — Positive backward transfer: retention BERTScores improved with each new domain (0.889 → 0.907)
- Dental stress test (8 domains, 2 seeds) — Gradient norms stable throughout, zero crashes
Spectral norm stayed locked at 1.0 across every experiment. Standard LoRA crashed at step 43 with a gradient
norm of 263; ours peaked under 6. No replay buffers, no EWC, no knowledge distillation.
The adapter adds ~0.1% extra parameters and works with any LoRA/QLoRA setup.
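For a sense of scale, here is a back-of-envelope count showing that a rank-8 LoRA over Mistral-7B's attention projections lands in the ~0.1% range. The shapes come from the public Mistral-7B config; the rank and target-module set are assumptions for illustration, and our adapter's actual configuration may differ.

```python
# Mistral-7B attention shapes: hidden=4096, 8 KV heads of dim 128 (kv_dim=1024),
# 32 layers. A rank-r LoRA on an (in -> out) projection adds r*(in + out) params.
hidden, kv_dim, layers, rank = 4096, 1024, 32, 8

per_layer = (
    2 * rank * (hidden + hidden)    # q_proj and o_proj (4096 -> 4096)
    + 2 * rank * (hidden + kv_dim)  # k_proj and v_proj (4096 -> 1024)
)
lora_params = per_layer * layers
base_params = 7.24e9  # approximate Mistral-7B total

print(f"{lora_params:,} adapter params "
      f"({100 * lora_params / base_params:.2f}% of base)")
# -> 6,815,744 adapter params (0.09% of base)
```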
Interactive benchmark dashboard with charts is in the live product (free tier, no credit card): https://mhc-finetune-saas-zrtokzlkbnue9zsk7jfgad.streamlit.app
US patent pending. Would love to hear from anyone working on continual learning or dealing with
forgetting in multi-domain fine-tuning.