---
license: apache-2.0
base_model: danielcherubini/Qwen3.5-DeltaCoder-9B
tags:
  - qwen3.5
  - code
  - tool-calling
  - gguf
  - quantized
  - reasoning
  - chain-of-thought
  - dpo
datasets:
  - nohurry/Opus-4.6-Reasoning-3000x-filtered
  - Roman1111111/claude-opus-4.6-10000x
  - TeichAI/claude-4.5-opus-high-reasoning-250x
  - Jackrong/Qwen3.5-reasoning-700x
  - togethercomputer/CoderForge-Preview
  - TIGER-Lab/AceCode-V2-122K
language:
  - en
pipeline_tag: text-generation
---

# Qwen3.5-DeltaCoder-9B-GGUF

> **v1.1-DPO** — Now with DPO alignment for improved code correctness and self-verification.
> If you downloaded before March 28, 2026, please re-pull to get v1.1-DPO.

GGUF quantizations of [Qwen3.5-DeltaCoder-9B](https://huggingface.co/danielcherubini/Qwen3.5-DeltaCoder-9B) for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines.

## What's New in v1.1-DPO

- **DPO alignment** on 4,519 preference pairs from [AceCode-V2-122K](https://huggingface.co/datasets/TIGER-Lab/AceCode-V2-122K)
- **Self-correcting behavior** — the model now detects and fixes its own bugs rather than submitting incorrect code
- **Improved code correctness** — trained to prefer passing solutions over failing ones
- **Same tool-call reliability** as v1 — SFT improvements preserved through the two-stage merge

## Available Quantizations

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `DeltaCoder-9B-v1.1-DPO-Q2_K.gguf` | Q2_K | ~3.6 GB | Smallest, lowest quality |
| `DeltaCoder-9B-v1.1-DPO-Q3_K_S.gguf` | Q3_K_S | ~4.0 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q3_K_M.gguf` | Q3_K_M | ~4.4 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q3_K_L.gguf` | Q3_K_L | ~4.6 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q4_0.gguf` | Q4_0 | ~3.2 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q4_K_S.gguf` | Q4_K_S | ~5.0 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q4_K_M.gguf` | Q4_K_M | ~5.5 GB | **Recommended** |
| `DeltaCoder-9B-v1.1-DPO-Q5_K_S.gguf` | Q5_K_S | ~6.1 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q5_0.gguf` | Q5_0 | ~6.1 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf` | Q5_K_M | ~6.5 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q6_K.gguf` | Q6_K | ~7.3 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q8_0.gguf` | Q8_0 | ~9.4 GB | Near-lossless |
| `DeltaCoder-9B-v1.1-DPO-BF16.gguf` | BF16 | ~17.9 GB | Full precision |

## Recommended Quant

- **Low VRAM (8 GB)**: Q4_K_M
- **Mid VRAM (12 GB)**: Q5_K_M or Q6_K
- **High VRAM (16 GB+)**: Q8_0
- **Full precision**: BF16

## Training Lineage

```
Qwen/Qwen3.5-9B-Base
└─ Qwen/Qwen3.5-9B (instruction tuned)
   └─ Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2
      (SFT on Claude 4.6 Opus reasoning traces)
      └─ danielcherubini/Qwen3.5-DeltaCoder-9B (v1 SFT — tool-call reliability)
         (LoRA SFT on CoderForge-Preview)
         └─ danielcherubini/Qwen3.5-DeltaCoder-9B v1.1-DPO  ← this model
            (DPO on AceCode-V2-122K preference pairs)
```

## Recommended Sampling Settings

| Parameter | Value |
|-----------|-------|
| temperature | 0.6 |
| top_k | 20 |
| top_p | 0.95 |
| min_p | 0.0 |
| presence_penalty | 0.0 |
| repeat_penalty | 1.0 |

> [!WARNING]
> **Do not use temperature below 0.5** — low temperatures cause deterministic looping in multi-turn agentic use.

### KV Cache Quantization

| Context Length | KV Cache (K/V) | VRAM (Q4_K_M) | Generation Speed |
|----------------|----------------|---------------|------------------|
| 102,400 | f16/q4_0 | ~8.5 GB | ~111 tok/s |
| 131,072 | f16/q4_0 | ~9.1 GB | ~110 tok/s |

```bash
# llama.cpp / ik_llama.cpp flags
-ctk f16 -ctv q4_0
```

## Usage

### Ollama

```bash
ollama create deltacoder -f Modelfile
```

Example `Modelfile`:

```
FROM ./DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf
```

### llama.cpp

```bash
./llama-server -m DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf \
  -ngl 999 -c 131072 -ctk f16 -ctv q4_0 -fa 1 --jinja
```

### LM Studio

Download any GGUF file and load it directly in LM Studio.
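Once `llama-server` is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below assembles a request body that applies the recommended sampling settings from the table above. This is a sketch only: the `deltacoder` model name and the `build_request` helper are placeholders, and `top_k`, `min_p`, and `repeat_penalty` are llama.cpp extensions to the OpenAI request schema.

```python
import json

# Recommended sampling settings from the table above.
SAMPLING = {
    "temperature": 0.6,       # do not go below 0.5 (looping risk)
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "repeat_penalty": 1.0,
}

def build_request(prompt: str) -> dict:
    """Assemble a /v1/chat/completions payload with the recommended sampling."""
    return {
        "model": "deltacoder",  # placeholder alias; llama-server ignores unknown names
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING,
    }

payload = build_request("Write a function that reverses a linked list.")
print(json.dumps(payload, indent=2))
```

POST the payload to `http://localhost:8080/v1/chat/completions` (the default `llama-server` port) with any HTTP client.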
## Benchmarks

| Model | HumanEval | HumanEval+ | Terminal-Bench Easy |
|-------|-----------|------------|---------------------|
| Jackrong Qwen3.5-9B-v2 (base) | 53.7% | — | — |
| DeltaCoder-9B v1 (temp=0.6) | 50.6% | 49.4% | 2/4 (50%) |
| **DeltaCoder-9B v1.1-DPO** (temp=0.6) | TBD | TBD | 2/4 (50%)* |

\*v1.1-DPO timed out on two tasks that v1 answered incorrectly (a behavioral improvement); a re-run with an extended timeout is in progress.

## Acknowledgements

- [Unsloth](https://unsloth.ai) for Qwen3.5 training support
- [Together AI](https://together.ai) for the CoderForge dataset
- [TIGER Lab](https://huggingface.co/TIGER-Lab) for AceCode-V2-122K
- [Jackrong](https://huggingface.co/Jackrong) for the reasoning distillation
- [Qwen](https://huggingface.co/Qwen) for the base model
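The VRAM guidance in the Recommended Quant section can be restated as a tiny selection helper. This is illustrative only: `pick_quant` is a hypothetical name, and the thresholds simply mirror the tiers listed above (8 GB, 12 GB, 16 GB+, full precision).

```python
def pick_quant(vram_gb: float) -> str:
    """Map available VRAM (GB) to the quant tier recommended above."""
    if vram_gb >= 18:      # BF16 weights alone are ~17.9 GB
        return "BF16"
    if vram_gb >= 16:      # high VRAM
        return "Q8_0"
    if vram_gb >= 12:      # mid VRAM (Q6_K also fits here)
        return "Q5_K_M"
    return "Q4_K_M"        # low-VRAM default (8 GB)

print(pick_quant(8))   # → Q4_K_M
```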