---
license: apache-2.0
base_model: danielcherubini/Qwen3.5-DeltaCoder-9B
tags:
  - qwen3.5
  - code
  - tool-calling
  - gguf
  - quantized
  - reasoning
  - chain-of-thought
  - dpo
datasets:
  - nohurry/Opus-4.6-Reasoning-3000x-filtered
  - Roman1111111/claude-opus-4.6-10000x
  - TeichAI/claude-4.5-opus-high-reasoning-250x
  - Jackrong/Qwen3.5-reasoning-700x
  - togethercomputer/CoderForge-Preview
  - TIGER-Lab/AceCode-V2-122K
language:
  - en
pipeline_tag: text-generation
---

# Qwen3.5-DeltaCoder-9B-GGUF

> **v1.1-DPO** — Now with DPO alignment for improved code correctness and self-verification.
> If you downloaded before March 28, 2026, please re-pull to get v1.1-DPO.

GGUF quantizations of [Qwen3.5-DeltaCoder-9B](https://huggingface.co/danielcherubini/Qwen3.5-DeltaCoder-9B) for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines.

## What's New in v1.1-DPO

- **DPO alignment** on 4,519 preference pairs from [AceCode-V2-122K](https://huggingface.co/datasets/TIGER-Lab/AceCode-V2-122K)
- **Self-correcting behavior** — the model now detects and fixes its own bugs rather than submitting incorrect code
- **Improved code correctness** — trained to prefer passing solutions over failing ones
- **Same tool-call reliability** as v1 — SFT improvements preserved through the two-stage merge

## Available Quantizations

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `DeltaCoder-9B-v1.1-DPO-Q2_K.gguf` | Q2_K | ~3.6 GB | Smallest, lowest quality |
| `DeltaCoder-9B-v1.1-DPO-Q3_K_S.gguf` | Q3_K_S | ~4.0 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q3_K_M.gguf` | Q3_K_M | ~4.4 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q3_K_L.gguf` | Q3_K_L | ~4.6 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q4_0.gguf` | Q4_0 | ~3.2 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q4_K_S.gguf` | Q4_K_S | ~5.0 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q4_K_M.gguf` | Q4_K_M | ~5.5 GB | **Recommended** |
| `DeltaCoder-9B-v1.1-DPO-Q5_K_S.gguf` | Q5_K_S | ~6.1 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q5_0.gguf` | Q5_0 | ~6.1 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf` | Q5_K_M | ~6.5 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q6_K.gguf` | Q6_K | ~7.3 GB | |
| `DeltaCoder-9B-v1.1-DPO-Q8_0.gguf` | Q8_0 | ~9.4 GB | Near-lossless |
| `DeltaCoder-9B-v1.1-DPO-BF16.gguf` | BF16 | ~17.9 GB | Full precision |

## Recommended Quant

- **Low VRAM (8 GB)**: Q4_K_M
- **Mid VRAM (12 GB)**: Q5_K_M or Q6_K
- **High VRAM (16 GB+)**: Q8_0
- **Full precision**: BF16

## Training Lineage

```
Qwen/Qwen3.5-9B-Base
└─ Qwen/Qwen3.5-9B (instruction tuned)
   └─ Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2
      (SFT on Claude 4.6 Opus reasoning traces)
      └─ danielcherubini/Qwen3.5-DeltaCoder-9B (v1 SFT — tool-call reliability)
         (LoRA SFT on CoderForge-Preview)
         └─ danielcherubini/Qwen3.5-DeltaCoder-9B v1.1-DPO  ← this model
            (DPO on AceCode-V2-122K preference pairs)
```

## Recommended Sampling Settings

| Parameter | Value |
|-----------|-------|
| temperature | 0.6 |
| top_k | 20 |
| top_p | 0.95 |
| min_p | 0.0 |
| presence_penalty | 0.0 |
| repeat_penalty | 1.0 |

> [!WARNING]
> **Do not use temperature below 0.5** — low temperatures cause deterministic looping in multi-turn agentic use.

### KV Cache Quantization

| Context Length | KV Cache (K/V) | VRAM (Q4_K_M) | Generation Speed |
|----------------|----------------|---------------|------------------|
| 102,400 | f16/q4_0 | ~8.5 GB | ~111 tok/s |
| 131,072 | f16/q4_0 | ~9.1 GB | ~110 tok/s |

```bash
# llama.cpp / ik_llama.cpp flags
-ctk f16 -ctv q4_0
```

## Usage

### Ollama

```bash
ollama create deltacoder -f Modelfile
```

Example `Modelfile`:

```
FROM ./DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf
```

### llama.cpp

```bash
./llama-server -m DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf \
  -ngl 999 -c 131072 -ctk f16 -ctv q4_0 -fa 1 --jinja
```

### LM Studio

Download any GGUF file and load it directly in LM Studio.
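Once `llama-server` is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below assembles a request body that applies the recommended sampling settings from the table above. This is a sketch only: the `deltacoder` model name and the `build_request` helper are placeholders, and `top_k`, `min_p`, and `repeat_penalty` are llama.cpp extensions to the OpenAI request schema.

```python
import json

# Recommended sampling settings from the table above.
SAMPLING = {
    "temperature": 0.6,       # do not go below 0.5 (looping risk)
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "repeat_penalty": 1.0,
}

def build_request(prompt: str) -> dict:
    """Assemble a /v1/chat/completions payload with the recommended sampling."""
    return {
        "model": "deltacoder",  # placeholder alias; llama-server ignores unknown names
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING,
    }

payload = build_request("Write a function that reverses a linked list.")
print(json.dumps(payload, indent=2))
```

POST the payload to `http://localhost:8080/v1/chat/completions` (the default `llama-server` port) with any HTTP client.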
## Benchmarks

| Model | HumanEval | HumanEval+ | Terminal-Bench Easy |
|-------|-----------|------------|---------------------|
| Jackrong Qwen3.5-9B-v2 (base) | 53.7% | — | — |
| DeltaCoder-9B v1 (temp=0.6) | 50.6% | 49.4% | 2/4 (50%) |
| **DeltaCoder-9B v1.1-DPO** (temp=0.6) | TBD | TBD | 2/4 (50%)* |

\*v1.1-DPO timed out on two tasks that v1 answered incorrectly (a behavioral improvement); a re-run with an extended timeout is in progress.

## Acknowledgements

- [Unsloth](https://unsloth.ai) for Qwen3.5 training support
- [Together AI](https://together.ai) for the CoderForge dataset
- [TIGER Lab](https://huggingface.co/TIGER-Lab) for AceCode-V2-122K
- [Jackrong](https://huggingface.co/Jackrong) for the reasoning distillation
- [Qwen](https://huggingface.co/Qwen) for the base model
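The VRAM guidance in the Recommended Quant section can be restated as a tiny selection helper. This is illustrative only: `pick_quant` is a hypothetical name, and the thresholds simply mirror the tiers listed above (8 GB, 12 GB, 16 GB+, full precision).

```python
def pick_quant(vram_gb: float) -> str:
    """Map available VRAM (GB) to the quant tier recommended above."""
    if vram_gb >= 18:      # BF16 weights alone are ~17.9 GB
        return "BF16"
    if vram_gb >= 16:      # high VRAM
        return "Q8_0"
    if vram_gb >= 12:      # mid VRAM (Q6_K also fits here)
        return "Q5_K_M"
    return "Q4_K_M"        # low-VRAM default (8 GB)

print(pick_quant(8))   # → Q4_K_M
```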