
---
license: apache-2.0
tags:
  - qwen3.5
  - mamba2
  - mtp
  - speculative-decoding
  - tp906
---

# Qwen3.5 MTP Weights for tp906

Pre-extracted **Multi-Token Prediction (MTP)** weights for [Qwen3.5](https://huggingface.co/Qwen) Mamba2-hybrid models, shipped as a sidecar file that sits next to the main GGUF model. [tp906-engine](https://hub.docker.com/r/skyne98/tp906-engine) uses them for speculative decoding, giving a decode speedup of roughly 10-15%.

## Files

| Model | File | Size |
|-------|------|------|
| Qwen3.5-0.8B | `Qwen3.5-0.8B/mtp_weights.bin` | 39 MB |
| Qwen3.5-2B | `Qwen3.5-2B/mtp_weights.bin` | 116 MB |
| Qwen3.5-4B | `Qwen3.5-4B/mtp_weights.bin` | 230 MB |
| Qwen3.5-9B | `Qwen3.5-9B/mtp_weights.bin` | 465 MB |
| Qwen3.5-27B | `Qwen3.5-27B/mtp_weights.bin` | 811 MB |
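
Every file follows the same `<model>/mtp_weights.bin` layout under the repository's `resolve/main` URL, so downloads can be scripted. The bash sketch below assumes the model names in the table; `mtp_url` and `fetch_mtp` are hypothetical helper names, not part of tp906, and `fetch_mtp` needs `wget` and network access.

```bash
# Hypothetical helpers (not part of tp906): build the download URL for a
# model's MTP sidecar and fetch it into a given directory.
REPO_URL="https://huggingface.co/raspbfox/tp906-mtp/resolve/main"

# Print the sidecar URL for a model name from the Files table;
# fail with a message on stderr for any other name.
mtp_url() {
  case "$1" in
    Qwen3.5-0.8B | Qwen3.5-2B | Qwen3.5-4B | Qwen3.5-9B | Qwen3.5-27B)
      printf '%s/%s/mtp_weights.bin\n' "$REPO_URL" "$1"
      ;;
    *)
      echo "unknown model: $1" >&2
      return 1
      ;;
  esac
}

# Download the sidecar to <dir>/mtp_weights.bin, next to that model's GGUF.
# Requires wget and network access.
fetch_mtp() {
  local model="$1" dir="$2" url
  url="$(mtp_url "$model")" || return 1
  wget -O "$dir/mtp_weights.bin" "$url"
}
```

For example, `fetch_mtp Qwen3.5-4B /path/to/your/models/qwen3.5-4b` would place the 4B sidecar in that (hypothetical) directory. Validating the name first means a typo fails locally instead of producing a 404 from the Hub.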

## Usage

Download the file matching your model and place it next to your GGUF file:

```bash
# Example: Qwen3.5-9B
cd /path/to/your/models/
wget https://huggingface.co/raspbfox/tp906-mtp/resolve/main/Qwen3.5-9B/mtp_weights.bin

# Your directory should look like:
#   Qwen3.5-9B-Q8_0.gguf
#   mtp_weights.bin          <-- tp906 auto-detects this

# Run with MTP
tp906-bench -m Qwen3.5-9B-Q8_0.gguf
```
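
tp906 automatically picks up `mtp_weights.bin` from the same directory as the GGUF model, so no extra flags are needed. Because the file name is the same for every model size, keep each model in its own directory.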
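
The detection rule described above amounts to a single path check, which you can run yourself to confirm the layout before launching. This is a minimal bash sketch of the rule as this README states it, not tp906's own detection code; `expected_mtp_path` and `check_mtp` are hypothetical helper names.

```bash
# Hypothetical helper: print the sidecar path tp906 is described as looking
# for, i.e. mtp_weights.bin in the same directory as the GGUF file.
expected_mtp_path() {
  local gguf="$1"
  printf '%s/mtp_weights.bin\n' "$(dirname -- "$gguf")"
}

# Report whether the sidecar is present; returns non-zero if it is missing.
check_mtp() {
  local gguf="$1" sidecar
  sidecar="$(expected_mtp_path "$gguf")"
  if [ -f "$sidecar" ]; then
    echo "MTP sidecar found: $sidecar"
  else
    echo "no MTP sidecar at $sidecar" >&2
    return 1
  fi
}
```

For example, running `check_mtp Qwen3.5-9B-Q8_0.gguf` from the models directory before `tp906-bench` reports `./mtp_weights.bin` if the download landed in the right place.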

## Format

Each file uses the MTP1 binary format and holds 15 F16 tensors extracted from the official Qwen3.5 safetensors (the `model.mtp_block.*` weights). The BF16 source tensors are converted to F16 for compatibility with the AMD Instinct MI50.
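
Since every tensor is stored as F16 (2 bytes per value), a file's size gives a quick estimate of how many MTP parameters it holds. This is a rough sketch, assuming the MTP1 header and tensor metadata are small compared with the tensor data (the header layout is not documented here); `approx_mtp_params` and `file_bytes` are hypothetical helpers.

```bash
# Hypothetical helper: estimate the parameter count of an F16 sidecar from its
# size in bytes. F16 uses 2 bytes per value; header overhead is ignored.
approx_mtp_params() {
  local bytes="$1"
  echo $(( bytes / 2 ))
}

# Byte size of a file (GNU stat first, then BSD/macOS stat as a fallback).
file_bytes() {
  stat -c %s -- "$1" 2>/dev/null || stat -f %z -- "$1"
}
```

Usage: `approx_mtp_params "$(file_bytes mtp_weights.bin)"`. For the 465 MB Qwen3.5-9B file this works out to roughly 232 to 244 million parameters, depending on whether the table's MB means 10^6 or 2^20 bytes.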