---
license: apache-2.0
tags:
- qwen3.5
- mamba2
- mtp
- speculative-decoding
- tp906
---

# Qwen3.5 MTP Weights for tp906

Pre-extracted **Multi-Token Prediction (MTP)** sidecar weights for [Qwen3.5](https://huggingface.co/Qwen) Mamba2-hybrid models.

Used by [tp906-engine](https://hub.docker.com/r/skyne98/tp906-engine) for speculative decoding (~10-15% decode speedup).

## Files

| Model | File | Size |
|-------|------|------|
| Qwen3.5-0.8B | `Qwen3.5-0.8B/mtp_weights.bin` | 39 MB |
| Qwen3.5-2B | `Qwen3.5-2B/mtp_weights.bin` | 116 MB |
| Qwen3.5-4B | `Qwen3.5-4B/mtp_weights.bin` | 230 MB |
| Qwen3.5-9B | `Qwen3.5-9B/mtp_weights.bin` | 465 MB |
| Qwen3.5-27B | `Qwen3.5-27B/mtp_weights.bin` | 811 MB |

## Usage

Download the file matching your model and place it next to your GGUF file:

```bash
# Example: Qwen3.5-9B
cd /path/to/your/models/
wget https://huggingface.co/raspbfox/tp906-mtp/resolve/main/Qwen3.5-9B/mtp_weights.bin

# Your directory should look like:
# Qwen3.5-9B-Q8_0.gguf
# mtp_weights.bin   <-- tp906 auto-detects this

# Run with MTP
tp906-bench -m Qwen3.5-9B-Q8_0.gguf
```

tp906 auto-detects `mtp_weights.bin` in the same directory as the GGUF model. No flags needed.

## Format

MTP1 binary format (F16 tensors): 15 tensors per file, extracted from the official Qwen3.5 safetensors (`model.mtp_block.*` weights). BF16 source tensors are converted to F16 for MI50 compatibility.
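
The BF16 → F16 conversion mentioned above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the actual tp906 extraction code, and `bf16_to_f16` is a hypothetical helper name:

```python
import numpy as np

def bf16_to_f16(raw: np.ndarray) -> np.ndarray:
    """Convert raw BF16 words (stored as uint16) to IEEE F16 (half).

    BF16 is simply the top 16 bits of a float32, so widening to
    float32 is a 16-bit left shift; NumPy then performs the usual
    float32 -> float16 rounding.
    """
    as_f32 = (raw.astype(np.uint32) << 16).view(np.float32)
    return as_f32.astype(np.float16)

# 0x3F80 is 1.0 in BF16 (top half of the float32 pattern 0x3F800000)
print(bf16_to_f16(np.array([0x3F80], dtype=np.uint16)))  # [1.]
```

Note the trade-off this implies: BF16 keeps float32's exponent range, while F16 tops out around ±65504, so any weight outside that range would overflow to infinity; the conversion also rounds the mantissa from 8 to 11 effective bits (F16 actually has more mantissa precision than BF16).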