Qwen3.6-27B-Omnimerge-v4-GGUF
GGUF quantizations of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 — the MLP-passthrough variant that defends against the Qwen3.6 think-policy fragility we discovered. Source dtype is BF16; this repo provides the standard bartowski quant ladder (F16 → IQ2_XXS) for llama.cpp.
Source model: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 (BF16 weights, model card with full benchmarks and methodology). NOT a quant of clean Qwen/Qwen3.6-27B: these GGUFs contain the v4 merge.
MTP companion (2× decode speedup): weight-identical GGUFs with the MTP head retained for llama.cpp --spec-type draft-mtp self-speculative decoding are at ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF. Quality is statistically indistinguishable from this repo (HE 137/164 ↔ 137/164, GPQA 155/198 ↔ 154/198); aggregate decode is 2.0-2.3× faster on a single 24 GB GPU. Use that repo for interactive / single-request workloads where latency matters.
All quants made using imatrix with calibration data v5, the same calibration set bartowski uses for the Qwen3.6 base release — so quality fingerprints are directly comparable to bartowski's Qwen_Qwen3.6-27B-GGUF repo.
Why this merge exists
Same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B + 3 Qwen3.6 fine-tunes. Direct successor to ManniX-ITA/Qwen3.5-27B-Omnimerge-v2 on the newer Qwen3.6 base, with mlp.{gate,up,down}_proj copied verbatim from clean Qwen3.6 (the "MLP-passthrough" surgery) to defend against a Qwen3.6-specific reasoning-tag fragility we found during forensic delta inspection. See the v4 model card for the full story, scripts, and benchmark methodology.
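The MLP-passthrough step described above can be illustrated with a minimal sketch. This is not the code from v4_mlp_passthrough.py; it assumes checkpoints are exposed as name-to-tensor mappings and that MLP tensors follow the usual `...mlp.gate_proj.weight` key naming, both of which are assumptions. Plain lists stand in for tensors so the example runs without any ML library.

```python
import re

# Matches MLP projection tensors such as
# "model.layers.3.mlp.gate_proj.weight" (key naming is an assumption).
MLP_KEY = re.compile(r"\.mlp\.(gate|up|down)_proj\.")


def mlp_passthrough(merged, base):
    """Return a copy of `merged` whose MLP projections are taken from `base`."""
    out = dict(merged)
    for key in merged:
        if MLP_KEY.search(key):
            if key not in base:
                raise KeyError(f"base checkpoint is missing {key}")
            out[key] = base[key]  # copied verbatim, no blending
    return out


# Toy state dicts: plain lists stand in for tensors.
base = {
    "model.layers.0.mlp.gate_proj.weight": [1.0, 1.0],
    "model.layers.0.self_attn.q_proj.weight": [2.0, 2.0],
}
merged = {
    "model.layers.0.mlp.gate_proj.weight": [1.5, 0.5],
    "model.layers.0.self_attn.q_proj.weight": [2.5, 1.5],
}
patched = mlp_passthrough(merged, base)
```

After the call, the MLP projection in `patched` equals the base value while the attention projection keeps its merged value, which is the intent of the surgery: merged attention, clean MLP.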
Benchmark headline (Q6_K, head-to-head vs Qwen3.6 base + Omnimerge-v2)
All scored under identical llama.cpp + lm_eval conditions (--reasoning-format deepseek --reasoning-budget 8192 --parallel 2, raw /v1/completions, no chat template).
| Benchmark | Qwen3.6 base Q6_K (bartowski) | Omnimerge-v2 (Qwen3.5 base) | Omnimerge-v4-MLP (this) | Δ vs base | Δ vs v2 |
|---|---|---|---|---|---|
| HumanEval pass@1 (164q) | 84.76% | 79.27% | 83.54% (137/164) | −1.22 pp | +4.27 pp |
| MBPP pass@1 (500q) — corrected* | 57.60% | 74.60% | 73.00% (365/500) | +15.40 pp | −1.60 pp |
| GPQA Diamond pass@1 (flex) — full greedy§ | not measured | 69.19% (full 198q) | 78.28% (155/198) | — | +9.09 pp |
* MBPP scores are computed after stripping <think> blocks (lm_eval's raw scorer raises a SyntaxError on the literal < in exec(prompt+completion+tests)). See the v4 model card for the per-model recovery breakdown.
§ Canonical full-198q greedy GPQA result measured 2026-05-22 on pod 37268930 (Vast.ai 3090) with the patched eval chain (lm-eval 0.4.11 + max_length=32768 override + the api_models.py:545 UnboundLocalError patch + aiohttp lifecycle workaround). Sampler: do_sample=False, temperature=0.0, max_gen_toks=8192. Wall time 4 h 55 min. Companion strict-match (rigid Answer: X template) is 7.58 % — the model emits CoT verbosely rather than the strict template, so flex is the real quality signal. Earlier card revisions reported an ≈ 84.75 % partial result (177/198 sampled at T=0.6, budget=16384); that number is superseded by this canonical greedy measurement on the full bench — the 6.5 pp difference is driven by the methodology change (sampler / budget / completeness), not by a model change.
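For readers who want to reproduce the <think>-stripping mentioned in the MBPP footnote, the following is a rough sketch of one way to do it. It is not the recovery script used for the published numbers (see the v4 model card for that); the function name `strip_think` and the handling of unclosed or stray tags are assumptions made for illustration.

```python
import re

# A complete reasoning block, including trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(completion: str) -> str:
    """Remove reasoning-tag content so the remainder can be passed to exec()."""
    cleaned = THINK_BLOCK.sub("", completion)
    # Stray closing tag (the opening tag was part of the prompt):
    # keep only what follows it.
    if "</think>" in cleaned:
        cleaned = cleaned.split("</think>", 1)[1]
    # Block opened but never closed (e.g. generation was cut off):
    # drop everything from the opening tag onward.
    open_idx = cleaned.find("<think>")
    if open_idx != -1:
        cleaned = cleaned[:open_idx]
    return cleaned.lstrip()


raw = "<think>plan the function</think>\ndef add(a, b):\n    return a + b\n"
code = strip_think(raw)
compile(code, "<completion>", "exec")  # parses cleanly once the tags are gone
```

Calling `compile` on `raw` directly fails with a SyntaxError, which mirrors the scorer failure described in the footnote.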
Available Quantizations
All 27 files (F16 + 26 imatrix-quantized tiers, ~417 GB total) are uploaded and ready. imatrix.dat (used for every quant) is in the repo root for audit and reproduction.
| Quantization | File size | Use case |
|---|---|---|
| F16 (full precision) | 50.11 GB | Conversion source / lossless reference |
| Q8_0 | 26.63 GB | Highest fidelity, large |
| Q6_K_L | 21.14 GB | Q6_K with embed/output at Q8_0 |
| Q6_K | 20.57 GB | Recommended high tier — eval methodology used this |
| Q5_K_L | 18.64 GB | Q5_K_M with embed/output at Q8_0 |
| Q5_K_M | 17.91 GB | Strong fidelity, balanced |
| Q5_K_S | 17.40 GB | Slightly smaller K-mix |
| Q4_K_L | 16.29 GB | Q4_K_M with embed/output at Q8_0 |
| Q4_1 | 15.91 GB | Legacy 4-bit, dense |
| Q4_K_M | 15.41 GB | Recommended balanced tier for most users |
| IQ4_NL | 14.72 GB | Importance-aware 4-bit non-linear |
| Q4_K_S | 14.52 GB | K-mix small variant |
| Q4_0 | 14.41 GB | Legacy 4-bit |
| IQ4_XS | 14.05 GB | IQ4 extra-small |
| Q3_K_XL | 13.42 GB | Q3_K_L with embed/output at Q8_0 |
| Q3_K_L | 13.36 GB | 3-bit K-mix large |
| Q3_K_M | 12.39 GB | 3-bit K-mix medium |
| IQ3_M | 11.72 GB | Importance-aware 3-bit medium |
| Q3_K_S | 11.24 GB | 3-bit K-mix small |
| IQ3_XS | 11.15 GB | IQ3 extra-small |
| Q2_K_L | 11.13 GB | Q2_K with embed/output at Q8_0 |
| IQ3_XXS | 10.42 GB | IQ3 extra-extra-small |
| Q2_K | 9.98 GB | 2-bit K-mix |
| IQ2_M | 9.32 GB | Importance-aware 2-bit medium |
| IQ2_S | 8.72 GB | IQ2 small |
| IQ2_XS | 8.47 GB | IQ2 extra-small |
| IQ2_XXS | 7.85 GB | IQ2 extra-extra-small (smallest) |
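As a rough aid for choosing a tier, the sketch below picks the largest quant whose file fits a memory budget. The sizes are copied from a subset of the table above; the 4 GB default headroom for KV cache and runtime buffers is an assumption (real usage depends on context length and backend), and the filename pattern is inferred from the Q4_K_M example in this card, so check the repo file list before relying on it.

```python
# (tier, file size in GB), a subset of the table above.
QUANT_SIZES_GB = [
    ("Q8_0", 26.63),
    ("Q6_K", 20.57),
    ("Q5_K_M", 17.91),
    ("Q4_K_M", 15.41),
    ("IQ4_XS", 14.05),
    ("Q3_K_M", 12.39),
    ("IQ3_XXS", 10.42),
    ("IQ2_M", 9.32),
    ("IQ2_XXS", 7.85),
]


def pick_quant(memory_gb, headroom_gb=4.0):
    """Return the largest tier whose file fits in memory_gb minus headroom_gb.

    headroom_gb is a guess at KV cache plus runtime overhead, not a measured value.
    Returns None when nothing fits.
    """
    budget = memory_gb - headroom_gb
    fitting = [(size, tier) for tier, size in QUANT_SIZES_GB if size <= budget]
    if not fitting:
        return None
    return max(fitting)[1]


def gguf_filename(tier):
    """Filename pattern inferred from the Q4_K_M example (an assumption)."""
    return f"Qwen3.6-27B-Omnimerge-v4-{tier}.gguf"


tier_24gb = pick_quant(24)
tier_16gb = pick_quant(16)
```

File size is only a lower bound on memory use, so treat the result as a starting point and step down a tier if the model fails to load at your context length.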
How to Use
With llama.cpp:
```bash
# Recommended args for reasoning-tag-emitting models (matches the eval methodology):
llama-server \
  -m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
  -c 32768 -ngl 99 -t 12 --no-warmup \
  --reasoning-format deepseek --reasoning-budget 8192
```
Swap Q4_K_M for any tier from the table above. Q6_K matches the methodology used in our published evals; Q4_K_M is the typical "balanced" choice for most users.
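Once llama-server is running, it can be queried over its OpenAI-compatible HTTP API. The sketch below uses only the Python standard library. It assumes the server listens on llama-server's default port 8080 (the command above does not set a port) and that, with --reasoning-format deepseek, the reasoning text is returned in a separate `reasoning_content` field; `build_chat_request` and `chat` are illustrative helper names, not part of any library.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080", max_tokens=512):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt and return (answer, reasoning); requires a running server."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    message = body["choices"][0]["message"]
    # reasoning_content may be absent depending on server flags and version.
    return message.get("content"), message.get("reasoning_content")


request = build_chat_request("Write a function that reverses a string.")
```

Keeping request construction separate from the network call makes it easy to inspect the payload, or to point `base_url` at a different host or port, before sending anything.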
For multimodal (vision) inference: the mmproj projector is in bartowski/Qwen_Qwen3.6-27B-GGUF and works with this model unchanged (vision tower is preserved verbatim from the base).
With Ollama: use a Modelfile pointing to one of the GGUFs above, or load directly from Hugging Face (for example, ollama run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M).
imatrix.dat
The imatrix.dat (~14 MB) used to generate every quant in this repo is uploaded alongside the GGUFs at the repo root. Reproducible, auditable.
Reproducing
See scripts/ on the source v4 model repo:
- dare_ties_merge.py: main merger (auto-detects Qwen3.6 base via output_gate_type and applies MLP-skip)
- v4_mlp_passthrough.py: post-process that rebuilds the merged dir with MLP layers from base
- quantize_gguf.py: the script that built this repo
For dense (non-Gemma-4-MoE) models, pass --exclude CD-Q6_K,CD-Q5_K_M,CD-Q4_K_M,CD-Q3_K_M,CD-Q2_K to skip ContribDynamic tiers (those require Gemma 4 expert-contribution maps).
License
Apache-2.0 (inherited from Qwen/Qwen3.6-27B and the fine-tune sources).
Acknowledgements
- Qwen team for the Qwen3.6 base
- rico03, ValiantLabs, kai-os for the fine-tunes
- bartowski for the calibration_datav5.txt set used here
- DARE / TIES / DARE-TIES authors and the arcee-ai/mergekit community