Instructions for using ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with libraries and local apps.

Libraries

How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF",
	filename="Qwen3.6-27B-Omnimerge-v4-F16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Use Docker

docker model run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M


vLLM

How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
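The same request can be built from Python with only the standard library. This is a minimal sketch, assuming the server from the previous step is listening on localhost:8000; `build_chat_request` and `send` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF"


def build_chat_request(prompt, image_url=None, base_url="http://localhost:8000/v1"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Same message shape as the curl example: one text part, one image part.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    payload = {"model": MODEL_ID, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send(build_chat_request("Describe this image in one sentence.", image_url=...))` returns the response as a dict; the reply text is normally under `["choices"][0]["message"]["content"]`.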

Use Docker

docker model run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Ollama
How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with Ollama:
```
ollama run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M
```

Unsloth Studio

How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF to start chatting

Pi

How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
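If you prefer to generate that file rather than paste it, the following sketch builds the same structure in Python. The helper names are hypothetical, and `write_config` overwrites the target file, so merge by hand if `models.json` already lists other providers.

```python
import json
from pathlib import Path


def pi_provider_config(model_id, base_url="http://localhost:8080/v1"):
    """Return the provider block shown above for a local llama.cpp server."""
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


def write_config(config, path="~/.pi/agent/models.json"):
    """Write the config as pretty-printed JSON, creating parent directories."""
    target = Path(path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target
```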

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with Docker Model Runner:
```
docker model run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M
```

Lemonade

How to use ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.6-27B-Omnimerge-v4-GGUF-Q4_K_M

List all available models

lemonade list

Qwen3.6-27B-Omnimerge-v4-GGUF

GGUF quantizations of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 — the MLP-passthrough variant that defends against the Qwen3.6 think-policy fragility we discovered. Source dtype is BF16; this repo provides the standard bartowski quant ladder (F16 → IQ2_XXS) for llama.cpp.

Source model: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 (BF16 weights, model card with full benchmarks and methodology). NOT a quant of clean Qwen/Qwen3.6-27B — these GGUFs contain the v4 merge.

MTP companion (2× decode speedup): weight-identical GGUFs with the MTP head retained for llama.cpp --spec-type draft-mtp self-speculative decoding are at ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF. Quality is statistically indistinguishable from this repo (HE 137/164 ↔ 137/164, GPQA 155/198 ↔ 154/198); aggregate decode is 2.0-2.3× faster on a single 24 GB GPU. Use that repo for interactive or single-request workloads where latency matters.
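As a rough check on the "statistically indistinguishable" claim, the pass counts quoted above can be put through a two-proportion z-test. This is a sketch under a simplifying assumption: it treats the two runs as independent samples, although both were scored on the same questions, so it is only an approximation. The function name is illustrative.

```python
import math


def two_proportion_z(pass_a, pass_b, n):
    """z statistic for two pass counts on equally sized sets (pooled variance)."""
    p_a, p_b = pass_a / n, pass_b / n
    pooled = (pass_a + pass_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return 0.0 if std_err == 0 else (p_a - p_b) / std_err


# Counts from the paragraph above: this repo vs the MTP companion.
humaneval_z = two_proportion_z(137, 137, 164)
gpqa_z = two_proportion_z(155, 154, 198)
```

Both statistics fall far inside the usual |z| < 1.96 band, which is consistent with a one-question difference on GPQA being noise.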

All quants were made with an imatrix computed from calibration data v5, the same calibration set bartowski uses for the Qwen3.6 base release, so quality fingerprints are directly comparable to bartowski's Qwen_Qwen3.6-27B-GGUF repo.

Why this merge exists

This is a same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B and three Qwen3.6 fine-tunes. It is the direct successor to ManniX-ITA/Qwen3.5-27B-Omnimerge-v2 on the newer Qwen3.6 base, with mlp.{gate,up,down}_proj copied verbatim from clean Qwen3.6 (the "MLP-passthrough" surgery) to defend against a Qwen3.6-specific reasoning-tag fragility we found during forensic delta inspection. See the v4 model card for the full story, scripts, and benchmark methodology.
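To make the idea concrete, here is a toy sketch of DARE-TIES with an MLP passthrough on flat lists of floats. It is not the dare_ties_merge.py script from the source repo: the drop probability, the unweighted mean over sign-agreeing deltas, and the tensor names used for matching are all simplifying assumptions for illustration.

```python
import random

# Assumed tensor-name suffixes for the MLP projections that bypass the merge.
MLP_SUFFIXES = ("mlp.gate_proj.weight", "mlp.up_proj.weight", "mlp.down_proj.weight")


def dare(delta, drop_prob, rng):
    """DARE: randomly drop delta entries, rescale survivors by 1 / (1 - drop_prob)."""
    keep_scale = 1.0 / (1.0 - drop_prob)
    return [d * keep_scale if rng.random() >= drop_prob else 0.0 for d in delta]


def ties_combine(deltas):
    """TIES-style sign election, then a mean over the entries that agree with it."""
    merged = []
    for column in zip(*deltas):
        elected = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [d for d in column if d * elected > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def merge(base, finetunes, drop_prob=0.5, seed=0):
    """Merge fine-tunes into base; MLP projections are copied from base unchanged."""
    rng = random.Random(seed)
    out = {}
    for name, base_w in base.items():
        if name.endswith(MLP_SUFFIXES):
            out[name] = list(base_w)  # MLP passthrough: keep the clean base weights
            continue
        deltas = [
            dare([f - b for f, b in zip(ft[name], base_w)], drop_prob, rng)
            for ft in finetunes
        ]
        out[name] = [b + d for b, d in zip(base_w, ties_combine(deltas))]
    return out
```

The passthrough branch is the part specific to v4: whatever the fine-tunes did to the MLP projections never reaches the merged model, while attention and other tensors still go through the DARE-TIES path.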

Benchmark headline (Q6_K, head-to-head vs Qwen3.6 base + Omnimerge-v2)

All scored under identical llama.cpp + lm_eval conditions (--reasoning-format deepseek --reasoning-budget 8192 --parallel 2, raw /v1/completions, no chat template).

| Benchmark | Qwen3.6 base Q6_K (bartowski) | Omnimerge-v2 (Qwen3.5 base) | Omnimerge-v4-MLP (this) | Δ vs base | Δ vs v2 |
|---|---|---|---|---|---|
| HumanEval pass@1 (164q) | 84.76% | 79.27% | 83.54% (137/164) | −1.22 pp | +4.27 pp |
| MBPP pass@1 (500q), corrected* | 57.60% | 74.60% | 73.00% (365/500) | +15.40 pp | −1.60 pp |
| GPQA Diamond pass@1 (flex), full greedy§ | not measured | 69.19% (full 198q) | 78.28% (155/198) | n/a | +9.09 pp |

* MBPP scores are computed after stripping <think> blocks, because lm_eval's raw scorer raises a SyntaxError on a literal < in exec(prompt+completion+tests). See the v4 model card for the per-model recovery breakdown.
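The stripping step can be approximated with two regular expressions. This is a sketch only: it handles closed <think>...</think> blocks and an unterminated trailing <think>, and the actual recovery procedure described on the v4 model card may differ. `strip_think` is an illustrative name.

```python
import re

# A closed reasoning block, plus any whitespace that follows it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)
# A reasoning block that was opened but never closed (e.g. budget exhausted).
UNCLOSED_THINK = re.compile(r"<think>.*\Z", flags=re.DOTALL)


def strip_think(completion):
    """Remove reasoning spans so the remaining text can be passed to exec()."""
    without_closed = THINK_BLOCK.sub("", completion)
    return UNCLOSED_THINK.sub("", without_closed)
```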

§ Canonical full-198q greedy GPQA result measured 2026-05-22 on pod 37268930 (Vast.ai 3090) with the patched eval chain (lm-eval 0.4.11 + max_length=32768 override + the api_models.py:545 UnboundLocalError patch + aiohttp lifecycle workaround). Sampler: do_sample=False, temperature=0.0, max_gen_toks=8192. Wall time 4 h 55 min. The companion strict-match score (rigid Answer: X template) is 7.58%: the model emits verbose CoT rather than the strict template, so flex is the real quality signal. Earlier card revisions reported a partial result of ≈ 84.75% (177 of the 198 questions, sampled at T=0.6, budget=16384); that number is superseded by this canonical greedy measurement on the full benchmark. The roughly 6.5 pp difference is driven by the methodology change (sampler, budget, completeness), not by a model change.
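The table's percentages and deltas can be re-derived from the raw counts. The short script below does that arithmetic with numbers copied from the table and footnotes; the helper names are illustrative.

```python
def pct(passed, total):
    """Pass rate in percent, rounded to two decimals as in the table."""
    return round(100.0 * passed / total, 2)


def delta_pp(score, reference):
    """Difference in percentage points, rounded to two decimals."""
    return round(score - reference, 2)


# Raw counts and reference scores copied from the table above.
v4 = {"HumanEval": pct(137, 164), "MBPP": pct(365, 500), "GPQA": pct(155, 198)}
base = {"HumanEval": 84.76, "MBPP": 57.60}  # GPQA was not measured on base
v2 = {"HumanEval": 79.27, "MBPP": 74.60, "GPQA": 69.19}

vs_base = {name: delta_pp(v4[name], ref) for name, ref in base.items()}
vs_v2 = {name: delta_pp(v4[name], ref) for name, ref in v2.items()}

# Gap between the superseded partial GPQA figure and the greedy result.
gpqa_gap = delta_pp(84.75, v4["GPQA"])
```

The computed gap is 6.47 pp, which the footnote rounds to 6.5 pp.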

Available Quantizations

All 27 files (F16 + 26 imatrix-quantized tiers, ~417 GB total) are uploaded and ready. imatrix.dat (used for every quant) is in the repo root for audit and reproduction.

| Quantization | File size | Use case |
|---|---|---|
| F16 (full precision) | 50.11 GB | Conversion source / lossless reference |
| Q8_0 | 26.63 GB | Highest fidelity, large |
| Q6_K_L | 21.14 GB | Q6_K with embed/output at Q8_0 |
| Q6_K | 20.57 GB | Recommended high tier; used for the published evals |
| Q5_K_L | 18.64 GB | Q5_K_M with embed/output at Q8_0 |
| Q5_K_M | 17.91 GB | Strong fidelity, balanced |
| Q5_K_S | 17.40 GB | Slightly smaller K-mix |
| Q4_K_L | 16.29 GB | Q4_K_M with embed/output at Q8_0 |
| Q4_1 | 15.91 GB | Legacy 4-bit, dense |
| Q4_K_M | 15.41 GB | Recommended balanced tier for most users |
| IQ4_NL | 14.72 GB | Importance-aware 4-bit non-linear |
| Q4_K_S | 14.52 GB | K-mix small variant |
| Q4_0 | 14.41 GB | Legacy 4-bit |
| IQ4_XS | 14.05 GB | IQ4 extra-small |
| Q3_K_XL | 13.42 GB | Q3_K_L with embed/output at Q8_0 |
| Q3_K_L | 13.36 GB | 3-bit K-mix large |
| Q3_K_M | 12.39 GB | 3-bit K-mix medium |
| IQ3_M | 11.72 GB | Importance-aware 3-bit medium |
| Q3_K_S | 11.24 GB | 3-bit K-mix small |
| IQ3_XS | 11.15 GB | IQ3 extra-small |
| Q2_K_L | 11.13 GB | Q2_K with embed/output at Q8_0 |
| IQ3_XXS | 10.42 GB | IQ3 extra-extra-small |
| Q2_K | 9.98 GB | 2-bit K-mix |
| IQ2_M | 9.32 GB | Importance-aware 2-bit medium |
| IQ2_S | 8.72 GB | IQ2 small |
| IQ2_XS | 8.47 GB | IQ2 extra-small |
| IQ2_XXS | 7.85 GB | IQ2 extra-extra-small (smallest) |
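For choosing a tier programmatically, the sketch below picks the largest file that fits a size budget, using the sizes from the table. The budget is a hypothetical parameter and file size is only a lower bound on memory use: KV cache and context buffers need extra headroom, so pass a budget below your actual VRAM.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "F16": 50.11, "Q8_0": 26.63, "Q6_K_L": 21.14, "Q6_K": 20.57,
    "Q5_K_L": 18.64, "Q5_K_M": 17.91, "Q5_K_S": 17.40, "Q4_K_L": 16.29,
    "Q4_1": 15.91, "Q4_K_M": 15.41, "IQ4_NL": 14.72, "Q4_K_S": 14.52,
    "Q4_0": 14.41, "IQ4_XS": 14.05, "Q3_K_XL": 13.42, "Q3_K_L": 13.36,
    "Q3_K_M": 12.39, "IQ3_M": 11.72, "Q3_K_S": 11.24, "IQ3_XS": 11.15,
    "Q2_K_L": 11.13, "IQ3_XXS": 10.42, "Q2_K": 9.98, "IQ2_M": 9.32,
    "IQ2_S": 8.72, "IQ2_XS": 8.47, "IQ2_XXS": 7.85,
}


def largest_quant_within(budget_gb):
    """Return the name of the largest file not exceeding budget_gb, or None."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None


def gguf_filename(quant):
    """File name following the pattern used in the examples on this page."""
    return f"Qwen3.6-27B-Omnimerge-v4-{quant}.gguf"


total_gb = sum(QUANT_SIZES_GB.values())
```

The sizes sum to about 417 GB, matching the total quoted above the table.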

How to Use

With llama.cpp:

# Recommended args for reasoning-tag-emitting models (matches the eval methodology):
llama-server \
    -m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
    -c 32768 -ngl 99 -t 12 --no-warmup \
    --reasoning-format deepseek --reasoning-budget 8192

Swap Q4_K_M for any tier from the table above. Q6_K matches the methodology used in our published evals; Q4_K_M is the typical "balanced" choice for most users.
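With --reasoning-format deepseek, llama-server is documented to return the model's thinking in a separate reasoning_content field on chat completion responses. Assuming that response shape, a client can separate the reasoning from the final answer as sketched below; the sample dict is hypothetical and trimmed to the fields used. This applies to the chat endpoint, not to the raw /v1/completions path used for the evals.

```python
def split_reasoning(response):
    """Return (reasoning, answer) from an OpenAI-style chat completion dict."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning.strip(), answer.strip()


# Hypothetical response, reduced to the fields read above.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "2 + 2 is 4.\n",
                "content": "4",
            }
        }
    ]
}
reasoning, answer = split_reasoning(sample)
```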

For multimodal (vision) inference: the mmproj projector is in bartowski/Qwen_Qwen3.6-27B-GGUF and works with this model unchanged (vision tower is preserved verbatim from the base).

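With Ollama: use a Modelfile whose FROM line points to one of the GGUFs above, or load directly from Hugging Face with ollama run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M.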

imatrix.dat

The imatrix.dat (~14 MB) used to generate every quant in this repo is uploaded alongside the GGUFs at the repo root. Reproducible, auditable.
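When auditing, it helps to confirm that a downloaded imatrix.dat or GGUF matches the SHA-256 listed for that file on the repository's file page. A minimal sketch of a streaming hash, so that multi-gigabyte files are not read into memory at once; `sha256_of` is an illustrative helper name.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```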

Reproducing

See scripts/ on the source v4 model repo:

dare_ties_merge.py — main merger (auto-detects Qwen3.6 base via output_gate_type and applies MLP-skip)
v4_mlp_passthrough.py — post-process: rebuild merged dir with MLP layers from base
quantize_gguf.py — the script that built this repo

For dense (non-Gemma-4-MoE) models, pass --exclude CD-Q6_K,CD-Q5_K_M,CD-Q4_K_M,CD-Q3_K_M,CD-Q2_K to skip ContribDynamic tiers (those require Gemma 4 expert-contribution maps).

License

Apache-2.0 (inherited from Qwen/Qwen3.6-27B and the fine-tune sources).

Acknowledgements

Qwen team for the Qwen3.6 base
rico03, ValiantLabs, kai-os for the fine-tunes
bartowski for the calibration_datav5.txt set used here
DARE / TIES / DARE-TIES authors and the arcee-ai/mergekit community

Downloads last month: 39,785

GGUF

Model size: 27B params

Architecture: qwen35
