Gemma 4 26B-A4B IT — Abliterated (V6) — GGUF

GGUF quantizations of wangzhang/gemma-4-26B-A4B-it-abliterix, an abliterated version of google/gemma-4-26B-A4B-it.

2/100 refusals (2%) | KL divergence: 0.0005 | Created with Abliterix

Files

| File | Size | Description |
|------|------|-------------|
| gemma4-26b-a4b-abliterix-F16.gguf | ~50.5 GB | Full precision FP16 |
| gemma4-26b-a4b-abliterix-Q8_0.gguf | ~26 GB | 8-bit quantization, near-lossless |
| gemma4-26b-a4b-abliterix-Q4_K_M.gguf | ~16 GB | 4-bit K-quant, recommended for most users |
| mmproj-gemma4-26b-a4b-f16.gguf | ~1.2 GB | Vision projector (multimodal support) |
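
If you only need one quantization, huggingface_hub can fetch a single file without cloning the repo. A minimal sketch, assuming this repo's id and the Q4_K_M filename from the table above:

```python
# Download a single GGUF file from the Hub into the local cache.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="wangzhang/gemma-4-26B-A4B-it-abliterix-GGUF",
    filename="gemma4-26b-a4b-abliterix-Q4_K_M.gguf",
)
print(model_path)  # pass this path to llama.cpp with -m
```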

VRAM Requirements

| Quantization | Min VRAM |
|--------------|----------|
| F16 | ~50 GB |
| Q8_0 | ~26 GB |
| Q4_K_M | ~16 GB |
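
These figures track the weight files; real usage also needs headroom for the KV cache, which grows with context length. A back-of-the-envelope sketch, where the per-1k-token KV cost is a placeholder assumption rather than a measured value for this model:

```python
# Rough VRAM estimate: weights plus KV cache. kv_gib_per_1k is a hypothetical
# placeholder, not a measured figure for this architecture.
def estimate_vram_gib(weights_gib: float, n_ctx: int, kv_gib_per_1k: float = 0.1) -> float:
    return weights_gib + (n_ctx / 1024) * kv_gib_per_1k

# Example: Q4_K_M (~16 GiB weights) with an 8k context.
print(f"{estimate_vram_gib(16.0, 8192):.1f} GiB")
```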

Usage

llama.cpp

```bash
llama-cli -m gemma4-26b-a4b-abliterix-Q4_K_M.gguf \
  --mmproj mmproj-gemma4-26b-a4b-f16.gguf \
  -p "Your prompt here" \
  -n 512
```
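
The same file also works from Python via llama-cpp-python (pip install llama-cpp-python); a minimal text-only sketch that does not wire up the vision projector:

```python
# Text-only inference with llama-cpp-python; lower n_gpu_layers if VRAM is tight.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma4-26b-a4b-abliterix-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```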

Transformers (safetensors version)

For the full BF16 model via Transformers, use the safetensors repo:

```python
from transformers import AutoModelForImageTextToText, AutoTokenizer
import torch

model = AutoModelForImageTextToText.from_pretrained(
    "wangzhang/gemma-4-26B-A4B-it-abliterix",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager",  # required for Gemma-4 MoE on Blackwell/sm_120
)
tokenizer = AutoTokenizer.from_pretrained("wangzhang/gemma-4-26B-A4B-it-abliterix")

messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
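
For interactive use you can stream tokens as they are generated with transformers' TextStreamer, reusing the model, tokenizer, and inputs from above:

```python
# Stream the completion token by token instead of waiting for generate() to finish.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=512, streamer=streamer)
```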

Citation

```bibtex
@software{abliterix,
  author = {Wu, Wangzhang},
  title = {Abliterix: Automated LLM Abliteration},
  year = {2026},
  url = {https://github.com/wuwangzhang1216/abliterix}
}
```

Source Model

See wangzhang/gemma-4-26B-A4B-it-abliterix for the full model card, methodology, and evaluation details.


Built with Abliterix | PyPI
