gemma-4-E4B-it-OBLITERATED — MLX 4-bit

4-bit MLX quantization of OBLITERATUS/gemma-4-E4B-it-OBLITERATED. Fills the gap between existing fp16 and 8-bit MLX versions.

Base google/gemma-4-E4B-it (4.5B)
Source OBLITERATUS/gemma-4-E4B-it-OBLITERATED — 0% hard refusal (vs 98.8% stock)
Quant 4-bit MLX | 3.9 GB disk | ~4.3 GB RAM
Speed 73.8 tok/s generation on M4 Pro
License Apache 2.0

Quick Start

pip install mlx-lm

# CLI
mlx_lm generate --model zaydiscold/gemma-4-E4B-it-OBLITERATED-MLX-4bit \
    --prompt "Your prompt here"
from mlx_lm import load, generate
model, tokenizer = load("zaydiscold/gemma-4-E4B-it-OBLITERATED-MLX-4bit")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=512)

Recommended: temperature=0.7, top_p=0.9, top_k=40, repeat_penalty=1.1

OBLITERATUS

OBLITERATUS by @elder_plinius removes refusal behavior via whitened SVD, attention head surgery (21/42 layers), winsorized activations, and SVD projection on refusal subspaces. Full methodology on the source model card.

Credits

OBLITERATUS / @elder_plinius -- abliteration | Google DeepMind -- Gemma 4 | @zaydiscold -- MLX 4-bit | Apple MLX

Research/red-teaming only. You are responsible for all generated content. Full terms.

Downloads last month
418
Safetensors
Model size
1B params
Tensor type
F16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zaydiscold/gemma-4-E4B-it-OBLITERATED-MLX-4bit

Quantized
(26)
this model