Huihui-Qwen3.6-27B-abliterated-AWQ

AWQ W4A16 quantized version of huihui-ai/Huihui-Qwen3.6-27B-abliterated.

This repository is marked as a quantized derivative of the Huihui model via:

base_model:
- huihui-ai/Huihui-Qwen3.6-27B-abliterated
base_model_relation: quantized

Quantization

The model uses native AutoAWQ-style AWQ INT4 weights with FP16 activations:

{
  "quant_method": "awq",
  "bits": 4,
  "group_size": 128,
  "version": "gemm",
  "zero_point": true
}

Additional modules intentionally left unquantized are recorded in config.json under quantization_config.modules_to_not_convert.

Tested Runtime

Validated locally with a modified 1Cat-vLLM build on 4 x Tesla V100-SXM2-32GB:

python -m vllm.entrypoints.openai.api_server \
  --model alexxorm/Huihui-Qwen3.6-27B-abliterated-AWQ \
  --quantization awq \
  --dtype float16 \
  --tensor-parallel-size 4 \
  --kv-cache-dtype fp8_e5m2

The tested local server used SM70 AWQ kernels, FLASH_ATTN_V100, and FP8 KV cache. For contexts above the model config limit, vLLM requires VLLM_ALLOW_LONG_MAX_MODEL_LEN=1; use that override only after validating quality/stability for your workload.

Notes

This model inherits the safety/usage characteristics of the upstream abliterated model. The upstream authors describe it as an uncensored/abliterated variant of Qwen3.6-27B and warn that safety filtering is reduced. Review outputs before using in production or public-facing systems.

Base Model

Downloads last month
674
Safetensors
Model size
28B params
Tensor type
BF16
·
I32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for alexxorm/Huihui-Qwen3.6-27B-abliterated-AWQ

Base model

Qwen/Qwen3.6-27B
Quantized
(24)
this model