---
base_model: Qwen/Qwen3-8B
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
arxiv: 2509.22944
tags:
- quantized
- sinq
- efficient-inference
- qwen
- llm
- compression
base_model_relation: quantized
---
๐ Github | ๐ Paper
# PreSINQ GGUF Quantized Qwen3-4B Model
This repository contains the official PreSINQ **GGUF-quantized** versions of the [`Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B) model. For a detailed explanation of PreSINQ strategy please refer to the the official [SINQ](https://github.com/huawei-csl/SINQ) repository.
SINQ is a fast and high-quality quantization technique designed to significantly reduce Large Language Model size while preserving accuracy.
If you find this project useful, **please consider giving a โญ to the official [SINQ](https://github.com/huawei-csl/SINQ) repository**.
---
## Model Details
- **Model Name:** `Qwen3-8B-PreSINQ-GGUF`
- **Base Model:** [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B)
- **Task:** Text Generation
- **Framework:** PyTorch / Transformers
- **License:** [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Quantized By:** *Huawei โ Computing Systems Lab*
---
# How to Obtain the PreSINQ Model
The PreSINQ Qwen3-8B models are produced using the **PreSINQ GGUF script** available in the official [SINQ](https://github.com/huawei-csl/SINQ) repository.
The models provided here correspond to the best-performing configurations for each quantization type.
## ๐ Best PreSINQ Quantization Results (Qwen3-8B)
Results below are measured on the **WikiText-2 test set**.
| Method | Bits | Size (GB) | Perplexity โ |
|----------|--------|------------|----------------|
| Baseline (FP16) | FP16 | 15.26 | 10.1019 |
| Baseline + Q3_K_S | 3-bit | 3.77 | 11.3619 |
| **PreSINQ + Q3_K_S** | 3-bit | 3.77 | **10.6786** |
However, you can generate good PreSINQ models (not the best one) faster by reducing the number of configurations explored during the PreSINQ script execution.
---
# ๐ Usage
## Usage Example
You can load and run the PreSINQ GGUF models using:
- ๐ค Transformers
- llama.cpp
- Any GGUF-compatible inference framework
---
# ๐งพ How to Cite This Work
If you find **SINQ** useful in your research or applications:
- Please give a โญ to the official [SINQ](https://github.com/huawei-csl/SINQ) repository
- Cite our paper:
```bibtex
@misc{muller2025sinq,
title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
year={2025},
eprint={2509.22944},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={http://arxiv.org/abs/2509.22944}
}