---
base_model: Qwen/Qwen3-8B
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
arxiv: 2509.22944
tags:
- quantized
- sinq
- efficient-inference
- qwen
- llm
- compression
base_model_relation: quantized
---


🐙 [GitHub](https://github.com/huawei-csl/SINQ)   |   📄 [Paper](http://arxiv.org/abs/2509.22944)

# PreSINQ GGUF Quantized Qwen3-8B Model

This repository contains the official PreSINQ **GGUF-quantized** versions of the [`Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B) model. For a detailed explanation of the PreSINQ strategy, please refer to the official [SINQ](https://github.com/huawei-csl/SINQ) repository.

SINQ is a fast, high-quality quantization technique designed to significantly reduce Large Language Model size while preserving accuracy.

If you find this project useful, **please consider giving a ⭐ to the official [SINQ](https://github.com/huawei-csl/SINQ) repository**.

---

## Model Details

- **Model Name:** `Qwen3-8B-PreSINQ-GGUF`
- **Base Model:** [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B)
- **Task:** Text Generation
- **Framework:** PyTorch / Transformers
- **License:** [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Quantized By:** *Huawei – Computing Systems Lab*

---

# How to Obtain the PreSINQ Model

The PreSINQ Qwen3-8B models are produced using the **PreSINQ GGUF script** available in the official [SINQ](https://github.com/huawei-csl/SINQ) repository. The models provided here correspond to the best-performing configuration for each quantization type.

## 📊 Best PreSINQ Quantization Results (Qwen3-8B)

Results below are measured on the **WikiText-2 test set**.

| Method | Bits | Size (GB) | Perplexity ↓ |
|----------|--------|------------|----------------|
| Baseline (FP16) | FP16 | 15.26 | 10.1019 |
| Baseline + Q3_K_S | 3-bit | 3.77 | 11.3619 |
| **PreSINQ + Q3_K_S** | 3-bit | 3.77 | **10.6786** |

You can also generate good (though not the best) PreSINQ models faster by reducing the number of configurations explored when running the PreSINQ script.
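To put the results table above in perspective, the following minimal Python sketch derives three summary figures from it: the relative perplexity increase over FP16, the share of the Q3_K_S perplexity gap that PreSINQ closes, and the size reduction. The numbers are copied from the table; the function and variable names are hypothetical helpers written for this illustration and are not part of the SINQ codebase.

```python
# Figures copied from the WikiText-2 results table (Qwen3-8B).
RESULTS = {
    "fp16": {"size_gb": 15.26, "ppl": 10.1019},
    "q3_k_s": {"size_gb": 3.77, "ppl": 11.3619},
    "presinq_q3_k_s": {"size_gb": 3.77, "ppl": 10.6786},
}


def relative_ppl_increase(ppl: float, ref_ppl: float) -> float:
    """Fractional perplexity increase of a quantized model over the reference."""
    return (ppl - ref_ppl) / ref_ppl


def gap_recovered(baseline_ppl: float, improved_ppl: float, ref_ppl: float) -> float:
    """Fraction of the (baseline - reference) perplexity gap closed by the improved model."""
    return (baseline_ppl - improved_ppl) / (baseline_ppl - ref_ppl)


def compression_ratio(size_gb: float, ref_size_gb: float) -> float:
    """How many times smaller the quantized file is than the reference."""
    return ref_size_gb / size_gb


if __name__ == "__main__":
    fp16 = RESULTS["fp16"]
    plain = RESULTS["q3_k_s"]
    presinq = RESULTS["presinq_q3_k_s"]

    print(f"Q3_K_S perplexity increase:   {relative_ppl_increase(plain['ppl'], fp16['ppl']):.2%}")
    print(f"PreSINQ perplexity increase:  {relative_ppl_increase(presinq['ppl'], fp16['ppl']):.2%}")
    print(f"Gap closed by PreSINQ:        {gap_recovered(plain['ppl'], presinq['ppl'], fp16['ppl']):.2%}")
    print(f"Size reduction vs FP16:       {compression_ratio(presinq['size_gb'], fp16['size_gb']):.2f}x")
```

On these figures, plain Q3_K_S raises perplexity by roughly 12.5% over FP16, while PreSINQ + Q3_K_S raises it by roughly 5.7% at the same file size. PreSINQ therefore closes a little over half of the quantization gap while keeping the model about 4x smaller than FP16.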
---

# 🚀 Usage

## Usage Example

You can load and run the PreSINQ GGUF models using:

- 🤗 Transformers
- llama.cpp
- Any GGUF-compatible inference framework

---

# 🧾 How to Cite This Work

If you find **SINQ** useful in your research or applications:

- Please give a ⭐ to the official [SINQ](https://github.com/huawei-csl/SINQ) repository
- Cite our paper:

```bibtex
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}
```