--- pipeline_tag: text-generation language: ceb license: apache-2.0 tags: - trimmed library_name: transformers base_model: Qwen3.5-2B base_model_relation: quantized datasets: - lbourdois/fineweb-2-trimming --- # Qwen3.5-2B-ceb-32768 This model is a **19.95% smaller** version of [Qwen/Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B) optimized for **Cebuano** language via vocabulary size reduction using the [trimming](https://huggingface.co/blog/lbourdois/introduction-to-trimming) method. This trimmed model should perform similarly to the original model with only 32,768 tokens and a much smaller memory footprint. However, it may not perform well for other languages as tokens not commonly used in the selected languages were removed from the vocabulary. ## Model Statistics | Metric | Original | Trimmed | Reduction | |--------|----------|---------|-----------| | **Vocabulary size** | 248,320 tokens | 32,768 tokens | **86.80%** | | **Model size** | 2,213,241,664 params | 1,771,791,168 params | **19.95%** | ![image](https://raw.githubusercontent.com/lbourdois/blog/refs/heads/master/assets/images/Trimming/qwen.5-2B-32768.png) ## Mining Dataset Statistics - **Number of texts used for mining**: 173,644 texts - **Dataset**: [lbourdois/fineweb-2-trimming](https://huggingface.co/datasets/lbourdois/fineweb-2-trimming) ## Usage ```python from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "alphaedge-ai/Qwen.5-2B-ceb-32768" # load the tokenizer and the model tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype="auto", device_map="auto", trust_remote_code=True ) # prepare the model input prompt = "Your prompt in Cebuano." messages = [ {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) model_inputs = tokenizer([text], return_tensors="pt").to(model.device) # conduct text completion generated_ids = model.generate( **model_inputs, max_new_tokens=32768 ) output_ids = generated_ids[0][len(model_inputs.input_ids[0]):] content = tokenizer.decode(output_ids, skip_special_tokens=True) print("content:", content) ``` ## Citations #### Qwen3 ``` @misc{qwen3.5, title = {Qwen3.5: Towards Native Multimodal Agents}, author = {Qwen Team}, month = {February}, year = {2026}, url = {https://qwen.ai/blog?id=qwen3.5} } ``` #### Trimming blog post ``` @misc{hf_blogpost_trimming, title={Introduction to Trimming}, author={Loïck BOURDOIS and Tom AARSEN and Bram VANROY and Christopher AKIKI and Woojun JUNG and Manuel ROMERO and Prithiv SAKTHI}, year={2026}, url={https://huggingface.co/blog/lbourdois/introduction-to-trimming}, } ```