Instructions for using johannhartmann/bueble-lm-2b-sft with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use johannhartmann/bueble-lm-2b-sft with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="johannhartmann/bueble-lm-2b-sft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("johannhartmann/bueble-lm-2b-sft")
model = AutoModelForCausalLM.from_pretrained("johannhartmann/bueble-lm-2b-sft")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use johannhartmann/bueble-lm-2b-sft with vLLM:
Install with pip and serve the model:
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "johannhartmann/bueble-lm-2b-sft"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "johannhartmann/bueble-lm-2b-sft",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/johannhartmann/bueble-lm-2b-sft
```
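The server exposes an OpenAI-compatible API, so the official `openai` Python client works as well as curl. A minimal sketch, assuming the pip-installed vLLM server from above is running on localhost:8000:

```python
# Sketch: query the vLLM server started above via its
# OpenAI-compatible API (assumes it is running on localhost:8000).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",  # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="johannhartmann/bueble-lm-2b-sft",
    messages=[{"role": "user", "content": "Was ist die Hauptstadt von Frankreich?"}],
)
print(response.choices[0].message.content)
```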
- SGLang
How to use johannhartmann/bueble-lm-2b-sft with SGLang:
Install with pip and serve the model:
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "johannhartmann/bueble-lm-2b-sft" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "johannhartmann/bueble-lm-2b-sft",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "johannhartmann/bueble-lm-2b-sft" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "johannhartmann/bueble-lm-2b-sft",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- Docker Model Runner
How to use johannhartmann/bueble-lm-2b-sft with Docker Model Runner:
```bash
docker model run hf.co/johannhartmann/bueble-lm-2b-sft
```
BübleLM SFT WIP
BübleLM
A small German LM
BübleLM is a German language model based on Gemma-2-2B, adapted using trans-tokenization with a custom German SentencePiece tokenizer. The model demonstrates how language-specific tokenization can significantly improve performance while maintaining the base model's capabilities.
This is an experimental version that received supervised fine-tuning on several German datasets. A DPO version will follow soon.
Model Details
- Architecture: Based on the Gemma-2-2B decoder-only architecture
- Parameters: 2 billion
- Tokenizer: Custom German SentencePiece tokenizer (20k vocabulary)
  - Fertility rate: 1.78 tokens per word (see the sketch after this list)
  - Optimized for German morphological structures
  - Trained on the same corpus as the model
- Context Length: 8192 tokens
- Training Hardware: Single node with 4x NVIDIA A100-SXM4-80GB GPUs
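The fertility figure can be checked empirically by dividing the number of tokens by the number of whitespace-separated words in a German text. A minimal sketch; the sample sentence below is illustrative, and a single sentence will not reproduce the corpus-level average:

```python
# Illustrative sketch: estimate tokenizer fertility (tokens per word)
# on a sample German sentence. The published figure (1.78) is an
# average over far more text than one sentence.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("johannhartmann/bueble-lm-2b-sft")

text = "Die Bundesrepublik Deutschland ist ein demokratischer und sozialer Bundesstaat."
tokens = tokenizer.tokenize(text)
words = text.split()

print(f"{len(tokens)} tokens / {len(words)} words = {len(tokens) / len(words):.2f} tokens per word")
```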
Training Data
Trained on 3.5B tokens from the Occiglot-FineWeb project, including:
- Contemporary web content (OSCAR 2015-2023)
- Legislative documents (EurLex, ParlamInt)
- News data (Tagesschau)
- Wiki sources
Data sampling weights:
- Wikipedia: 4x
- News/Parliamentary: 2x
- Other sources: 1x
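For intuition, these multipliers act as upsampling factors: each source's share of the training mix is proportional to its size times its weight. A hedged sketch with invented corpus sizes (only the 4x/2x/1x weights come from this card):

```python
# Hypothetical sketch of how per-source sampling weights shape the
# training mix. The token counts below are invented for illustration;
# only the 4x / 2x / 1x weights come from the model card.
sources = {
    # name: (tokens_in_source, sampling_weight)
    "wikipedia": (0.5e9, 4),
    "news_parliamentary": (1.0e9, 2),
    "web_other": (2.0e9, 1),
}

total = sum(size * weight for size, weight in sources.values())
for name, (size, weight) in sources.items():
    share = size * weight / total
    print(f"{name}: {share:.1%} of sampled training tokens")
```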
Finetuning
Additional supervised fine-tuning via LoRA was performed using German translations of alpaca-gpt4, openschnabeltier, evol_instruct, dolphin, airoboros, slimorca, hermes, and synthia.
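No hyperparameters are published for this step. As a rough sketch of what a LoRA setup for a Gemma-2-style model typically looks like with the `peft` library (rank, alpha, dropout, and target modules below are assumptions, not the values actually used):

```python
# Hypothetical LoRA setup with peft; all hyperparameters are guesses,
# not the configuration actually used for BübleLM SFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("johannhartmann/bueble-lm-2b-sft")

lora_config = LoraConfig(
    r=16,              # adapter rank (assumed)
    lora_alpha=32,     # scaling factor (assumed)
    lora_dropout=0.05, # dropout on adapter inputs (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights train
```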
Performance
TBD after DPO training.
Usage
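The Transformers snippets at the top of this page apply unchanged. As a quick German-language example (the prompt and generation length are illustrative):

```python
# Quick German-language example using the model's chat template;
# the prompt and max_new_tokens are illustrative choices.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("johannhartmann/bueble-lm-2b-sft")
model = AutoModelForCausalLM.from_pretrained("johannhartmann/bueble-lm-2b-sft")

messages = [{"role": "user", "content": "Erkläre kurz, was ein Tokenizer ist."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```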
Source
```bibtex
@article{delobelle2024buble,
  title={BübleLM: A small German LM},
  author={Delobelle, Pieter and Akbik, Alan and others},
  year={2024}
}
```