Instructions for using johannhartmann/bueble-lm-2b-sft with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use johannhartmann/bueble-lm-2b-sft with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="johannhartmann/bueble-lm-2b-sft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("johannhartmann/bueble-lm-2b-sft")
model = AutoModelForCausalLM.from_pretrained("johannhartmann/bueble-lm-2b-sft")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use johannhartmann/bueble-lm-2b-sft with vLLM:
Install with pip and serve the model:
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "johannhartmann/bueble-lm-2b-sft"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "johannhartmann/bueble-lm-2b-sft",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/johannhartmann/bueble-lm-2b-sft
```
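The server exposes an OpenAI-compatible API, so the official `openai` Python client works as well as curl. A minimal sketch, assuming the pip-installed vLLM server from above is running on localhost:8000:

```python
# Sketch: query the vLLM server started above via its
# OpenAI-compatible API (assumes it is running on localhost:8000).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",  # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="johannhartmann/bueble-lm-2b-sft",
    messages=[{"role": "user", "content": "Was ist die Hauptstadt von Frankreich?"}],
)
print(response.choices[0].message.content)
```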
- SGLang
How to use johannhartmann/bueble-lm-2b-sft with SGLang:
Install with pip and serve the model:
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "johannhartmann/bueble-lm-2b-sft" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "johannhartmann/bueble-lm-2b-sft",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "johannhartmann/bueble-lm-2b-sft" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "johannhartmann/bueble-lm-2b-sft",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- Docker Model Runner
How to use johannhartmann/bueble-lm-2b-sft with Docker Model Runner:
```bash
docker model run hf.co/johannhartmann/bueble-lm-2b-sft
```
BübleLM SFT WIP
BübleLM
A small German LM
BübleLM is a German language model based on Gemma-2-2B, adapted using trans-tokenization with a custom German SentencePiece tokenizer. The model demonstrates how language-specific tokenization can significantly improve performance while maintaining the base model's capabilities.
This is an experimental version that received supervised fine-tuning on several German datasets. A DPO version will follow soon.
Model Details
- Architecture: Based on the Gemma-2-2B decoder-only architecture
- Parameters: 2 billion
- Tokenizer: Custom German SentencePiece tokenizer (20k vocabulary)
  - Fertility rate: 1.78 tokens per word (see the sketch after this list)
  - Optimized for German morphological structures
  - Trained on the same corpus as the model
- Context Length: 8192 tokens
- Training Hardware: Single node with 4x NVIDIA A100-SXM4-80GB GPUs
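The fertility figure can be checked empirically by dividing the number of tokens by the number of whitespace-separated words in a German text. A minimal sketch; the sample sentence below is illustrative, and a single sentence will not reproduce the corpus-level average:

```python
# Illustrative sketch: estimate tokenizer fertility (tokens per word)
# on a sample German sentence. The published figure (1.78) is an
# average over far more text than one sentence.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("johannhartmann/bueble-lm-2b-sft")

text = "Die Bundesrepublik Deutschland ist ein demokratischer und sozialer Bundesstaat."
tokens = tokenizer.tokenize(text)
words = text.split()

print(f"{len(tokens)} tokens / {len(words)} words = {len(tokens) / len(words):.2f} tokens per word")
```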
Training Data
Trained on 3.5B tokens from the Occiglot-FineWeb project, including:
- Contemporary web content (OSCAR 2015-2023)
- Legislative documents (EurLex, ParlamInt)
- News data (Tagesschau)
- Wiki sources
Data sampling weights:
- Wikipedia: 4x
- News/Parliamentary: 2x
- Other sources: 1x
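For intuition, these multipliers act as upsampling factors: each source's share of the training mix is proportional to its size times its weight. A hedged sketch with invented corpus sizes (only the 4x/2x/1x weights come from this card):

```python
# Hypothetical sketch of how per-source sampling weights shape the
# training mix. The token counts below are invented for illustration;
# only the 4x / 2x / 1x weights come from the model card.
sources = {
    # name: (tokens_in_source, sampling_weight)
    "wikipedia": (0.5e9, 4),
    "news_parliamentary": (1.0e9, 2),
    "web_other": (2.0e9, 1),
}

total = sum(size * weight for size, weight in sources.values())
for name, (size, weight) in sources.items():
    share = size * weight / total
    print(f"{name}: {share:.1%} of sampled training tokens")
```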
Finetuning
Additional supervised fine-tuning via LoRA was performed using German translations of alpaca-gpt4, openschnabeltier, evol_instruct, dolphin, airoboros, slimorca, hermes, and synthia.
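No hyperparameters are published for this step. As a rough sketch of what a LoRA setup for a Gemma-2-style model typically looks like with the `peft` library (rank, alpha, dropout, and target modules below are assumptions, not the values actually used):

```python
# Hypothetical LoRA setup with peft; all hyperparameters are guesses,
# not the configuration actually used for BübleLM SFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("johannhartmann/bueble-lm-2b-sft")

lora_config = LoraConfig(
    r=16,              # adapter rank (assumed)
    lora_alpha=32,     # scaling factor (assumed)
    lora_dropout=0.05, # dropout on adapter inputs (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights train
```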
Performance
TBD after DPO training.
Usage
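The Transformers snippets at the top of this page apply unchanged. As a quick German-language example (the prompt and generation length are illustrative):

```python
# Quick German-language example using the model's chat template;
# the prompt and max_new_tokens are illustrative choices.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("johannhartmann/bueble-lm-2b-sft")
model = AutoModelForCausalLM.from_pretrained("johannhartmann/bueble-lm-2b-sft")

messages = [{"role": "user", "content": "Erkläre kurz, was ein Tokenizer ist."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```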
Source
```bibtex
@article{delobelle2024buble,
  title={BübleLM: A small German LM},
  author={Delobelle, Pieter and Akbik, Alan and others},
  year={2024}
}
```