DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper: arXiv:2402.03300
How to use philschmid/qwen-2.5-3b-r1-countdown with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="philschmid/qwen-2.5-3b-r1-countdown")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("philschmid/qwen-2.5-3b-r1-countdown")
model = AutoModelForCausalLM.from_pretrained("philschmid/qwen-2.5-3b-r1-countdown")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use philschmid/qwen-2.5-3b-r1-countdown with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "philschmid/qwen-2.5-3b-r1-countdown"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "philschmid/qwen-2.5-3b-r1-countdown",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or run the model with Docker:
docker model run hf.co/philschmid/qwen-2.5-3b-r1-countdown
How to use philschmid/qwen-2.5-3b-r1-countdown with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "philschmid/qwen-2.5-3b-r1-countdown" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "philschmid/qwen-2.5-3b-r1-countdown",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "philschmid/qwen-2.5-3b-r1-countdown" \
--host 0.0.0.0 \
--port 30000
# Call the server with the same curl request shown above (OpenAI-compatible API).
How to use philschmid/qwen-2.5-3b-r1-countdown with Docker Model Runner:
docker model run hf.co/philschmid/qwen-2.5-3b-r1-countdown
qwen-2.5-3b-r1-countdown: a mini R1 experiment
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct. It has been trained using TRL and GRPO on the Countdown game.
If you want to learn how to replicate this model and reproduce your own DeepSeek R1 "aha" moment, check out my blog post.
from vllm import LLM, SamplingParams
from datasets import load_dataset
from random import randint
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=512)
# Pin a revision that does not contain the "checkpoints-" folders; otherwise vLLM downloads all of them
llm = LLM(model="philschmid/qwen-2.5-3b-r1-countdown", revision="099c0f8cbfc522e7c3a476edfb749f576b164539")
# Load dataset from Hugging Face Hub
dataset_id = "Jiayi-Pan/Countdown-Tasks-3to4"
dataset = load_dataset(dataset_id, split="train")
# randint is inclusive on both ends, so subtract 1 to stay within bounds
sample = dataset[randint(0, len(dataset) - 1)]
# create conversation
messages = [
{"role": "system", "content": "You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer."},
{"role": "user", "content": f"Using the numbers {sample['nums']}, create an equation that equals {sample['target']}. You can use basic arithmetic operations (+, -, *, /) one or multiple times but each number can only be used once. Show your work in <think> </think> tags. And return the final equation in <answer> </answer> tags, for example <answer> (1 + 2) / 3 </answer>. Think step by step inside <think> tags."},
{"role": "assistant", "content": "Let me solve this step by step.\n<think>"}
]
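# build the prompt, leaving the final assistant turn open so the model continues after "<think>"
prompt = llm.get_tokenizer().apply_chat_template(messages, tokenize=False, continue_final_message=True)
# generate response
res = llm.generate(prompt, sampling_params)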
res = "<think>" + res[0].outputs[0].text
print(res)
# <think> We need to use the numbers 37, 15, 4, and 13 with basic arithmetic operations to make 16. Let's try different combinations:
# - 37 - 15 - 4 - 13 = 6 (too low)
# - 37 - 15 + 4 - 13 = 13 (too low)
# - 37 + 15 - 4 - 13 = 35 (too high)
# - 37 - 15 + 4 + 13 = 39 (too high)
# - 15 + 4 + 13 - 37 = -1 (too low)
# - 37 + 15 + 4 - 13 = 43 (too high)
# - 15 + 4 * 13 / 37 = 15 + 52 / 37 (not an integer)
# - 15 * 4 / 37 - 37 = -28.24 (not a whole number)
# - 4 * 13 / 15 - 37 = 41.3333 (not a whole number)
# After all combinations, I got not any integer result as 16.
# </think>
# <answer> 37 - 15 + 4 + 13 </answer>
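The sample output above ends with an equation that evaluates to 39, not the target 16, so completions are worth checking programmatically. Below is a minimal sketch of such a check. It is not necessarily the reward function used in training; the function name `check_countdown_answer` and the helper `_evaluate` are hypothetical, and the sketch assumes answers contain only integer literals, parentheses, and the four basic operators.

```python
import ast
import operator
import re

# Binary operators allowed in a Countdown equation.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate an arithmetic AST node, recording every integer literal it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary minus, ...) is rejected in this sketch.
    raise ValueError("unsupported expression")


def check_countdown_answer(completion, nums, target):
    """Return True if the <answer> equation uses each number exactly once and equals target."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return False
    used = []
    try:
        tree = ast.parse(match.group(1).strip(), mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return sorted(used) == sorted(nums) and abs(value - target) < 1e-6


nums, target = [37, 15, 4, 13], 16
# The model's answer from the sample output uses every number once but misses the target.
print(check_countdown_answer("<answer> 37 - 15 + 4 + 13 </answer>", nums, target))
# A valid solution for the same task.
print(check_countdown_answer("<answer> (37 + 15) / 13 * 4 </answer>", nums, target))
```

Parsing the equation with `ast` instead of calling `eval` keeps the check limited to plain arithmetic, which matters when the input is untrusted model output.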
This model was trained with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
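As described in the DeepSeekMath paper, GRPO samples a group of completions per prompt and scores each one relative to its own group instead of using a learned value model. The sketch below illustrates only that normalization step. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions for illustration, not details taken from this model's training setup.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two solved the task (reward 1), two did not (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat the group average get a positive advantage and the rest get a negative one; if every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.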