Dario Schiraldi : How do I load a pre-trained model from Hugging Face?

Hi everyone,

I’m Dario Schiraldi, CEO of Travel Works. I’m currently working on a project and want to use a pre-trained model from Hugging Face to save time on training, but I’m having trouble figuring out the best way to load one into my Python code.

Could anyone provide some guidance or suggestions on how I can do this properly?

Regards
Dario Schiraldi CEO of Travel Works

The exact syntax varies slightly depending on which library handles the model, but in general a single call to from_pretrained(repo_id) is all you need to load it.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# from_pretrained downloads the files from the Hub on first use and caches them locally
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is gravity?"}]
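# add_generation_prompt=True appends the assistant turn header so the model writes a reply
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)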
print(input_text)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
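
Since the question is about loading in general, it may also help to know that from_pretrained accepts a local directory as well as a Hub repo id, so you can save a model once with save_pretrained and reload it later without downloading again. Here is a minimal sketch of that round trip. The tiny GPT-2 configuration and its sizes are made-up values chosen only so the example runs offline; in practice you would save the model you actually downloaded.

```python
import tempfile

import torch
from transformers import AutoModelForCausalLM, GPT2Config, GPT2LMHeadModel

# Tiny randomly initialised model (hypothetical sizes), used only so this runs offline.
config = GPT2Config(n_layer=1, n_head=2, n_embd=8, vocab_size=32, n_positions=16)
model = GPT2LMHeadModel(config)

with tempfile.TemporaryDirectory() as save_dir:
    # Writes config.json and the weight file into save_dir.
    model.save_pretrained(save_dir)
    # from_pretrained takes a local directory in place of a Hub repo id.
    reloaded = AutoModelForCausalLM.from_pretrained(save_dir)

# Compare every tensor to confirm the reloaded weights match the originals.
same_weights = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), reloaded.state_dict().values())
)
print(same_weights)
```

The same pattern should apply to the tokenizer: call tokenizer.save_pretrained(save_dir), then AutoTokenizer.from_pretrained(save_dir).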