tmnam20/ViMedAQA
How to use danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen3-0.6b-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA")

How to use danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA with Unsloth Studio:
# Linux / macOS (shell)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA to start chatting

# Windows (PowerShell)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA",
    max_seq_length=2048,
)

This model is a fine-tuned version of unsloth/qwen3-0.6b-unsloth-bnb-4bit. It has been trained using TRL.
This model was trained with supervised fine-tuning (SFT).
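The card does not show how the training records were formatted. As a rough sketch only, assuming each record carries a question and an answer, a pair could be converted into the conversational `messages` layout that TRL's `SFTTrainer` accepts. The field names `question` and `answer` and the helper `to_chat_example` are hypothetical and are not taken from this card or from the dataset.

```python
def to_chat_example(record: dict) -> dict:
    """Convert one question/answer record into a chat-style training example.

    Assumes hypothetical field names "question" and "answer".
    """
    question = record["question"].strip()
    answer = record["answer"].strip()
    if not question or not answer:
        raise ValueError("both question and answer must be non-empty")
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


example = to_chat_example(
    {
        "question": " What is a LoRA adapter? ",
        "answer": "A small set of trainable low-rank weights.",
    }
)
print(example["messages"][0]["content"])
```

Keeping the conversion in one small function makes it easy to apply the same formatting to every record before training.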
import os
from huggingface_hub import login
# Set the Hugging Face API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"
# Log in to the Hugging Face Hub
login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
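Writing the token directly into a script makes it easy to leak. A minimal sketch of an alternative, assuming the token has already been exported in the shell under the same variable name used above; `read_hf_token` is a hypothetical helper, not part of `huggingface_hub`.

```python
import os


def read_hf_token(env_var: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the token stored in `env_var`, or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    if not token or token.startswith("<"):
        # Catches an unset variable and the unedited
        # "<your_huggingface_token>" placeholder.
        raise RuntimeError(f"Set {env_var} to a valid Hugging Face token first.")
    return token
```

With this in place, the login call would become `login(token=read_hf_token())`, and the script fails early with a readable message when the token is missing.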
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TextStreamer
from peft import PeftModel
device = "cuda" if torch.cuda.is_available() else "cpu"
# Define model and LoRA adapter paths
base_model_name = "Qwen/Qwen3-0.6B"
lora_adapter_name = "danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Load base model with optimized settings
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map=device,
    trust_remote_code=True,
)
# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)
# Set model to evaluation mode
model.eval()
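The loading code above requests FP16 even when it falls back to the CPU, where half precision tends to be slow and, depending on the PyTorch version, may not be supported by every operation. A small sketch of choosing the precision together with the device; `choose_precision` is a hypothetical helper, and the FP32 fallback on CPU is an assumption rather than something this card specifies.

```python
def choose_precision(has_cuda: bool) -> tuple[str, str]:
    """Return (device, dtype_name) for loading the model."""
    if has_cuda:
        return "cuda", "float16"
    # Assumption: fall back to full precision when no GPU is present.
    return "cpu", "float32"


device, dtype_name = choose_precision(has_cuda=False)
print(device, dtype_name)  # cpu float32
```

In the loading script this would be called as `choose_precision(torch.cuda.is_available())`, with `getattr(torch, dtype_name)` passed as `torch_dtype`.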
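# Question (Vietnamese): "When a peptic ulcer is suspected, which hospital
# department should you visit for an examination?"
prompt = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
          "tại bệnh viện để thăm khám?")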
# Set random seed for reproducibility
seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Must add for generation
    enable_thinking=False,  # Disable thinking
)
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to(device),
    max_new_tokens=2048,  # Increase for longer outputs!
    temperature=0.7, top_p=0.9, top_k=20,  # Sampling settings for non-thinking mode
    streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
)
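Example output:
Khi nghi ngờ bị loét dạ dày tá tràng, bạn nên đến phòng khám chuyên khoa Giai đoạn Trung tâm Nghi ngờ Loét Dạ dày để được tư vấn và đánh giá chẩn đoán chính xác.
(English, literal: "When a peptic ulcer is suspected, you should go to the specialist clinic 'Stage Center for Suspected Gastric Ulcer' for advice and an accurate diagnostic assessment.")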
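The seeding lines in the script above cover only PyTorch. A minimal sketch of a single helper that also seeds Python's `random` module and NumPy when they are in use; `seed_everything` is a hypothetical name. Even with fixed seeds, sampled output may still differ across hardware and library versions.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed Python's RNG and, when installed, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional here
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional here


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)  # True
```

Calling `seed_everything(seed)` once before `model.generate` would replace the separate seeding calls.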
Cite TRL as:
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}