Instructions to use Bohanlu/Taigi-Llama-2-Translator-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Bohanlu/Taigi-Llama-2-Translator-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Bohanlu/Taigi-Llama-2-Translator-7B")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Bohanlu/Taigi-Llama-2-Translator-7B")
model = AutoModelForCausalLM.from_pretrained("Bohanlu/Taigi-Llama-2-Translator-7B")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Bohanlu/Taigi-Llama-2-Translator-7B with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Bohanlu/Taigi-Llama-2-Translator-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Bohanlu/Taigi-Llama-2-Translator-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/Bohanlu/Taigi-Llama-2-Translator-7B
```
- SGLang
How to use Bohanlu/Taigi-Llama-2-Translator-7B with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Bohanlu/Taigi-Llama-2-Translator-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Bohanlu/Taigi-Llama-2-Translator-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Bohanlu/Taigi-Llama-2-Translator-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Bohanlu/Taigi-Llama-2-Translator-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use Bohanlu/Taigi-Llama-2-Translator-7B with Docker Model Runner:
```shell
docker model run hf.co/Bohanlu/Taigi-Llama-2-Translator-7B
```
Model Card for Taigi-Llama-2-Translator-7B
The Taigi-Llama-2-Translator series is built on the Taigi-Llama-2 series of models. We fine-tuned them on 263k parallel data samples to create translation models for Taiwanese Hokkien and related languages.
For more details, please refer to our GitHub repository and the paper: Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems
Explore other models and datasets in the Taiwanese Hokkien LLM collection.
Model description
- Base Model: Bohanlu/Taigi-Llama-2-7B
- Usage: This model can be used for translating between Traditional Chinese or English and Taiwanese Hokkien (Hanzi, POJ, or Hanlo). It also supports translation between different scripts of Taiwanese Hokkien (Hanzi, POJ, Hanlo).
- Language(s) (NLP): Taiwanese Hokkien (Hanzi, POJ and Hanlo), Traditional Chinese and English
- Input: Text in source language
- Output: Text in target language
- Model Size: 7B parameters
Prompt Template
```
{BOS}[TRANS]\n{source_sentence}\n[/TRANS]\n[{target_language}]\n
```
- source_sentence: The sentence you want to translate.
- target_language: The target language you want to translate to. Use "ZH" for Traditional Chinese, "EN" for English, "POJ" for Taiwanese Hokkien POJ, "HL" for Taiwanese Hokkien Hanlo, and "HAN" for Taiwanese Hokkien Hanzi.
- Ensure there's a newline at the end of the prompt.
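Filling in the template is plain string formatting; a minimal sketch (the `{BOS}` token is normally prepended by the tokenizer, so it is omitted from the string here):

```python
# Template without the BOS token, which the tokenizer adds automatically
PROMPT_TEMPLATE = "[TRANS]\n{source_sentence}\n[/TRANS]\n[{target_language}]\n"

prompt = PROMPT_TEMPLATE.format(
    source_sentence="How are you today?",
    target_language="HAN",  # one of: ZH, EN, POJ, HL, HAN
)
print(repr(prompt))
# '[TRANS]\nHow are you today?\n[/TRANS]\n[HAN]\n'
```

Note that the formatted prompt ends with a newline, as the template requires.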
Usage Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
import torch
import accelerate

def get_pipeline(path: str, tokenizer: AutoTokenizer, accelerator: accelerate.Accelerator) -> TextGenerationPipeline:
    model = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype=torch.float16, device_map='auto', trust_remote_code=True)
    terminators = [tokenizer.eos_token_id, tokenizer.pad_token_id]
    pipeline = TextGenerationPipeline(
        model=model,
        tokenizer=tokenizer,
        num_workers=accelerator.state.num_processes * 4,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=terminators,
    )
    return pipeline

model_dir = "Bohanlu/Taigi-Llama-2-Translator-7B"  # or "Bohanlu/Taigi-Llama-2-Translator-13B" for the 13B model
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
accelerator = accelerate.Accelerator()
pipe = get_pipeline(model_dir, tokenizer, accelerator)

PROMPT_TEMPLATE = "[TRANS]\n{source_sentence}\n[/TRANS]\n[{target_language}]\n"

def translate(source_sentence: str, target_language: str) -> str:
    prompt = PROMPT_TEMPLATE.format(source_sentence=source_sentence, target_language=target_language)
    out = pipe(prompt, return_full_text=False, repetition_penalty=1.1, do_sample=False)[0]['generated_text']
    # Keep only the text before the model's closing tag (e.g. "[/HAN]")
    return out[:out.find("[/")].strip()

source_sentence = "How are you today?"

print("To Hanzi: " + translate(source_sentence, "HAN"))
# Output: To Hanzi: 你今仔日好無?

print("To POJ: " + translate(source_sentence, "POJ"))
# Output: To POJ: Lí kin-á-ji̍t án-chóaⁿ?

print("To Traditional Chinese: " + translate(source_sentence, "ZH"))
# Output: To Traditional Chinese: 你今天好嗎?

print("To Hanlo: " + translate(source_sentence, "HL"))
# Output: To Hanlo: 你今仔日好無?
```
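The `out[:out.find("[/")]` slice in `translate` cuts the generation at the first closing tag the model emits. A slightly more defensive standalone version (a sketch; the function name is hypothetical) that also handles output with no closing tag, where `find` would return -1 and silently drop the last character:

```python
def trim_at_closing_tag(generated: str) -> str:
    # The translator closes its output with a tag such as "[/HAN]";
    # keep only the text before the first "[/" marker, if any.
    cut = generated.find("[/")
    return (generated[:cut] if cut != -1 else generated).strip()

print(trim_at_closing_tag("你今仔日好無?\n[/HAN]"))
# 你今仔日好無?
print(trim_at_closing_tag("no closing tag here"))
# no closing tag here
```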
Citation
If you find the resources in the Taiwanese Hokkien LLM collection useful in your work, please cite it using the following reference:
@misc{lu2024enhancing,
title={Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems},
author={Bo-Han Lu and Yi-Hsuan Lin and En-Shiun Annie Lee and Richard Tzong-Han Tsai},
year={2024},
eprint={2403.12024},
archivePrefix={arXiv},
primaryClass={cs.CL}
}