About the tokenizer efficiency chart

by Splend1dchan - opened Feb 19, 2024

Discussion

Splend1dchan

Feb 19, 2024

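Hello,

I would like to ask how your team's tokenizer chart should be interpreted.

As I understand it, for a corpus drawn from a single distribution the compression rate should be uniform, so the points should fall close to a straight line.
Even then, the horizontal axis would be #chars and the vertical axis #tokens, which also differs from what your team shows.

I would like to know whether I am misunderstanding something about this chart.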

Jeff

Blaze7451

INX-TEXT-AI org Feb 19, 2024

Hi @Splend1dchan,

We used the erhwenkuo/dolly-15k-chinese-zhtw dataset and tokenized every record in it with each model's tokenizer. For example, take the first record in the dataset:
data = """維珍澳大利亞航空（Virgin Australia Airlines Pty Ltd 的商業名稱）是一家總部位於澳大利亞的航空公司。它是機隊規模最大的使用維珍品牌的航空公司。它於 2000 年 8 月 31 日以 Virgin Blue 名義開始運營，在一條航線上有兩架飛機。 2001 年 9 月安捷澳大利亞公司倒閉後，它突然發現自己成爲澳大利亞國內市場的一家主要航空公司。此後，該航空公司已發展到以布里斯班、墨爾本和悉尼爲樞紐，直接爲澳大利亞 32 個城市提供服務。維珍澳大利亞航空什麼時候開始運營？維珍澳大利亞航空於 2000 年 8 月 31 日以 Virgin Blue 名義開始運營，在一條航線上有兩架飛機。"""
context_length = len(data)  # 311

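After tokenizing this record with the three models' tokenizers, the token counts are: Bailong - 146, Breeze - 211, Taiwan-LLM - 403. We repeated this for every record, collected each record's context length and token count, and plotted the chart from those values.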
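The counting procedure described above can be sketched as follows. This is a minimal illustration, not the team's actual script: `collect_points` and the toy tokenizers are hypothetical names introduced here, and the stand-in tokenizers are deliberately trivial so the snippet runs without downloading any model.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A tokenizer is modeled as any callable that maps text to a list of tokens.
Tokenize = Callable[[str], List[str]]


def collect_points(
    records: Iterable[str], tokenizers: Dict[str, Tokenize]
) -> Dict[str, List[Tuple[int, int]]]:
    """For each tokenizer, return one (num_chars, num_tokens) pair per record."""
    records = list(records)
    points: Dict[str, List[Tuple[int, int]]] = {}
    for name, tokenize in tokenizers.items():
        points[name] = [(len(text), len(tokenize(text))) for text in records]
    return points


# Stand-in tokenizers for illustration only (assumption); they are NOT the
# Bailong, Breeze, or Taiwan-LLM tokenizers.
toy_tokenizers: Dict[str, Tokenize] = {
    "per_char": lambda text: list(text),      # one token per character
    "whitespace": lambda text: text.split(),  # one token per whitespace-separated chunk
}

sample_records = ["Virgin Blue", "2000 年 8 月 31 日"]
points = collect_points(sample_records, toy_tokenizers)
for name, pairs in points.items():
    print(name, pairs)
```

To approximate the real measurement, one could pass the `tokenize` method of a tokenizer loaded with `AutoTokenizer.from_pretrained(...)` from Transformers for each model. Whether the counts quoted in this thread include special tokens is not stated, so exact numbers may differ.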

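Admittedly, without an explanation the chart is confusing when viewed on its own. We sincerely apologize for this oversight. Thank you for raising the question; we will start working on it.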

Splend1dchan

Feb 19, 2024

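Hello,

Thank you for the quick reply. I have two suggestions for your reference:

1. "Context length" is easily associated with the model itself (Context length refers to the maximum number of tokens the model can remember when generating text.), but that quantity is measured in tokens, which is not what you intend to convey. I would therefore suggest relabeling the horizontal axis as "number of characters".
2. In addition, when counting by character (char), Chinese and English characters are not on the same footing, so their compression rates differ greatly. Wikipedia text still contains a fair amount of English, which makes the trend line fluctuate more. I suspect the longest point may be an outlier with a high share of English. You could also consider measuring the horizontal axis in number of bytes, which would better cancel out the effect of how much English each article contains. This change should not affect the overall trend.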
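A small sketch of why characters and bytes behave differently for mixed Chinese and English text. It assumes UTF-8 encoding, which the post does not specify; `length_in_chars_and_bytes` is a hypothetical helper introduced only for this illustration.

```python
from typing import Tuple


def length_in_chars_and_bytes(text: str) -> Tuple[int, int]:
    """Return (number of Unicode characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# Common Chinese characters take 3 bytes each in UTF-8, while ASCII letters
# take 1 byte, so a byte-based axis weights the two scripts differently
# than a character-based axis does.
for sample in ["維珍澳大利亞航空", "Virgin Australia"]:
    num_chars, num_bytes = length_in_chars_and_bytes(sample)
    print(f"{sample!r}: {num_chars} chars, {num_bytes} UTF-8 bytes")
```

Under this assumption, a record with more English has a byte count closer to its character count, whereas a mostly Chinese record has roughly three times as many bytes as characters.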

Jeff

Blaze7451

INX-TEXT-AI org Feb 19, 2024

Understood, thank you for your valuable suggestions.

Blaze7451 changed discussion status to closed Feb 23, 2024
