ColBERT-Zero (GGUF f32 + Projection)
GGUF conversion (f32 encoder plus projection file) of lightonai/ColBERT-Zero for use with litembeddings.
ColBERT-Zero is SOTA on BEIR (55.43 nDCG@10) for models under 150M parameters, outperforming all other ColBERT and dense retrieval models trained on public data.
Model Details
| Property | Value |
|---|---|
| Base Model | lightonai/ColBERT-Zero |
| Architecture | ModernBERT-base (~100M params) |
| Output Dimensions | 128 (after projection) |
| Context Length | 8,192 tokens |
| Quantization | f32 |
| GGUF Size | 571 MB |
| Projection | 768 → 128 (PyLate Dense layer) |
| License | Apache 2.0 |
| Use Case | General-purpose semantic search with late interaction (ColBERT-style MaxSim) |
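Late interaction scores a query against a document by comparing token embeddings directly rather than pooling them into one vector. A minimal NumPy sketch of the MaxSim operator, using dot-product similarity over token embedding matrices:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style MaxSim: for each query token, take its best
    similarity against any document token, then sum over query tokens.

    query_tokens: (num_query_tokens, dim)
    doc_tokens:   (num_doc_tokens, dim)
    """
    sims = query_tokens @ doc_tokens.T  # (num_query_tokens, num_doc_tokens)
    return float(sims.max(axis=1).sum())
```

For this model, `dim` would be the 128-dimensional projected output; in practice litembeddings computes this for you via `lembed_maxsim`.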
Available Variants
| Variant | Size | Embedding Latency (11 tok / 50 tok / 150 tok) | Notes |
|---|---|---|---|
| f32 | 571 MB | 463ms / 770ms / 3062ms | Original precision |
| f16 | 286 MB | 1385ms / 3642ms / 11439ms | Slow without FP16 hardware |
| Q8_0 (recommended) | 153 MB | 97ms / 625ms / 2633ms | Fastest on CPU, 3.7x smaller than f32 |
Benchmarked on QEMU vCPU with SSE4.2. Q8_0 is fastest due to integer SIMD; f16 is slowest without hardware FP16.
BEIR Benchmark (from original model)
| Model | BEIR nDCG@10 | Params | Data |
|---|---|---|---|
| ColBERT-Zero | 55.43 | ~100M | Public only |
| ModernColBERT-embed-base | 55.12 | ~100M | Public only |
| GTE-ModernColBERT | 54.67 | ~100M | Proprietary |
| ModernBERT-embed-supervised (dense) | 52.89 | ~100M | Public only |
MaxSim Score Consistency Across Quants
| Query | f32 | f16 | Q8_0 |
|---|---|---|---|
| Related pair | 9.203 | 9.202 | 9.191 |
| Unrelated pair | 7.643 | 7.642 | 7.626 |
Negligible quality loss from quantization: Q8_0 MaxSim scores stay within ~0.25% of f32.
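The relative drift can be recomputed directly from the scores in the table above:

```python
# MaxSim scores copied from the consistency table
f32_scores = {"related": 9.203, "unrelated": 7.643}
q8_scores = {"related": 9.191, "unrelated": 7.626}

# Relative deviation of Q8_0 from f32 for each pair
drift = {
    pair: abs(f32_scores[pair] - q8_scores[pair]) / f32_scores[pair]
    for pair in f32_scores
}
max_drift = max(drift.values())  # worst case across both pairs
```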
Files
| File | Size | Description |
|---|---|---|
| lightonai-colbert-zero-f32.gguf | 571 MB | ModernBERT-base encoder in GGUF f32 format |
| lightonai-colbert-zero-f32.projection | 385 KB | Projection matrix (128×768, float32) |
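The projection file size is consistent with a raw float32 dump of a 128×768 matrix (128 × 768 × 4 bytes = 393,216 bytes ≈ 384 KiB). Assuming that raw row-major layout (an assumption about the file format; litembeddings loads it for you via the `colbert_projection` option), applying the projection is a single matrix multiply:

```python
import numpy as np

def load_projection(path: str, out_dim: int = 128, in_dim: int = 768) -> np.ndarray:
    # Assumption: raw little-endian float32, row-major, shape (out_dim, in_dim).
    return np.fromfile(path, dtype="<f4").reshape(out_dim, in_dim)

def project(token_embeddings: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Map (num_tokens, 768) encoder outputs to (num_tokens, 128) ColBERT vectors."""
    return token_embeddings @ proj.T
```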
Usage with litembeddings
.load ./build/litembeddings
-- Load model with projection
SELECT lembed_model('lightonai-colbert-zero-f32.gguf',
'{"colbert_projection": "lightonai-colbert-zero-f32.projection"}');
-- Generate token embeddings
SELECT lembed_tokens('search_query: What is machine learning?');
-- Semantic search with MaxSim scoring
SELECT
id, content,
lembed_maxsim(lembed_tokens('search_query: error handling best practices'), tokens) AS score
FROM documents
ORDER BY score DESC
LIMIT 10;
Important: Query/Document Prefixes
ColBERT-Zero uses asymmetric prompts for best results:
- Queries: prefix with `search_query:`
- Documents: prefix with `search_document:`
Omitting these prefixes degrades performance by ~0.8-1.3 nDCG@10 points.
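A trivial guard against forgetting the prefixes (helper names are illustrative, not part of any API):

```python
def as_query(text: str) -> str:
    """Prefix a search query for ColBERT-Zero's asymmetric encoding."""
    return f"search_query: {text}"

def as_document(text: str) -> str:
    """Prefix a document/passage before indexing."""
    return f"search_document: {text}"
```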
Conversion
python scripts/convert_colbert_to_gguf.py lightonai/ColBERT-Zero ./models \
--name colbert-zero --quantize f32
License: Apache 2.0