The crispy sentence embedding family from Mixedbread.
🍞 Looking for a simple end-to-end retrieval solution? Meet Omni, our multimodal and multilingual model. Get in touch for access.
mixedbread-ai/mxbai-embed-xsmall-v1
This model is an open-source English embedding model developed by Mixedbread. It's built upon sentence-transformers/all-MiniLM-L6-v2 and trained with the AnglE loss and Espresso. Read more details in our blog post.
In a bread loaf:
- State-of-the-art performance
- Supports both binary quantization and Matryoshka Representation Learning (MRL).
- Optimized for retrieval tasks
- 4096 context support
Performance
Binary Quantization and Matryoshka
Our model supports both binary quantization and Matryoshka Representation Learning (MRL), allowing for significant efficiency gains:
- Binary quantization: Retains 93.9% of performance while improving storage efficiency by a factor of 32 (1 bit per dimension instead of a 32-bit float)
- MRL: A 33% reduction in vector size still leaves 96.2% of model performance
These optimizations can lead to substantial reductions in infrastructure costs for cloud computing and vector databases. Read more here.
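To make the storage arithmetic concrete, here is a minimal, framework-free sketch of both ideas. It uses a toy 384-dimensional vector rather than real model output, and the 256-dimension cut-off is an assumption chosen only because it matches a 33% reduction. The helper names (`truncate_and_normalize`, `binarize`, `hamming_distance`) are illustrative and not part of any library.

```python
import math
import struct

def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components (MRL-style) and rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def binarize(vec):
    """Pack one bit per dimension (1 if the value is positive) into bytes."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)

def hamming_distance(a, b):
    """Count differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Toy vector standing in for a real 384-dimensional embedding (assumption).
full = [math.sin(i + 1) for i in range(384)]

float32_bytes = len(struct.pack(f"{len(full)}f", *full))  # 384 * 4 bytes
binary_bytes = len(binarize(full))                        # 384 / 8 bytes
print(float32_bytes // binary_bytes)  # 32

short = truncate_and_normalize(full, 256)
print(round(100 * (1 - len(short) / len(full))))  # 33
```

Binary embeddings are typically compared with Hamming distance, which is what `hamming_distance` computes; truncated vectors are re-normalized so that cosine similarity behaves as before.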
Quickstart
Here are several ways to produce English sentence embeddings using our model.
angle-emb
pip install -U angle-emb
from angle_emb import AnglE
from angle_emb.utils import cosine_similarity
# 1. Specify preferred dimensions
dimensions = 384
# 2. Load model and set pooling strategy to avg
model = AnglE.from_pretrained(
    "mixedbread-ai/mxbai-embed-xsmall-v1",
    pooling_strategy='avg').cuda()
query = 'A man is eating a piece of bread'
docs = [
    query,
    "A man is eating food.",
    "A man is eating pasta.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
]
# 3. Encode
embeddings = model.encode(docs, embedding_size=dimensions)
for doc, emb in zip(docs[1:], embeddings[1:]):
    print(f'{query} ||| {doc}', cosine_similarity(embeddings[0], emb))
Sentence Transformers
python -m pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
# 1. Specify preferred dimensions
dimensions = 384
# 2. Load model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1", truncate_dim=dimensions)
query = 'A man is eating a piece of bread'
docs = [
    query,
    "A man is eating food.",
    "A man is eating pasta.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
]
# 3. Encode
embeddings = model.encode(docs)
similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)
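For retrieval, the similarity scores are usually turned into a ranking. The following is a minimal sketch of that step in plain Python; the two-dimensional vectors are made-up stand-ins for real embeddings, and `cosine` and `rank` are hypothetical helper names, not functions from sentence-transformers.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs, docs):
    """Return (score, document) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy vectors standing in for real embeddings (assumption).
toy_query = [1.0, 0.0]
toy_docs = ["A man is eating food.", "A man is riding a horse.", "A man is eating pasta."]
toy_vecs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]

for score, doc in rank(toy_query, toy_vecs, toy_docs):
    print(f"{score:.3f}  {doc}")
```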
transformers
pip install -U transformers sentence-transformers
from typing import Dict
import torch
import numpy as np
from transformers import AutoModel, AutoTokenizer
from sentence_transformers.util import cos_sim
def pooling(outputs: torch.Tensor, inputs: Dict) -> np.ndarray:
    # Mean-pool token embeddings per sequence, ignoring padded positions.
    mask = inputs["attention_mask"][:, :, None]
    outputs = torch.sum(outputs * mask, dim=1) / torch.sum(mask, dim=1)
    return outputs.detach().cpu().numpy()
# 1. Load model
model_id = 'mixedbread-ai/mxbai-embed-xsmall-v1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).cuda()
query = 'A man is eating a piece of bread'
docs = [
    query,
    "A man is eating food.",
    "A man is eating pasta.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
]
# 2. Encode
inputs = tokenizer(docs, padding=True, return_tensors='pt')
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model(**inputs).last_hidden_state
embeddings = pooling(outputs, inputs)
# 3. Compute similarity scores
similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)
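Mean pooling over the attention mask can be illustrated without any framework. The sketch below uses nested lists instead of tensors and averages each sequence over its own non-padded tokens; `masked_mean_pool` is a hypothetical name used only for this illustration.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_vectors: [batch][tokens][hidden] nested lists of floats
    attention_mask: [batch][tokens] with 1 for real tokens, 0 for padding
    """
    pooled = []
    for vectors, mask in zip(token_vectors, attention_mask):
        count = sum(mask)  # real tokens in this sequence only
        sums = [0.0] * len(vectors[0])
        for vec, keep in zip(vectors, mask):
            if keep:
                for j, value in enumerate(vec):
                    sums[j] += value
        pooled.append([s / count for s in sums])
    return pooled

batch = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # last position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]
print(masked_mean_pool(batch, mask))  # [[2.0, 3.0], [4.0, 4.0]]
```

Dividing by the per-sequence token count, rather than by the total over the whole batch, keeps each embedding independent of what else happens to be in the batch.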
Batched API
python -m pip install batched fastapi uvicorn orjson sentence-transformers
import uvicorn
import batched
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse
from sentence_transformers import SentenceTransformer
from pydantic import BaseModel
app = FastAPI()
model = SentenceTransformer('mixedbread-ai/mxbai-embed-xsmall-v1')
model.encode = batched.aio.dynamically(model.encode)
class EmbeddingsRequest(BaseModel):
    input: str | list[str]

@app.post("/embeddings")
async def embeddings(request: EmbeddingsRequest):
    return ORJSONResponse({"embeddings": await model.encode(request.input)})

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
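A client for this endpoint might look like the sketch below. It assumes the server above is running locally on port 8000 and that the response body has the `{"embeddings": ...}` shape returned by the handler; the function names `build_embeddings_request` and `fetch_embeddings` are hypothetical and use only the Python standard library.

```python
import json
import urllib.request

def build_embeddings_request(texts, base_url="http://localhost:8000"):
    """Build a POST request for the /embeddings route defined above."""
    body = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_embeddings(texts, base_url="http://localhost:8000"):
    """Send the request and return the list stored under the 'embeddings' key."""
    request = build_embeddings_request(texts, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embeddings"]
```

With the server running, `fetch_embeddings(["A man is eating food."])` would return one vector per input string.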
Community
Join our Discord community to share your feedback and thoughts. We're here to help and always happy to discuss the exciting field of machine learning!
License
Apache 2.0
Citation
@online{xsmall2024mxbai,
  title={Every Byte Matters: Introducing mxbai-embed-xsmall-v1},
  author={Sean Lee and Julius Lipp and Rui Huang and Darius Koenig},
  year={2024},
  url={https://www.mixedbread.ai/blog/mxbai-embed-xsmall-v1},
}
Papers for mixedbread-ai/mxbai-embed-xsmall-v1
2D Matryoshka Sentence Embeddings
AnglE-optimized Text Embeddings
Evaluation results
- ndcg_at_1 on MTEB ArguAna (test set, self-reported): 25.180
- ndcg_at_3 on MTEB ArguAna (test set, self-reported): 39.220
- ndcg_at_5 on MTEB ArguAna (test set, self-reported): 43.930
- ndcg_at_10 on MTEB ArguAna (test set, self-reported): 49.580
- ndcg_at_30 on MTEB ArguAna (test set, self-reported): 53.410
- ndcg_at_100 on MTEB ArguAna (test set, self-reported): 54.110
- map_at_1 on MTEB ArguAna (test set, self-reported): 25.180
- map_at_3 on MTEB ArguAna (test set, self-reported): 35.660