Instructions to use mixedbread-ai/mxbai-embed-xsmall-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use mixedbread-ai/mxbai-embed-xsmall-v1 with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

llama-cpp-python

How to use mixedbread-ai/mxbai-embed-xsmall-v1 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

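llm = Llama.from_pretrained(
	repo_id="mixedbread-ai/mxbai-embed-xsmall-v1",
	filename="gguf/mxbai-embed-xsmall-v1-bf16.gguf",
	embedding=True,  # this is an embedding model, so enable embedding mode
)

# Embed a sentence instead of generating text
output = llm.create_embedding("The weather is lovely today.")
print(len(output["data"][0]["embedding"]))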

Local Apps

llama.cpp

How to use mixedbread-ai/mxbai-embed-xsmall-v1 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16
# Run inference directly in the terminal:
llama-cli -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16
# Run inference directly in the terminal:
llama-cli -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16
# Run inference directly in the terminal:
./llama-cli -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf mixedbread-ai/mxbai-embed-xsmall-v1:BF16

Use Docker

docker model run hf.co/mixedbread-ai/mxbai-embed-xsmall-v1:BF16

Ollama
How to use mixedbread-ai/mxbai-embed-xsmall-v1 with Ollama:
```
ollama run hf.co/mixedbread-ai/mxbai-embed-xsmall-v1:BF16
```

Unsloth Studio

How to use mixedbread-ai/mxbai-embed-xsmall-v1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for mixedbread-ai/mxbai-embed-xsmall-v1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for mixedbread-ai/mxbai-embed-xsmall-v1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for mixedbread-ai/mxbai-embed-xsmall-v1 to start chatting

Docker Model Runner
How to use mixedbread-ai/mxbai-embed-xsmall-v1 with Docker Model Runner:
```
docker model run hf.co/mixedbread-ai/mxbai-embed-xsmall-v1:BF16
```

Lemonade

How to use mixedbread-ai/mxbai-embed-xsmall-v1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull mixedbread-ai/mxbai-embed-xsmall-v1:BF16

Run and chat with the model

lemonade run user.mxbai-embed-xsmall-v1-BF16

List all available models

lemonade list

The crispy sentence embedding family from Mixedbread.

🍞 Looking for a simple end-to-end retrieval solution? Meet Omni, our multimodal and multilingual model. Get in touch for access.

mixedbread-ai/mxbai-embed-xsmall-v1

This model is an open-source English embedding model developed by Mixedbread. It's built upon sentence-transformers/all-MiniLM-L6-v2 and trained with the AnglE loss and Espresso. Read more details in our blog post.

In a bread loaf:

State-of-the-art performance

Supports both binary quantization and Matryoshka Representation Learning (MRL).

Optimized for retrieval tasks

4096 context support

Performance

Binary Quantization and Matryoshka

Our model supports both binary quantization and Matryoshka Representation Learning (MRL), allowing for significant efficiency gains:

Binary quantization: Retains 93.9% of performance while increasing storage efficiency by a factor of 32

MRL: A 33% reduction in vector size still leaves 96.2% of model performance

These optimizations can lead to substantial reductions in infrastructure costs for cloud computing and vector databases. Read more here.
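The two optimizations above can be illustrated with a minimal, dependency-free sketch. It is not Mixedbread's implementation: the helper names are hypothetical, the 256-dimension target is an assumed example of a roughly 33% cut from 384 dimensions, and thresholding at zero is one common way to binarize an embedding.

```python
import math


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components (MRL-style) and rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)


def binarize(vec):
    """Map each component to one bit (1 if positive, else 0), packed 8 bits per byte."""
    bits = [1 if x > 0 else 0 for x in vec]
    bits += [0] * (-len(bits) % 8)  # pad to a whole number of bytes
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a, b):
    """Count differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Storage arithmetic for a single vector.
FULL_DIMS = 384                  # dimension used in the Quickstart examples
MRL_DIMS = 256                   # assumption: an example ~33% smaller size
float32_bytes = FULL_DIMS * 4    # 4 bytes per float32 component
binary_bytes = FULL_DIMS // 8    # 1 bit per component
print(float32_bytes // binary_bytes)  # 32
```

Binary vectors are typically compared with Hamming distance, as in `hamming_distance` above, while truncated float vectors are re-normalized so that cosine similarity remains meaningful.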

Quickstart

Here are several ways to produce English sentence embeddings using our model.

angle-emb

pip install -U angle-emb

from angle_emb import AnglE
from angle_emb.utils import cosine_similarity

# 1. Specify preferred dimensions
dimensions = 384

# 2. Load model and set pooling strategy to avg
model = AnglE.from_pretrained(
    "mixedbread-ai/mxbai-embed-xsmall-v1",
    pooling_strategy='avg').cuda()

query = 'A man is eating a piece of bread'

docs = [
    query,
    "A man is eating food.",
    "A man is eating pasta.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
]

# 3. Encode
embeddings = model.encode(docs, embedding_size=dimensions)

for doc, emb in zip(docs[1:], embeddings[1:]):
    print(f'{query} ||| {doc}', cosine_similarity(embeddings[0], emb))

Sentence Transformers

python -m pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# 1. Specify preferred dimensions
dimensions = 384

# 2. Load model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1", truncate_dim=dimensions)

query = 'A man is eating a piece of bread'

docs = [
    query,
    "A man is eating food.",
    "A man is eating pasta.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
]

# 3. Encode
embeddings = model.encode(docs)

similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)

transformers

pip install -U transformers sentence-transformers

from typing import Dict

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sentence_transformers.util import cos_sim


def pooling(outputs: torch.Tensor, inputs: Dict) -> np.ndarray:
    # Mean-pool token embeddings, ignoring padding; divide by each
    # sequence's own token count rather than the batch-wide total.
    mask = inputs["attention_mask"][:, :, None]
    outputs = torch.sum(outputs * mask, dim=1) / torch.sum(mask, dim=1)
    return outputs.detach().cpu().numpy()


# 1. Load model
model_id = 'mixedbread-ai/mxbai-embed-xsmall-v1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).cuda()

query = 'A man is eating a piece of bread'

docs = [
    query,
    "A man is eating food.",
    "A man is eating pasta.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
]

# 2. Encode
inputs = tokenizer(docs, padding=True, return_tensors='pt')
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model(**inputs).last_hidden_state
embeddings = pooling(outputs, inputs)

# 3. Compute similarity scores
similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)

Batched API

python -m pip install batched

import uvicorn
import batched
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse
from sentence_transformers import SentenceTransformer
from pydantic import BaseModel

app = FastAPI()

model = SentenceTransformer('mixedbread-ai/mxbai-embed-xsmall-v1')
model.encode = batched.aio.dynamically(model.encode)


class EmbeddingsRequest(BaseModel):
    input: str | list[str]


@app.post("/embeddings")
async def embeddings(request: EmbeddingsRequest):
    return ORJSONResponse({"embeddings": await model.encode(request.input)})


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
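A client for the `/embeddings` endpoint above might look like the following sketch. It uses only the standard library. The helper names are hypothetical, and the base URL `http://localhost:8000` assumes the server runs on the same machine with the port shown above.

```python
import json
from urllib import request


def build_embeddings_request(texts, base_url="http://localhost:8000"):
    """Build a POST request whose JSON body matches the EmbeddingsRequest schema."""
    payload = json.dumps({"input": texts}).encode("utf-8")
    return request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(texts, base_url="http://localhost:8000"):
    """Send the request and return the "embeddings" field of the JSON response."""
    with request.urlopen(build_embeddings_request(texts, base_url)) as resp:
        return json.loads(resp.read())["embeddings"]


# Usage (requires the server to be running):
# vectors = fetch_embeddings(["A man is eating food.", "A man is riding a horse."])
```

Because the server wraps `model.encode` with dynamic batching, several clients calling `fetch_embeddings` concurrently can be served by a single batched forward pass.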

Community

Join our Discord community to share your feedback and thoughts. We're here to help and always happy to discuss the exciting field of machine learning!

License

Apache 2.0

Citation

@online{xsmall2024mxbai,
  title={Every Byte Matters: Introducing mxbai-embed-xsmall-v1},
  author={Sean Lee and Julius Lipp and Rui Huang and Darius Koenig},
  year={2024},
  url={https://www.mixedbread.ai/blog/mxbai-embed-xsmall-v1},
}