majentik/harrier-oss-v1-270m-MLX-4bit

MLX-quantized (4-bit, affine, group size 64) conversion of Microsoft's microsoft/harrier-oss-v1-270m — a state-of-the-art multilingual text embedding model (released March 2026) built on a Gemma3 decoder-only backbone with last-token pooling and L2 normalization.

This checkpoint is optimized for Apple Silicon via MLX and the mlx-embeddings runtime.

Model summary

Base model: microsoft/harrier-oss-v1-270m
Backbone: Gemma3 (decoder-only, last-token pooling)
Parameters: 270M
Embedding dim: 640
Max context: 32,768 tokens
Quantization: 4-bit affine, group size 64
Size on disk: 176.9 MB
License: MIT (inherited from base)

The base model reports MTEB v2 = 66.5 (see the upstream model card); quantized variants are expected to be close to this but have not been independently re-evaluated.

Quantization procedure

Converted with mlx_embeddings.convert:

python -m mlx_embeddings.convert \
    --hf-path microsoft/harrier-oss-v1-270m \
    --mlx-path harrier-oss-v1-270m-MLX-4bit \
    -q --q-bits 4 --q-group-size 64
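
The quantization scheme named in the flags above can be sketched in plain Python: each group of 64 weights is mapped to 4-bit integers with a per-group scale and zero point. This is an illustrative toy, not the MLX kernel; function names are made up for the example.

```python
import random

def quantize_group(weights, bits=4):
    """Affine-quantize one group of float weights to `bits`-bit integers.

    Returns the integer codes plus the per-group (scale, zero) pair
    needed to dequantize.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1             # 15 levels for 4-bit
    scale = (hi - lo) / levels or 1.0    # guard against constant groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo                  # ints in [0, 15], scale, zero point

def dequantize_group(q, scale, zero):
    return [v * scale + zero for v in q]

# Round-trip one group of 64 weights, matching --q-group-size 64.
random.seed(0)
group = [random.uniform(-1, 1) for _ in range(64)]
q, scale, zero = quantize_group(group)
recovered = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, recovered))
# Rounding bounds the per-weight error by scale / 2.
```

Storing one (scale, zero) pair per 64 weights is what keeps the 4-bit checkpoint small while limiting the per-weight rounding error.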

Harrier-OSS-v1 uses a plain [Transformer, Pooling, Normalize] SentenceTransformer pipeline (no Dense projection head in modules.json), but the upstream mlx_embeddings.models.gemma3_text.Model class unconditionally expects dense.0.weight / dense.1.weight parameters. A small install-local patch was applied to gemma3_text.py so that the dense heads are treated as optional: if the source checkpoint has no dense.* weights, self.dense is replaced with [] during sanitize(), and the forward pass's for dense in self.dense: ... loop becomes a no-op. The patched file will be contributed upstream.
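
The spirit of that patch can be illustrated with a simplified class. This is a sketch, not the actual mlx_embeddings gemma3_text code; the class and method bodies here are stand-ins that only mirror the control flow described above.

```python
class PatchedModel:
    """Toy stand-in for the patched gemma3_text Model class."""

    def __init__(self):
        # Upstream assumes dense.0 / dense.1 weights always exist;
        # the patch makes the dense head optional.
        self.dense = None  # populated (or emptied) in sanitize()

    def sanitize(self, weights):
        """Keep the dense head only if the checkpoint ships dense.* weights."""
        dense_keys = [k for k in weights if k.startswith("dense.")]
        # No dense.* weights -> empty list, so the forward loop is a no-op.
        self.dense = dense_keys
        return weights

    def forward_tail(self, x):
        # With self.dense == [], x passes through unchanged.
        for dense in self.dense:
            x = f"{dense}({x})"  # stand-in for applying the layer
        return x

# Harrier-OSS-v1 ships no dense.* weights, so the head is skipped:
m = PatchedModel()
m.sanitize({"embed_tokens.weight": None, "layers.0.self_attn.q_proj.weight": None})
```

With this change, checkpoints that do include a Dense projection head still load it, while plain [Transformer, Pooling, Normalize] pipelines like this one no longer fail weight validation.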

Quickstart

from mlx_embeddings import load, generate
import mlx.core as mx

model, tokenizer = load("majentik/harrier-oss-v1-270m-MLX-4bit")

output = generate(model, tokenizer, texts=[
    "How much protein should a female eat?",
    "Definition of summit",
])
embeddings = output.text_embeds   # L2-normalized
similarity = mx.matmul(embeddings, embeddings.T)
print(similarity)

The base model was trained with instruction-style prompts for retrieval (web_search_query), STS (sts_query), and bitext mining (bitext_query). See config_sentence_transformers.json in this repo for the exact prefixes.
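
Applying those prefixes before embedding is straightforward string concatenation. The prompt strings below are placeholders, not the real prefixes; read the actual values from config_sentence_transformers.json in this repo.

```python
# Placeholder prompt strings -- substitute the real ones from
# config_sentence_transformers.json.
PROMPTS = {
    "web_search_query": "task: web search | query: ",
    "sts_query": "task: sentence similarity | query: ",
    "bitext_query": "task: bitext mining | query: ",
}

def apply_prompt(texts, prompt_name):
    """Prepend the task-specific instruction prefix to each input text."""
    prefix = PROMPTS[prompt_name]
    return [prefix + t for t in texts]

queries = apply_prompt(["How much protein should a female eat?"],
                       "web_search_query")
# Pass `queries` (not the raw strings) to generate(...) for retrieval.
```

Matching the training-time prefix for each task generally matters for embedding quality, so query-side texts should carry the corresponding prompt while being encoded.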

Languages

Multilingual (100+ languages). See the upstream model card for the full list.

Citation

@misc{harrier-oss-v1,
  title  = {Harrier-OSS-v1: multilingual text embeddings},
  author = {Microsoft},
  year   = {2026},
  url    = {https://huggingface.co/microsoft/harrier-oss-v1-270m}
}