majentik/harrier-oss-v1-270m-MLX-4bit

MLX-quantized (4-bit, affine, group size 64) conversion of Microsoft's microsoft/harrier-oss-v1-270m — a state-of-the-art multilingual text embedding model (released March 2026) built on a Gemma3 decoder-only backbone with last-token pooling and L2 normalization.

This checkpoint is optimized for Apple Silicon via MLX and the mlx-embeddings runtime.

Model summary

Base model: microsoft/harrier-oss-v1-270m
Backbone: Gemma3 (decoder-only, last-token pooling)
Parameters: 270M
Embedding dim: 640
Max context: 32,768 tokens
Quantization: 4-bit affine, group size 64
Size on disk: 176.9 MB
License: MIT (inherited from base)

The base model reports MTEB v2 = 66.5 (see the upstream model card); quantized variants are expected to be close to this but have not been independently re-evaluated.

Quantization procedure

Converted with mlx_embeddings.convert:

python -m mlx_embeddings.convert \
    --hf-path microsoft/harrier-oss-v1-270m \
    --mlx-path harrier-oss-v1-270m-MLX-4bit \
    -q --q-bits 4 --q-group-size 64
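
The quantization scheme named in the flags above can be sketched in plain Python: each group of 64 weights is mapped to 4-bit integers with a per-group scale and zero point. This is an illustrative toy, not the MLX kernel; function names are made up for the example.

```python
import random

def quantize_group(weights, bits=4):
    """Affine-quantize one group of float weights to `bits`-bit integers.

    Returns the integer codes plus the per-group (scale, zero) pair
    needed to dequantize.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1             # 15 levels for 4-bit
    scale = (hi - lo) / levels or 1.0    # guard against constant groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo                  # ints in [0, 15], scale, zero point

def dequantize_group(q, scale, zero):
    return [v * scale + zero for v in q]

# Round-trip one group of 64 weights, matching --q-group-size 64.
random.seed(0)
group = [random.uniform(-1, 1) for _ in range(64)]
q, scale, zero = quantize_group(group)
recovered = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, recovered))
# Rounding bounds the per-weight error by scale / 2.
```

Storing one (scale, zero) pair per 64 weights is what keeps the 4-bit checkpoint small while limiting the per-weight rounding error.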

Harrier-OSS-v1 uses a plain [Transformer, Pooling, Normalize] SentenceTransformer pipeline (no Dense projection head in modules.json), but the upstream mlx_embeddings.models.gemma3_text.Model class unconditionally expects dense.0.weight / dense.1.weight parameters. A small install-local patch was applied to gemma3_text.py so that the dense heads are treated as optional: if the source checkpoint has no dense.* weights, self.dense is replaced with [] during sanitize(), and the forward pass's for dense in self.dense: ... loop becomes a no-op. The patched file will be contributed upstream.
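
The spirit of that patch can be illustrated with a simplified class. This is a sketch, not the actual mlx_embeddings gemma3_text code; the class and method bodies here are stand-ins that only mirror the control flow described above.

```python
class PatchedModel:
    """Toy stand-in for the patched gemma3_text Model class."""

    def __init__(self):
        # Upstream assumes dense.0 / dense.1 weights always exist;
        # the patch makes the dense head optional.
        self.dense = None  # populated (or emptied) in sanitize()

    def sanitize(self, weights):
        """Keep the dense head only if the checkpoint ships dense.* weights."""
        dense_keys = [k for k in weights if k.startswith("dense.")]
        # No dense.* weights -> empty list, so the forward loop is a no-op.
        self.dense = dense_keys
        return weights

    def forward_tail(self, x):
        # With self.dense == [], x passes through unchanged.
        for dense in self.dense:
            x = f"{dense}({x})"  # stand-in for applying the layer
        return x

# Harrier-OSS-v1 ships no dense.* weights, so the head is skipped:
m = PatchedModel()
m.sanitize({"embed_tokens.weight": None, "layers.0.self_attn.q_proj.weight": None})
```

With this change, checkpoints that do include a Dense projection head still load it, while plain [Transformer, Pooling, Normalize] pipelines like this one no longer fail weight validation.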

Quickstart

from mlx_embeddings import load, generate
import mlx.core as mx

model, tokenizer = load("majentik/harrier-oss-v1-270m-MLX-4bit")

output = generate(model, tokenizer, texts=[
    "How much protein should a female eat?",
    "Definition of summit",
])
embeddings = output.text_embeds   # L2-normalized
similarity = mx.matmul(embeddings, embeddings.T)
print(similarity)

The base model was trained with instruction-style prompts for retrieval (web_search_query), STS (sts_query), and bitext mining (bitext_query). See config_sentence_transformers.json in this repo for the exact prefixes.
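
Applying those prefixes before embedding is straightforward string concatenation. The prompt strings below are placeholders, not the real prefixes; read the actual values from config_sentence_transformers.json in this repo.

```python
# Placeholder prompt strings -- substitute the real ones from
# config_sentence_transformers.json.
PROMPTS = {
    "web_search_query": "task: web search | query: ",
    "sts_query": "task: sentence similarity | query: ",
    "bitext_query": "task: bitext mining | query: ",
}

def apply_prompt(texts, prompt_name):
    """Prepend the task-specific instruction prefix to each input text."""
    prefix = PROMPTS[prompt_name]
    return [prefix + t for t in texts]

queries = apply_prompt(["How much protein should a female eat?"],
                       "web_search_query")
# Pass `queries` (not the raw strings) to generate(...) for retrieval.
```

Matching the training-time prefix for each task generally matters for embedding quality, so query-side texts should carry the corresponding prompt while being encoded.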

Languages

Multilingual (100+ languages). See the upstream model card for the full list.

Citation

@misc{harrier-oss-v1,
  title  = {Harrier-OSS-v1: multilingual text embeddings},
  author = {Microsoft},
  year   = {2026},
  url    = {https://huggingface.co/microsoft/harrier-oss-v1-270m}
}