# majentik/harrier-oss-v1-270m-MLX-4bit
MLX-quantized (4-bit, affine, group size 64) conversion of Microsoft's `microsoft/harrier-oss-v1-270m` — a state-of-the-art multilingual text embedding model (released March 2026) built on a Gemma3 decoder-only backbone with last-token pooling and L2 normalization. This checkpoint is optimized for Apple Silicon via MLX and the `mlx-embeddings` runtime.
## Model summary

| | |
| --- | --- |
| Base model | `microsoft/harrier-oss-v1-270m` |
| Backbone | Gemma3 (decoder-only, last-token pooling) |
| Parameters | 270M |
| Embedding dim | 640 |
| Max context | 32,768 tokens |
| Quantization | 4-bit affine, group size 64 |
| Size on disk | 176.9 MB |
| License | MIT (inherited from base) |
The base model reports MTEB v2 = 66.5 (see the upstream model card); quantized variants are expected to be close to this but have not been independently re-evaluated.
## Quantization procedure

Converted with `mlx_embeddings.convert`:

```bash
python -m mlx_embeddings.convert \
  --hf-path microsoft/harrier-oss-v1-270m \
  --mlx-path harrier-oss-v1-270m-MLX-4bit \
  -q --q-bits 4 --q-group-size 64
```
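As a rough sanity check on the on-disk size, 4-bit affine quantization with group size 64 costs about 4.5 bits per weight once the per-group scale and bias are amortized. The sketch below is back-of-envelope arithmetic, not the exact MLX storage layout; the gap to the reported 176.9 MB is plausibly layers left unquantized (norms, possibly embeddings).

```python
# Back-of-envelope size estimate for 4-bit, group-64 affine quantization.
# Illustrative only; the real MLX on-disk layout differs in detail.
PARAMS = 270_000_000        # approximate quantized parameter count
BITS_PER_WEIGHT = 4         # quantized weight width
GROUP_SIZE = 64             # weights sharing one scale/bias pair
GROUP_OVERHEAD_BITS = 16 + 16  # fp16 scale + fp16 bias per group

effective_bits = BITS_PER_WEIGHT + GROUP_OVERHEAD_BITS / GROUP_SIZE  # 4.5
size_mb = PARAMS * effective_bits / 8 / 1e6
print(f"{size_mb:.1f} MB")  # ~151.9 MB before unquantized layers
```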
Harrier-OSS-v1 uses a plain `[Transformer, Pooling, Normalize]` SentenceTransformer pipeline (no Dense projection head in `modules.json`), but the upstream `mlx_embeddings.models.gemma3_text.Model` class unconditionally expects `dense.0.weight` / `dense.1.weight` parameters. A small install-local patch was applied to `gemma3_text.py` so that the dense heads are treated as optional: if the source checkpoint has no `dense.*` weights, `self.dense` is replaced with `[]` during `sanitize()`, and the forward pass's `for dense in self.dense: ...` loop becomes a no-op. The patched file will be contributed upstream.
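The shape of the patch is roughly the following. This is a minimal self-contained sketch, not the actual `gemma3_text.py` source: the real class layout, weight-key names, and the `apply` helper here are stand-ins.

```python
# Sketch of the optional-dense patch (illustrative; the real
# mlx_embeddings gemma3_text.Model differs in detail).

class Model:
    def __init__(self):
        # Upstream unconditionally builds the dense heads; the patch
        # lets sanitize() drop them when the checkpoint carries none.
        self.dense = ["dense.0", "dense.1"]  # placeholders for real layers

    def sanitize(self, weights):
        # If no dense.* weights exist in the source checkpoint,
        # disable the projection heads entirely.
        if not any(k.startswith("dense.") for k in weights):
            self.dense = []
        return weights

    def project(self, x):
        # With self.dense == [], this loop is a no-op and the pooled
        # embedding passes through unchanged.
        for dense in self.dense:
            x = apply(dense, x)  # hypothetical layer application
        return x
```

For a checkpoint like this one (no `dense.*` keys), `sanitize()` empties `self.dense` and `project()` returns its input unchanged.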
## Quickstart

```python
from mlx_embeddings import load, generate
import mlx.core as mx

model, tokenizer = load("majentik/harrier-oss-v1-270m-MLX-4bit")
output = generate(model, tokenizer, texts=[
    "How much protein should a female eat?",
    "Definition of summit",
])
embeddings = output.text_embeds  # L2-normalized
similarity = mx.matmul(embeddings, embeddings.T)
print(similarity)
```
The base model was trained with instruction-style prompts for retrieval (`web_search_query`), STS (`sts_query`), and bitext mining (`bitext_query`). See `config_sentence_transformers.json` in this repo for the exact prefixes.
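Usage-wise, these prompts are prefixes prepended to the query text before embedding. The snippet below only illustrates the mechanics; the placeholder strings are hypothetical and the real prefixes are deliberately not reproduced here (read them from `config_sentence_transformers.json`).

```python
# Hypothetical illustration of task-prompt prefixing. The placeholder
# strings below are NOT the real prefixes; load those from
# config_sentence_transformers.json in this repo.
PROMPTS = {
    "web_search_query": "<retrieval prefix from config>",
    "sts_query": "<sts prefix from config>",
    "bitext_query": "<bitext prefix from config>",
}

def with_prompt(task: str, text: str) -> str:
    """Prepend the task-specific instruction prefix to a query."""
    return PROMPTS[task] + text

queries = [with_prompt("web_search_query",
                       "How much protein should a female eat?")]
```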
## Languages

Multilingual (100+ languages). See the upstream model card for the full list.
## See also

- Base model: `microsoft/harrier-oss-v1-270m`
- MLX embeddings runtime: `mlx-embeddings`
- Curated index: `majentik/garden`
## Citation

```bibtex
@misc{harrier-oss-v1,
  title  = {Harrier-OSS-v1: multilingual text embeddings},
  author = {Microsoft},
  year   = {2026},
  url    = {https://huggingface.co/microsoft/harrier-oss-v1-270m}
}
```