majentik/harrier-oss-v1-27b-MLX-4bit

MLX-quantized (4-bit, affine, group size 64) conversion of Microsoft's microsoft/harrier-oss-v1-27b, a state-of-the-art multilingual text embedding model (released March 2026) built on a Gemma3 decoder-only backbone with last-token pooling and L2 normalization.

This checkpoint is optimized for Apple Silicon via MLX and the mlx-embeddings runtime.
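
To make "last-token pooling and L2 normalization" concrete, here is a minimal NumPy sketch of the two steps. It is an illustration only, not the code path used by mlx-embeddings: the function names are hypothetical, the hidden states are random toy data, and right padding is assumed.

```python
import numpy as np

def last_token_pool(hidden, attention_mask):
    """Pick the hidden state of the last non-padding token in each sequence.

    hidden:         (batch, seq_len, dim) float array
    attention_mask: (batch, seq_len) array of 0/1 (assumes right padding)
    """
    last_idx = attention_mask.sum(axis=1) - 1  # index of the last real token
    return hidden[np.arange(hidden.shape[0]), last_idx]

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

# Toy batch: 2 sequences, 4 positions, 3 hidden dimensions
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 3))
mask = np.array([[1, 1, 1, 0],   # 3 real tokens, 1 pad
                 [1, 1, 1, 1]])  # 4 real tokens

pooled = last_token_pool(hidden, mask)
embeddings = l2_normalize(pooled)
```

Because each row ends up with unit length, a plain dot product between two embeddings equals their cosine similarity.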

Model summary

Base model: microsoft/harrier-oss-v1-27b
Backbone: Gemma3 (decoder-only, last-token pooling)
Parameters: 27B
Embedding dim: 5376
Max context: 32,768 tokens
Quantization: 4-bit affine, group size 64
Size on disk: 14.18 GB
License: MIT (inherited from base)

The base model reports an MTEB v2 score of 74.3 (see the upstream model card). This 4-bit conversion is expected to score close to that figure, but it has not been independently re-evaluated.

Quantization procedure

Converted with mlx_embeddings.convert:

python -m mlx_embeddings.convert \
    --hf-path microsoft/harrier-oss-v1-27b \
    --mlx-path harrier-oss-v1-27b-MLX-4bit \
    -q --q-bits 4 --q-group-size 64
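
For readers unfamiliar with what "4-bit affine, group size 64" means, the following NumPy sketch shows the general idea: weights are split into groups of 64, and each group gets its own scale and offset so that values map onto 16 integer levels. This is a simplified illustration with hypothetical function names; it does not reproduce MLX's actual kernels or storage layout.

```python
import numpy as np

def quantize_affine(w, bits=4, group_size=64):
    """Quantize a 1-D float array in groups, one scale and offset per group."""
    levels = 2 ** bits - 1                      # 15 steps for 4-bit
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)    # guard against constant groups
    q = np.clip(np.round((groups - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def dequantize_affine(q, scale, offset):
    """Map integer codes back to approximate float weights."""
    return q * scale + offset

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)     # toy weights: 4 groups of 64
q, scale, offset = quantize_affine(w)
w_hat = dequantize_affine(q, scale, offset).reshape(-1)
max_err = np.abs(w - w_hat).max()               # at most half a step per group
```

The reconstruction error within each group is bounded by half of that group's step size, which is why smaller groups generally trade extra metadata for better fidelity.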

Harrier-OSS-v1 uses a plain [Transformer, Pooling, Normalize] SentenceTransformer pipeline (no Dense projection head in modules.json), but the upstream mlx_embeddings.models.gemma3_text.Model class unconditionally expects dense.0.weight / dense.1.weight parameters. A small install-local patch was applied to gemma3_text.py so that the dense heads are treated as optional: if the source checkpoint has no dense.* weights, self.dense is replaced with [] during sanitize(), so the forward pass's loop over self.dense becomes a no-op. The patched file will be contributed upstream.
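
The pattern described above can be sketched with a toy class. This is not the patched gemma3_text.py: ToyEmbeddingModel and its stand-in "layers" are hypothetical and exist only to show how an empty self.dense list turns the projection loop into a no-op.

```python
class ToyEmbeddingModel:
    """Hypothetical stand-in for a model with optional dense projection heads."""

    def __init__(self):
        # By default the model assumes two projection layers exist.
        # Plain functions stand in for real linear layers here.
        self.dense = [
            lambda x: [2 * v for v in x],
            lambda x: [v + 1 for v in x],
        ]

    def sanitize(self, weights):
        # If the checkpoint ships no dense.* tensors, drop the heads entirely.
        if not any(key.startswith("dense.") for key in weights):
            self.dense = []
        return weights

    def __call__(self, pooled):
        for dense in self.dense:  # iterates zero times when self.dense == []
            pooled = dense(pooled)
        return pooled


# Checkpoint without dense.* weights: the pooled vector passes through unchanged.
no_head = ToyEmbeddingModel()
no_head.sanitize({"model.embed_tokens.weight": None})
passthrough = no_head([1.0, 2.0])

# Checkpoint with dense.* weights: both stand-in layers are applied.
with_head = ToyEmbeddingModel()
with_head.sanitize({"dense.0.weight": None, "dense.1.weight": None})
projected = with_head([1.0, 2.0])
```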

Quickstart

from mlx_embeddings import load, generate
import mlx.core as mx

model, tokenizer = load("majentik/harrier-oss-v1-27b-MLX-4bit")

output = generate(model, tokenizer, texts=[
    "How much protein should a female eat?",
    "Definition of summit",
])
embeddings = output.text_embeds   # L2-normalized
similarity = mx.matmul(embeddings, embeddings.T)
print(similarity)
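
Since the embeddings are L2-normalized, the matrix product above yields cosine similarities directly. As a rough sketch of how such scores might be used for retrieval, the snippet below ranks a few documents against a query. The vectors are hand-made unit vectors rather than real model output, and top_k is a hypothetical helper.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k most similar documents."""
    scores = doc_vecs @ query_vec        # cosine similarity for unit vectors
    order = np.argsort(-scores)[:k]      # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy unit-length "embeddings" (2 dimensions for readability)
docs = np.array([[1.0, 0.0],
                 [0.6, 0.8],
                 [0.0, 1.0]])
query = np.array([0.8, 0.6])

ranked = top_k(query, docs)
```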

The base model was trained with instruction-style prompts for retrieval (web_search_query), STS (sts_query), and bitext mining (bitext_query). See config_sentence_transformers.json in this repo for the exact prefixes.
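
One way to apply such prefixes is to read them from the config file and prepend them to each query before embedding. The sketch below assumes the file follows the usual Sentence Transformers layout with a top-level "prompts" mapping from prompt name to prefix string; check the file in this repo before relying on that. The helper names are hypothetical, and the prefix in the example is a placeholder, not the model's real prompt.

```python
import json

def load_prompts(path="config_sentence_transformers.json"):
    """Read the prompt-name -> prefix mapping (assumed to live under "prompts")."""
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("prompts", {})

def with_prompt(texts, prompts, name):
    """Prepend the named prefix to every text."""
    prefix = prompts[name]
    return [prefix + text for text in texts]

# Placeholder mapping for illustration only; use load_prompts() for real values.
example_prompts = {"web_search_query": "<prefix from config> "}
queries = with_prompt(["Definition of summit"], example_prompts, "web_search_query")
```

The resulting strings can then be passed as texts to generate in the same way as in the Quickstart.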

Languages

Multilingual (100+ languages). See the upstream model card for the full list.

Citation

@misc{harrier-oss-v1,
  title  = {Harrier-OSS-v1: multilingual text embeddings},
  author = {Microsoft},
  year   = {2026},
  url    = {https://huggingface.co/microsoft/harrier-oss-v1-27b}
}