Harrier-OSS-v1-0.6B MLX 8-bit

MLX 8-bit quantization of microsoft/harrier-oss-v1-0.6b, produced with mlx-embeddings on Apple Silicon.

What is this?

Harrier-OSS-v1 is Microsoft's state-of-the-art multilingual text embedding model family (March 2026). This 0.6B variant uses a Qwen3 backbone with sentence-transformers dense projection heads. It delivers top-tier MMTEB multilingual performance (~74.3 average) and is released under the permissive MIT license.

Quantization

Method: MLX affine quantization (mlx_embeddings.convert), group_size=64
Bits per weight: 8
Output size: 618M
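
To make the settings above concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It illustrates the general technique only and is not the MLX kernel. The helper names (quantize_group, dequantize_group, effective_bits_per_weight) are hypothetical, and the overhead estimate assumes each group stores one 16-bit scale and one 16-bit bias.

```python
def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    # Fall back to a scale of 1.0 when every weight in the group is identical.
    scale = (hi - lo) / levels or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo  # lo acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def effective_bits_per_weight(bits=8, group_size=64, overhead_bits=32):
    """Storage cost per weight, counting the per-group scale and bias.

    overhead_bits=32 is an assumption: one 16-bit scale plus one 16-bit bias.
    """
    return bits + overhead_bits / group_size


# One toy group of 64 weights spread evenly over [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))

print(max_err <= scale / 2 + 1e-9)   # rounding error is at most half a step
print(effective_bits_per_weight())   # 8.5 under the stated assumption
```

Under that assumption, 8-bit weights with group_size=64 cost about 8.5 bits each once the per-group metadata is counted, which is why the quantized files come out slightly larger than one byte per parameter.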

Quickstart

from mlx_embeddings import load

# Load the quantized weights and the matching tokenizer.
model, tokenizer = load("majentik/harrier-oss-v1-0.6b-MLX-8bit")
# Tokenize one query and one passage as a single padded batch.
texts = [
    "query: what is Harrier-OSS?",
    "passage: Harrier-OSS is a text embedding model...",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="mlx")
outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
embeddings = outputs.text_embeds  # L2-normalised

# For unit-length vectors the dot product equals the cosine similarity.
print((embeddings[0] @ embeddings[1:].T).tolist())
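
The Quickstart scores one query against one passage. With several passages, the same dot products can be sorted to produce a ranking. The sketch below uses plain Python lists and toy unit-length vectors in place of real model output. rank_by_similarity is a hypothetical helper, not part of mlx-embeddings.

```python
def rank_by_similarity(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, most similar first.

    Assumes all vectors are L2-normalised, so the dot product is the
    cosine similarity.
    """
    scores = [
        sum(q * p for q, p in zip(query_vec, vec))
        for vec in passage_vecs
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy unit-length vectors standing in for real embeddings.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_by_similarity(query, passages)
print(ranking)  # [(2, 1.0), (1, 0.6), (0, 0.0)]
```

With real output from the model, the inputs would be embeddings[0].tolist() for the query and embeddings[1:].tolist() for the passages.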

Specifications

Property	Value
Base Model	microsoft/harrier-oss-v1-0.6b
Backbone	Qwen3Model + sentence-transformers Dense heads
Parameters	0.6B (pre-quantization)
Context Length	32K tokens
License	MIT

License

MIT — inherited from the upstream Harrier-OSS-v1 model.
