Harrier-OSS-v1-0.6B MLX 8-bit

MLX 8-bit quantization of microsoft/harrier-oss-v1-0.6b, produced with mlx-embeddings on Apple Silicon.

What is this?

Harrier-OSS-v1 is Microsoft's family of state-of-the-art multilingual text embedding models (released March 2026). This 0.6B variant uses a Qwen3 backbone with sentence-transformers dense projection heads. It offers top-tier MMTEB multilingual performance (an average score of about 74.3) under the fully permissive MIT license.

Quantization

  • Method: MLX affine quantization (mlx_embeddings.convert), group_size=64
  • Bits per weight: 8
  • Output size: 618M
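To make the settings above concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration, not the code path that mlx_embeddings.convert runs: the helper names (quantize_group, dequantize_group, effective_bits_per_weight) are hypothetical, and storing one 16-bit scale and one 16-bit bias per group is an assumption.

```python
import random

def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1                # 255 distinct steps for 8 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0       # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo                 # lo acts as the per-group bias

def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]

def quantize(weights, group_size=64, bits=8):
    """Split a flat weight row into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]

def effective_bits_per_weight(bits=8, group_size=64, meta_bits=16):
    """Storage cost per weight, assuming one scale and one bias per group."""
    return bits + 2 * meta_bits / group_size

random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(128)]   # toy weights, not model data
groups = quantize(row)
restored = [w for codes, scale, bias in groups
            for w in dequantize_group(codes, scale, bias)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), effective_bits_per_weight(), max_error)
```

Under these assumptions, 8-bit weights with group_size=64 cost 8.5 bits each once the per-group scale and bias are counted. For a nominal 0.6B weights that comes to roughly 0.64 GB if every tensor were quantized, which is in the same range as the output size listed above.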

Quickstart

from mlx_embeddings import load

model, tokenizer = load("majentik/harrier-oss-v1-0.6b-MLX-8bit")
inputs = tokenizer(["query: what is Harrier-OSS?", "passage: Harrier-OSS is a text embedding model..."],
                   padding=True, truncation=True, return_tensors="mlx")
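outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
embeddings = outputs.text_embeds  # L2-normalised, one row per input text
# Rows are unit length, so the dot product is the query-passage cosine similarity.
print((embeddings[0] @ embeddings[1:].T).tolist())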
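Because the embeddings are L2-normalised, ranking passages for a query reduces to sorting dot products. The sketch below shows this with toy 3-dimensional vectors standing in for model output. The helpers l2_normalize and rank_passages are hypothetical names, not part of mlx_embeddings, and the numbers are made up for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted by descending dot product."""
    scores = [sum(q * p for q, p in zip(query_vec, vec)) for vec in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Toy vectors (assumed values), normalised the way the model output is described.
query = l2_normalize([1.0, 0.0, 1.0])
passages = [l2_normalize(v) for v in (
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [1.0, 0.1, 0.9],   # nearly parallel to the query
    [0.5, 0.5, 0.0],   # partially aligned
)]
ranking = rank_passages(query, passages)
print(ranking)
```

With real model output, the same ranking can be computed by converting `outputs.text_embeds` to nested lists with `.tolist()`, as the Quickstart does for its similarity score.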

Specifications

Property          Value
Base Model        microsoft/harrier-oss-v1-0.6b
Backbone          Qwen3Model + sentence-transformers Dense heads
Parameters        0.6B (pre-quantization)
Context Length    32K tokens
License           MIT

License

MIT, inherited from the upstream Harrier-OSS-v1 model.
