tiny-mimo-v2-flash

A tiny (~2.34B-parameter) random-weight checkpoint of XiaomiMiMo/MiMo-V2-Flash, used for internal testing of the native model implementation in Hugging Face transformers.
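A minimal loading sketch for smoke-testing the checkpoint (the repo id below is a placeholder, not the actual Hub path; since the weights are random, outputs are only useful for shape and dtype checks):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tiny-mimo-v2-flash"  # placeholder; substitute the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Random weights: check shapes and dtypes, not output quality.
inputs = tokenizer("Hello", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, seq_len, vocab_size)
```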

Configuration

| Hyperparameter | Value | Original MiMo |
|---|---|---|
| num_hidden_layers | 5 | 48 |
| layer_types | [full, sliding×4] | matches pattern |
| mlp_layer_types | [dense, sparse×4] | matches pattern (layer 0 dense, rest MoE) |
| hidden_size | 2048 | 4096 (ratio 2.0) |
| intermediate_size | 8192 | 16384 (ratio 2.0) |
| moe_intermediate_size | 1024 | 2048 (ratio 2.0) |
| num_attention_heads / num_key_value_heads | 16 / 1 | 64 / 4 (ratio 4.0) |
| head_dim / v_head_dim | 192 / 128 | 192 / 128 |
| n_routed_experts / num_experts_per_tok | 64 / 2 | 256 / 8 (ratio 4.0) |
| parameters | 2.34B | 300B |
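A sketch of how a config like this can be derived by shrinking the original (field names follow the table above, but the `layer_types`/`mlp_layer_types` string values and the loadability of the original config are assumptions; treat this as illustrative, not the exact recipe used):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Start from the original config and shrink it per the table above.
config = AutoConfig.from_pretrained("XiaomiMiMo/MiMo-V2-Flash")
config.num_hidden_layers = 5
config.hidden_size = 2048             # 4096 / 2
config.intermediate_size = 8192       # 16384 / 2
config.moe_intermediate_size = 1024   # 2048 / 2
config.num_attention_heads = 16       # 64 / 4
config.num_key_value_heads = 1        # 4 / 4
config.n_routed_experts = 64          # 256 / 4
config.num_experts_per_tok = 2        # 8 / 4
# Layer 0: full attention + dense MLP; remaining 4: sliding + MoE.
# (String values are assumed; check the actual config class.)
config.layer_types = ["full_attention"] + ["sliding_attention"] * 4
config.mlp_layer_types = ["dense"] + ["sparse"] * 4

# from_config builds the model with freshly initialized (random) weights.
model = AutoModelForCausalLM.from_config(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B params")
```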