# tiny-mimo-v2-flash
A tiny, ~2.34B-parameter random-weight checkpoint of XiaomiMiMo/MiMo-V2-Flash, used for internal testing of the native implementation in Hugging Face transformers.
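A minimal smoke-test sketch of how such a checkpoint is typically exercised, assuming a hypothetical hub id `hf-internal-testing/tiny-mimo-v2-flash` (the actual repo path is not given here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hf-internal-testing/tiny-mimo-v2-flash"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

inputs = tokenizer("Hello from a tiny MoE", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)

# The weights are random, so the decoded text is gibberish; the point is
# that the forward and generate paths of the native implementation run
# end to end.
print(tokenizer.decode(out[0], skip_special_tokens=True))
```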
## Configuration
| Hyperparameter | Value | Original MiMo |
|---|---|---|
| `num_hidden_layers` | 5 | 48 |
| `layer_types` | [full, sliding×4] | matches pattern |
| `mlp_layer_types` | [dense, sparse×4] | matches pattern (layer 0 dense, rest MoE) |
| `hidden_size` | 2048 | 4096 (ratio 2.0) |
| `intermediate_size` | 8192 | 16384 (ratio 2.0) |
| `moe_intermediate_size` | 1024 | 2048 (ratio 2.0) |
| `num_attention_heads` / `num_key_value_heads` | 16 / 1 | 64 / 4 (ratio 4.0) |
| `head_dim` / `v_head_dim` | 192 / 128 | 192 / 128 |
| `n_routed_experts` / `num_experts_per_tok` | 64 / 2 | 256 / 8 (ratio 4.0) |
| Parameters | 2.34B | 300B |
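A sketch of how the scaled-down configuration and the ~2.34B parameter count can be checked, again assuming the hypothetical repo id above. All config field names come from the table:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(
    "hf-internal-testing/tiny-mimo-v2-flash"  # hypothetical repo id
)

# Print the hyperparameters listed in the configuration table.
for field in ("num_hidden_layers", "hidden_size", "intermediate_size",
              "moe_intermediate_size", "num_attention_heads",
              "num_key_value_heads", "head_dim", "v_head_dim",
              "n_routed_experts", "num_experts_per_tok"):
    print(f"{field} = {getattr(config, field, None)}")

# Instantiate the architecture with fresh random weights and count
# parameters; this should land near the ~2.34B figure from the table.
model = AutoModelForCausalLM.from_config(config)
print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e9:.2f}B")
```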