Failed to load model

🥲 Failed to load model


Error when loading model: ValueError: Expected shape (8192, 672) but received shape (8192, 1344) for parameter language_model.model.layers.0.self_attn.q_proj.weight

The mlx model I’m using keeps failing to load on my Mac. Why won’t it work?


If this is a Gemma 4 family model, the failure may simply reflect that Gemma 4 was released only recently and software support isn't fully established yet; if that's not the case, the issue is something else.

In any case, the simplest solution is to update the software; if that doesn't resolve the issue, you'll need to look for another solution or wait.


For the Gemma 4 JANG MLX model, the most likely reason it will not load is this:

the app/runtime that is reading the model does not agree with the model file format about how one of the weight tensors is stored. In other words, this is most likely a compatibility problem, not a “your Mac is too weak” problem. The strongest clue is the exact error itself, and there is already a public LM Studio bug report showing the same shape mismatch on the same parameter while loading gemma-4-31b-jang_4m-crack on Mac with LM Studio MLX. (GitHub)

What the error is saying

This part matters most:

Expected shape (8192, 672) but received shape (8192, 1344)

A model is a collection of tensors. Each tensor has a fixed shape. When the loader reaches language_model.model.layers.0.self_attn.q_proj.weight, it expects one layout, but the checkpoint contains another. Because the mismatch is exact and structural, the loader stops immediately. This kind of failure happens before inference even begins, which is why it points to a format/layout mismatch rather than a normal runtime slowdown or memory pressure issue. (GitHub)
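To make the failure mode concrete, here is a small illustrative sketch (not LM Studio's or MLX's actual code) of the kind of shape validation a loader performs before inference. The function name is made up, but the check and the resulting message mirror the error in your log:

```python
# Illustrative only: how a loader might validate a checkpoint tensor
# against the shape the model definition expects. A mismatch aborts
# loading immediately, before any inference happens.

def validate_tensor(name, expected_shape, received_shape):
    """Raise a ValueError like the one in the log if shapes disagree."""
    if expected_shape != received_shape:
        raise ValueError(
            f"Expected shape {expected_shape} but received shape "
            f"{received_shape} for parameter {name}"
        )

# Matching shapes pass silently.
validate_tensor(
    "language_model.model.layers.0.self_attn.q_proj.weight",
    expected_shape=(8192, 672),
    received_shape=(8192, 672),
)

# The mismatch from the error log stops the load.
try:
    validate_tensor(
        "language_model.model.layers.0.self_attn.q_proj.weight",
        expected_shape=(8192, 672),
        received_shape=(8192, 1344),
    )
except ValueError as e:
    print(e)
```

The point is that this is a hard structural check, not a resource limit: nothing about RAM or GPU capacity ever enters into it.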

The fact that 1344 is exactly 2 × 672 is also a strong clue. That pattern usually suggests the tensor is being interpreted with the wrong packing/layout assumption rather than being randomly corrupted. I cannot prove the exact internal packing rule from the error alone, but the clean 2× difference strongly suggests “loader and checkpoint disagree on representation,” not “file is slightly damaged.” The recent mlx-lm release notes make that interpretation more plausible because they include a Gemma 4–specific fix for quantized per-layer projection loading, and your error is on a projection tensor (q_proj.weight). (GitHub)
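As an illustration of how a clean 2× factor can appear, assume (and this is an assumption, not a documented fact about the JANG format) that quantized values are packed into 32-bit words. A row of 5376 values packs to 672 columns at 4 bits per value but 1344 columns at 8 bits per value, reproducing exactly the two numbers in the error:

```python
# Hypothetical packing arithmetic. The row length (5376) and bit
# widths are chosen to reproduce the numbers in the error message;
# the real checkpoint may use a different scheme.

WORD_BITS = 32

def packed_width(n_values, bits_per_value):
    """Number of columns after packing a row into 32-bit words."""
    values_per_word = WORD_BITS // bits_per_value
    return n_values // values_per_word

n = 5376  # an unpacked row length consistent with both shapes below
print(packed_width(n, bits_per_value=4))  # -> 672  (8 values per word)
print(packed_width(n, bits_per_value=8))  # -> 1344 (4 values per word)
```

If the checkpoint was written under one packing assumption and read under another, the loader sees exactly this kind of doubled (or halved) column count.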

Why your case is especially prone to this

Your model is not just a plain stock MLX conversion. The model card identifies it as JANG v2 (MLX-native safetensors), with actual average 5.1 bits, a dense hybrid sliding/global attention Gemma 4 architecture, and a recommendation to use vMLX for the best experience. The same page presents vMLX as the recommended path and says standard mlx_lm / mlx_vlm do not support this setup at the versions it lists. (Hugging Face)

A related conversion page for the GGUF version explains the compatibility point even more directly: the original model uses JANG v2 mixed-precision MLX quantization and says standard tools such as LM Studio, llama.cpp, oMLX, and mlx-lm cannot load that original format because of its mixed per-layer bit widths. That lines up very well with your shape-mismatch error. (Hugging Face)

So the likely story is:

  1. the runtime knows enough to recognize the model as Gemma 4,
  2. it starts building the attention layer,
  3. then it reaches a quantized projection tensor whose stored layout matches JANG/vMLX expectations,
  4. but the loader is assuming a different MLX layout,
  5. and the load fails with the shape mismatch. (GitHub)

Why this is probably not a memory problem

If this were mostly a RAM or unified-memory issue, the usual symptoms would be different: out-of-memory messages, Metal allocation failures, crashing later in loading, or trouble once generation starts. Your failure happens earlier and more cleanly: the loader names a specific tensor and says its shape is wrong. That is a schema/format mismatch type of error. Also, a recent user writeup shows that at least some official Gemma 4 MLX models can run in LM Studio on a 32 GB Mac after runtime updates, which further suggests that the main blocker here is not simply Mac memory. (GitHub)

Why Gemma 4 makes this easier to trip over

Gemma 4 is a newer and more specialized model family than older plain text-only local models. Google describes Gemma 4 as a four-size family built for reasoning and agentic workflows, and Gemma 4 support only landed recently in the MLX ecosystem. The mlx-lm release notes show that Gemma 4 support was added recently and immediately followed by Gemma 4–specific fixes, including the projection-loading fix mentioned earlier. That is the pattern you see when support is still stabilizing: some models load, some do not, and custom formats are more likely to break first. (blog.google)

LM Studio’s timeline shows the same thing. On April 3, 2026, users were still hitting Model type gemma4 not supported with No module named 'mlx_vlm.models.gemma4'. On April 13, 2026, there was still a public issue showing Gemma 4 support is not ready yet. LM Studio’s changelog then shows Gemma 4-related updates on April 2, April 9, and April 10—but those entries are about tool-call reliability and the updated Gemma 4 chat template, not a fix for this exact JANG tensor-layout mismatch. (GitHub)

That context matters because it means your experience is not strange. It fits a broader pattern: Gemma 4 support on Mac/MLX was moving quickly, and custom JANG MLX checkpoints sit near the edge of compatibility. (GitHub)

What I think is happening in your case

My best explanation is this:

The model file is probably okay, but your current loader path is not the right one for this checkpoint format.

More specifically, I think you are trying to load a JANG v2 mixed-precision MLX checkpoint in a runtime path that can handle some Gemma 4 MLX models, but not this particular weight layout. That interpretation is strongly supported by:

  • the exact same public bug report for the same family of model, (GitHub)
  • the model card’s vMLX-first guidance, (Hugging Face)
  • the GGUF conversion page explicitly saying the original JANG format is not for standard tools, (Hugging Face)
  • and the recent Gemma 4 projection-loading fixes in mlx-lm. (GitHub)

What it is less likely to be

It is less likely that:

  • your Mac is simply too weak, because the error is structural rather than resource-related, (GitHub)
  • the model is randomly corrupted, because the mismatch is clean and reproducible rather than chaotic, and the same error exists publicly on another machine, (GitHub)
  • or your prompt/settings are wrong, because those matter after loading, not during tensor-shape validation. (GitHub)

A mixed or stale local snapshot is still possible, though. If your local cache combines config.json, shard files, or index files from different revisions, that can also create “expected A, got B” errors. It is not my top guess here, but it is worth cleaning up because it is easy to test. (GitHub)
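If you want to test the stale-snapshot theory quickly, a sketch like the following checks that every shard named in the safetensors index is actually present on disk. The index filename follows the usual Hugging Face convention for sharded checkpoints; your local layout may differ:

```python
# Minimal sketch: detect an incomplete or mixed local snapshot by
# checking that every shard referenced in model.safetensors.index.json
# exists in the model directory. This catches missing files, not
# revision mismatches, so it is only a partial test.
import json
import os

def missing_shards(model_dir):
    """Return shard filenames listed in the index but absent on disk."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as f:
        index = json.load(f)
    shards = set(index["weight_map"].values())
    return sorted(s for s in shards
                  if not os.path.exists(os.path.join(model_dir, s)))
```

An empty result does not prove the snapshot is healthy (files from mixed revisions can all exist yet disagree), which is why a clean re-download remains the more reliable test.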

The safest way to fix it

1. If you are in LM Studio, update both the app and the runtimes

In LM Studio, check Settings → Runtime and update LM Studio MLX and Metal llama.cpp. The changelog shows Gemma 4-related updates in early April, and a recent user writeup says updating the runtime was what allowed an official MLX Gemma 4 model to run in LM Studio. (LM Studio)

This is worth doing even though it may not fully solve the JANG model, because older LM Studio builds had explicit Gemma 4 support gaps. (GitHub)

2. Test a standard Gemma 4 MLX model

This is the most informative next step.

Try a more standard Gemma 4 MLX model, such as mlx-community/gemma-4-26b-a4b-it-4bit, which a recent writeup says now runs in LM Studio after runtime updates. (Qiita)

This gives you a clean diagnostic split:

  • If a standard Gemma 4 MLX model loads, but your JANG model fails, then your problem is almost certainly checkpoint-format compatibility. (Qiita)
  • If standard Gemma 4 MLX models also fail, then your problem is broader: app/runtime versions, environment mismatch, or incomplete Gemma 4 support on your current stack. (GitHub)

3. For this JANG model, use the runtime it was built around: vMLX

This is the most likely actual solution for this model family.

The model page explicitly recommends vMLX, and the GGUF conversion page says the original JANG mixed-precision MLX format is only compatible with vMLX while standard tools cannot load it. That makes vMLX the natural first choice for the original JANG checkpoint. (Hugging Face)

4. If you want to stay in LM Studio, use the GGUF conversion instead of the original JANG MLX checkpoint

The GGUF conversion exists specifically because the original JANG MLX format is not broadly compatible. The conversion page says it provides standard GGUF quantizations for use with llama.cpp, LM Studio, Ollama, and other GGUF-compatible engines. So if LM Studio is your preferred app, the GGUF path is likely the smoother path than trying to force the original JANG MLX checkpoint to work there. (Hugging Face)

5. Delete the local model folder and re-download it cleanly

This is a good hygiene step.

If your local copy is stale or mixed, redownloading fixes that. It may not be the root cause, but it is easy to rule out and worth doing before deeper debugging. The model repo has had recent updates, including README and capability metadata changes, so a clean snapshot is safer than relying on an older local cache. (Hugging Face)

6. If you are loading from Python, update MLX packages

If you are not in LM Studio and instead use Python directly, make sure mlx-lm is recent enough to include Gemma 4 support and the Gemma 4 quantized per-layer projection loading fix. Those fixes are in the release notes, so older installations are a real risk. (GitHub)
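Before assuming the fixes are present, it is worth checking which versions you actually have installed. This standard-library sketch reads the package metadata and degrades gracefully if a package is missing, so you can compare what it prints against the mlx-lm release that added the Gemma 4 fixes:

```python
# Print locally installed MLX package versions using only the
# standard library. Compare the mlx-lm version against the release
# notes that mention the Gemma 4 projection-loading fix.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version string, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

for pkg in ("mlx", "mlx-lm"):
    v = installed_version(pkg)
    print(f"{pkg}: {v or 'not installed'}")
```

If either package is stale or missing, `pip install -U mlx mlx-lm` (in the same environment) is the usual next step.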

A simple decision tree

Here is the beginner-safe version:

Case A: official MLX Gemma 4 works, JANG MLX fails
That means your app can handle Gemma 4 in general, but not this custom checkpoint format. Use vMLX for the JANG model, or use the GGUF version in LM Studio. (Qiita)

Case B: official MLX Gemma 4 also fails
That means your Gemma 4 support is still not correct at the runtime/app level. Update LM Studio + runtimes, or update your MLX Python packages. (GitHub)

Case C: everything still fails after updating
Then do a clean re-download of the model files and retest. If the JANG model still fails but official models load, the answer is still “wrong runtime for this checkpoint.” (Hugging Face)
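If it helps, the split between Case A and Case B can be summarized as a tiny helper. This is purely illustrative: the booleans are the results of your own load tests, and the strings just restate the advice above (Case C, re-downloading cleanly, applies when failures persist after updates):

```python
# The decision tree above as a small helper. Illustrative only:
# the booleans come from your own load tests, and the returned
# strings restate the cases in the text.

def diagnose(official_mlx_loads: bool, jang_mlx_loads: bool) -> str:
    if official_mlx_loads and jang_mlx_loads:
        return "Both load: nothing left to fix."
    if official_mlx_loads:
        # Case A: Gemma 4 works in general; only the custom checkpoint fails.
        return ("Case A: use vMLX for the JANG model, "
                "or the GGUF version in LM Studio.")
    # Case B: even a standard Gemma 4 MLX model fails.
    return ("Case B: update LM Studio and its runtimes, "
            "or your MLX Python packages.")

print(diagnose(official_mlx_loads=True, jang_mlx_loads=False))
```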

My bottom line

The clearest explanation is:

Your MLX model will not load because the checkpoint format and the loader are mismatched, and this is especially likely because you are using a JANG v2 mixed-precision Gemma 4 checkpoint that is meant for vMLX rather than a standard MLX loader path. (GitHub)

So the practical fix order is:

  1. update LM Studio and runtimes if you use LM Studio, (LM Studio)
  2. test a standard mlx-community Gemma 4 MLX model, (Qiita)
  3. use vMLX for the original JANG MLX checkpoint, (Hugging Face)
  4. or use the GGUF conversion if you want LM Studio compatibility, (Hugging Face)
  5. and re-download the model cleanly to rule out cache issues. (Hugging Face)