danielhanchen
posted an update 4 days ago

We just compressed Qwen 3.6's KV cache 4x with zero quality loss (PPL actually improves slightly).

Works automatically on the hybrid architecture: it detects standard vs. linear attention layers.
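For anyone wondering how that auto-detection could work: hybrid configs typically tag each layer's attention type, so one pass over that list can decide which layers actually hold a KV cache worth compressing. A minimal sketch of that idea (the `layer_types` naming, the tag strings, and the per-layer plan are my assumptions, not fraQtl's actual code):

```python
# Hypothetical sketch: only full (softmax) attention layers keep a KV cache,
# so in a hybrid model compression can be applied selectively per layer.
# The tag names "full_attention"/"linear_attention" are assumptions.

def plan_kv_compression(layer_types, factor=4):
    """Map layer index -> compression factor: `factor` for full-attention
    layers (which hold a KV cache), 1 (no-op) for linear-attention layers."""
    return {
        i: factor if kind == "full_attention" else 1
        for i, kind in enumerate(layer_types)
    }

# Toy hybrid stack: three linear-attention layers per full-attention layer.
layers = ["linear_attention"] * 3 + ["full_attention"]
print(plan_kv_compression(layers))  # {0: 1, 1: 1, 2: 1, 3: 4}
```

Since only a fraction of layers in such a hybrid stack carry a KV cache at all, compressing just those layers 4x is plausibly where the memory win comes from.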

Model card: huggingface.co/fraQtl/Qwen3.6-35B-A3B-fraQtl-kv :)

Are 27B and 122B coming soon?

p.s.: Qwen3.5 is starting to show promise... it's the first Qwen reasoning model since QwQ that has worked, imho -- the first that doesn't reason too much or get stuck in loops too often. However, it still feels like it's not truly understanding; like it's just parroting what it thinks a teacher model would say, even when that isn't aligned with what the user requested. Hope this is part of the "quality" focus that was mentioned a while back.


oh 3.6 35B is a literal never-ending reasoning loop for me. like 3 out of 6 times it's a kill-the-server type of deal