Last night I successfully trained SAE weights for all 30 layers of HuggingFaceTB/SmolLM2-135M-Instruct without adjusting a single hyperparameter. The whole run was automated and required no supervision.
Using an A100-40GB GPU, I maintained ~400 tok/s throughput and finished in 2 hours and 33 minutes! I ended the run with 0 dead features across all layers. The L0 convergence target was set at 50, and mean explained variance reached 0.9519, with a ceiling of 0.9905!
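For anyone unfamiliar with these numbers, here is a rough sketch of how L0, explained variance, and dead-feature counts are commonly computed for an SAE. This is not the code from my run: the function name `sae_metrics`, the array shapes, and the exact explained-variance definition (1 minus residual sum of squares over mean-centered total sum of squares) are assumptions for illustration.

```python
import numpy as np

def sae_metrics(x, x_hat, acts):
    """Hypothetical helper, not taken from the actual training code.

    x, x_hat: (n_tokens, d_model) original and reconstructed activations.
    acts:     (n_tokens, n_features) SAE feature activations.
    """
    active = acts != 0
    # L0: average number of non-zero features per token.
    l0 = float(active.sum(axis=1).mean())
    # Explained variance (assumed definition): 1 - residual SS / total SS,
    # with the total taken around the per-dimension mean of x.
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    ev = float(1.0 - resid / total)
    # Dead features: features that never fire on any token in this batch.
    dead = int((~active.any(axis=0)).sum())
    return {"l0": l0, "explained_variance": ev, "dead_features": dead}

# Tiny synthetic example: 3 tokens, 2 model dims, 3 SAE features.
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
acts = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 4.0, 0.0]])
print(sae_metrics(x, x.copy(), acts))
```

In practice the dead-feature count depends on how many tokens you evaluate over, so a feature that looks dead on a small batch may still fire on a larger sample.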
The SAE weights and the full model SQLite database are live now!
juiceb0xc0de/smollm2-135m-instruct-SAE