juiceb0xc0de

https://github.com/JuiceB0xC0de

AI & ML interests

You can call me McDreamy M.D. I'll be your attending neural network brain surgeon.

Recent Activity

posted an update about 4 hours ago

updated a dataset about 4 hours ago

juiceb0xc0de/smollm2-135m-instruct-SAE

published a dataset about 5 hours ago

juiceb0xc0de/smollm2-135m-instruct-SAE

I've been busy testing the event-aware SAE trainer I've been designing. It uses hierarchical supervisory control for hands-off SAE training, which takes away the pressure of tuning hyperparameters per layer. The trainer uses an Adaptive Lagrangian controller to drive L0 sparsity to its target automatically, with Explained Variance as the convergence quality metric.
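To make the controller idea concrete, here is a minimal sketch of a Lagrangian-style update that pushes a measured L0 toward a target. It is not the trainer's actual code: the function names, the step size, and the toy L0 response curve are all assumptions made for illustration, and the update is plain dual ascent on the log of the penalty coefficient rather than whatever adaptation scheme the real controller uses.

```python
import math


def update_sparsity_coeff(log_coeff, measured_l0, target_l0, step_size=0.05):
    """One dual-ascent step on the log of the sparsity penalty coefficient.

    If more features fire than the target allows, the penalty goes up;
    if fewer fire, the penalty goes down. (Hypothetical helper.)
    """
    # Relative constraint violation: positive when the SAE is too dense.
    violation = (measured_l0 - target_l0) / target_l0
    return log_coeff + step_size * violation


def simulate(target_l0=50.0, steps=400):
    """Run the controller against a made-up L0 response curve."""
    log_coeff = math.log(1e-3)
    l0 = None
    for _ in range(steps):
        coeff = math.exp(log_coeff)
        # Invented stand-in for a real SAE: L0 falls as the penalty rises.
        l0 = 1.0 / math.sqrt(coeff)
        log_coeff = update_sparsity_coeff(log_coeff, l0, target_l0)
    return l0, math.exp(log_coeff)


if __name__ == "__main__":
    final_l0, final_coeff = simulate()
    print(f"L0 = {final_l0:.2f}, coeff = {final_coeff:.2e}")
```

Working in log space keeps the coefficient positive and makes each step a multiplicative change, so one step size behaves the same whether the coefficient is large or small.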

Last night I successfully trained SAE weights for all 30 layers of HuggingFaceTB/SmolLM2-135M-Instruct without adjusting a single hyperparameter. The whole run was automated and required no supervision.

Using an A100-40GB GPU I maintained ~400 tok/s throughput and finished in 2 hours and 33 minutes! I ended the run with 0 dead features across all layers. The L0 target was set at 50, and the run reached a mean Explained Variance of 0.9519, with the ceiling converging at 0.9905!
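For readers unfamiliar with the metrics quoted above, the sketch below shows one common way to compute them on a small batch. The definitions are assumptions, not taken from the trainer: explained variance is computed as one minus the squared reconstruction error over the total variance of the inputs, L0 is the average count of non-zero features per token, and a feature counts as dead if it never fires in the batch. Other conventions exist, such as per-dimension averaging or longer dead-feature windows.

```python
def explained_variance(inputs, reconstructions):
    """1 - (sum of squared reconstruction errors / total input variance)."""
    n = len(inputs)
    dim = len(inputs[0])
    # Per-dimension mean of the original activations.
    means = [sum(row[j] for row in inputs) / n for j in range(dim)]
    total = sum((row[j] - means[j]) ** 2 for row in inputs for j in range(dim))
    residual = sum(
        (x[j] - r[j]) ** 2
        for x, r in zip(inputs, reconstructions)
        for j in range(dim)
    )
    return 1.0 - residual / total


def mean_l0(feature_acts):
    """Average number of non-zero SAE features per token."""
    active = sum(sum(1 for a in row if a != 0.0) for row in feature_acts)
    return active / len(feature_acts)


def count_dead_features(feature_acts):
    """Number of features that never fire anywhere in the batch."""
    n_features = len(feature_acts[0])
    return sum(
        1
        for j in range(n_features)
        if all(row[j] == 0.0 for row in feature_acts)
    )


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    acts = [[0.0, 1.5, 0.0], [0.2, 0.0, 0.0]]
    print(explained_variance(x, x), mean_l0(acts), count_dead_features(acts))
```

Under these definitions, a perfect reconstruction scores 1.0, and a reconstruction that always outputs the per-dimension mean scores 0.0.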

SAE weights and the full model SQLite database are live now!
https://huggingface.co/datasets/juiceb0xc0de/smollm2-135m-instruct-SAE