The Synthetic Data Playbook: Generating Trillions of the Finest Tokens. Explore synthetic data benchmarks via an interactive bookshelf.
Porting nanochat to Transformers: an AI modeling history lesson. Learn about ML and Transformers through nanochat.
Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks. Evaluate multilingual models using FineTasks.
FineVision: Open Data is All You Need. A new open-source dataset for training VLMs.