I’ve published a framework paper proposing a CPU-native inference
architecture for large language models.
Core argument: LLMs are slow on CPU not because CPUs are unsuited
to inference, but because models and runtimes were designed for GPU
memory architecture and never redesigned for CPU cache hierarchy.
AIOS proposes a memory residency controller and Model Contract to
close that gap.
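To make the bandwidth argument concrete, here is a back-of-envelope sketch of why DRAM traffic per token bounds CPU decode throughput. The numbers (roughly 4 GB of quantized weights for a 7B Q4_K_M model, roughly 50 GB/s of dual-channel DDR4 bandwidth) are illustrative assumptions on my part, not figures from the paper:

```python
# Back-of-envelope: DRAM bandwidth as a ceiling on CPU decode speed.
# All numbers are illustrative assumptions, not figures from the AIOS paper.

weights_gb = 4.0   # ~7B params at ~4.5 bits/weight (Q4_K_M is roughly this size)
dram_gb_s = 50.0   # typical dual-channel DDR4 system bandwidth

# Worst case: every generated token streams all weights from DRAM once.
mb_per_token = weights_gb * 1000          # ~4000 MB of traffic per token
max_tokens_per_s = dram_gb_s / weights_gb # bandwidth-bound throughput ceiling

print(f"~{mb_per_token:.0f} MB/token -> at most {max_tokens_per_s:.1f} tok/s")
```

Anything that keeps hot weights resident in cache instead of re-streaming them from DRAM raises that ceiling, which is the gap the runtime targets.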
What AIOS is:
- A runtime (memory residency controller) that sits between inference
  engines and hardware, reducing DRAM data movement per generated token
- A Model Contract: five architectural requirements models can satisfy
  to expose the full optimization surface
Current state: Paper published, spec complete, validation tooling
runnable. Runtime not yet implemented. All performance projections
are analytical — no empirical results exist yet.
What I need most:
Someone with a bare-metal Linux machine (Intel Haswell+ or AMD Zen+, 16 GB RAM)
to run the Phase 1 baseline measurement on Falcon 7B Q4_K_M using
stock llama.cpp. Full protocol in Issue #2. Takes ~2 hours including
setup.
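For a sense of the arithmetic behind the measurement, here is a minimal sketch. The authoritative protocol is in Issue #2; the perf event names and file names below are CPU- and setup-specific assumptions, not part of the protocol:

```python
# Sketch of the MB/token calculation behind the Phase 1 baseline.
# Assumption: DRAM traffic is counted via memory-controller (uncore IMC)
# cache-line events under `perf stat`; exact event names vary by CPU.

def mb_per_token(cas_reads: int, cas_writes: int, tokens: int,
                 line_bytes: int = 64) -> float:
    """DRAM traffic per generated token, from memory-controller
    cache-line transfer counts (64 bytes per line on x86)."""
    return (cas_reads + cas_writes) * line_bytes / tokens / 1e6

# Illustrative shape of the wrapped run (event and file names are assumptions):
PERF_CMD = (
    "perf stat -e uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ "
    "-- ./llama-cli -m falcon-7b-q4_k_m.gguf -n 128 -p 'Hello'"
)

# e.g. 8e9 read lines + 1e8 write lines over 128 tokens:
print(f"{mb_per_token(8_000_000_000, 100_000_000, 128):.0f} MB/token")
```

The point of the baseline is just that number: how many megabytes cross the memory bus per generated token with stock llama.cpp, before any runtime changes.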
Links:
- Hugging Face: aios-framework/aios-paper
- Issue #2 (start here): "Falcon 7B + AIOS: measure baseline MB/token (primary validation)", acasavaraju/AIOS on GitHub