AI-native OS orchestration with local/open model backends (Ollama, llama.cpp, API)

I’m building MindOs, an open-source AI-native OS prototype, and I’m trying to solve a systems problem rather than just a model problem. The idea is to keep one deterministic orchestrator with memory, policy gating, trust/sandbox, and audit at the core, while letting the model layer stay flexible.

What I care about most is being able to run open models in different real-world setups without changing the operating logic every time. In practice, MindOs can already run with local backends like Ollama and llama.cpp, or through API-compatible endpoints, from the same runtime configuration. So the “mind” of the system stays consistent, while the model backend can change based on hardware, privacy, latency, or cost.
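To make the "one contract, many backends" idea concrete, here is a minimal sketch of the pattern described above: a single backend protocol that the orchestrator codes against, with the concrete backend chosen from runtime configuration. All names here (`ModelBackend`, `load_backend`, `EchoBackend`) are hypothetical illustrations, not code from the MindOs repo.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """The orchestration contract: every backend exposes the same surface."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in for a real adapter (Ollama, llama.cpp, or an API endpoint).

    A real adapter would translate `generate` into that runtime's own calls;
    this stub just tags the prompt so the routing is observable.
    """

    name: str

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Registry mapping a config key to a backend factory. In a real system each
# factory would read host, model name, and credentials from the config dict.
BACKEND_FACTORIES = {
    "ollama": lambda cfg: EchoBackend("ollama"),
    "llamacpp": lambda cfg: EchoBackend("llamacpp"),
    "api": lambda cfg: EchoBackend("api"),
}


def load_backend(cfg: dict) -> ModelBackend:
    """Pick a backend from runtime config; the orchestrator never branches on it."""
    return BACKEND_FACTORIES[cfg["backend"]](cfg)


def run_task(backend: ModelBackend, prompt: str) -> str:
    """Orchestrator-side logic: identical regardless of which backend is loaded."""
    return backend.generate(prompt)
```

The design choice this sketch encodes is that hardware, privacy, latency, or cost decisions live entirely in the config-to-factory mapping, while memory, policy gating, and audit sit above `run_task` and never see backend-specific types.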

I’d really like feedback from people who have experience running heterogeneous open-model stacks. I’m especially interested in what tends to break first when you try to keep one orchestration contract across different model runtimes, and which design choices actually make that sustainable over time.

Repo: https://github.com/rthgit/MindOs

If there’s interest, I can post a compact architecture note focused only on backend interoperability and model routing tradeoffs.
