[Project] QitOS: A research-first framework for building and evaluating LLM agents
Hey everyone,
I wanted to share QitOS, a new framework I’ve been working on that’s built specifically for LLM agent researchers.
After working on several agent projects, I found that most existing frameworks didn’t really fit the research workflow:
- It was too hard to quickly iterate on new agent architectures without rewriting the entire execution stack
- Strategy (how the agent thinks) and execution (tool calling, tracing, evaluation) were always tangled together
- Getting set up to evaluate on standard benchmarks took way longer than the actual research
- Debugging agent trajectories was a mess without proper tooling
QitOS was built to solve all these problems:
Key Features
- Clean architecture: Separation between `AgentModule` (your strategy/innovation) and `Engine` (orchestration, tool execution, tracing). You focus on the research; the framework handles the rest.
- Research-friendly: Supports common agent patterns out of the box (ReAct, Plan-Act, Tree-of-Thought, Reflexion) and makes it extremely easy to implement custom scaffolds.
- Benchmark-native: Built-in adapters for GAIA, Tau-Bench, and CyBench so you can get your evaluation up and running in minutes.
- Great observability: The `qita` CLI lets you browse, inspect, replay, and export full agent trajectories, so there is no more digging through raw log files.
- Ecosystem compatible: Works naturally with any OpenAI-compatible model API, so you can use whatever models you prefer.
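To make the strategy/execution split concrete, here is a self-contained toy in plain Python. The names `Strategy` and `Runner` are illustrative stand-ins, not the QitOS API; they only sketch the idea that the thinking side and the orchestration side never touch each other's internals:

```python
from dataclasses import dataclass, field

# Toy illustration of the strategy/execution split described above.
# `Strategy` and `Runner` are illustrative names, not the QitOS API.

@dataclass
class State:
    steps: list[str] = field(default_factory=list)
    done: bool = False
    answer: str = ""

class Strategy:
    """The 'AgentModule' side: only decides what to do next."""
    def decide(self, state: State) -> str:
        if len(state.steps) >= 2:
            return "finish:42"
        return "think"

class Runner:
    """The 'Engine' side: executes decisions and records the trace."""
    def run(self, strategy: Strategy, state: State) -> State:
        while not state.done:
            action = strategy.decide(state)
            state.steps.append(action)  # the trace lives in the runner loop
            if action.startswith("finish:"):
                state.done = True
                state.answer = action.split(":", 1)[1]
        return state

final = Runner().run(Strategy(), State())
print(final.answer)  # -> 42
```

Swapping in a different `Strategy` never requires touching `Runner`, which is the property that makes iterating on agent architectures cheap.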
Minimal Example
Here’s what a minimal SWE agent looks like in QitOS:
```python
from dataclasses import dataclass, field

from qitos import AgentModule, Engine, StateSchema, Task, ToolRegistry
from qitos.kit.parser import ReActTextParser
from qitos.kit.tool import EditorToolSet, RunCommand

@dataclass
class SWEState(StateSchema):
    scratchpad: list[str] = field(default_factory=list)

class MySWEAgent(AgentModule[SWEState, ...]):
    def __init__(self, llm, workspace_root):
        reg = ToolRegistry()
        reg.include(EditorToolSet(workspace_root))
        reg.register(RunCommand(workspace_root))
        super().__init__(
            tool_registry=reg,
            llm=llm,
            model_parser=ReActTextParser(),
        )

    # Implement your strategy logic here...

# Run it
agent = MySWEAgent(llm=my_llm, workspace_root="./playground")
result = Engine(agent=agent).run(my_task)
print(result.state.final_result)
```
Get Started
- GitHub: Qitor/qitos ("Let's Qitos! A torch-like agent-native framework for researchers.")
- Documentation: QitOS docs
- Install: `pip install qitos`
I’m really interested to hear what the community thinks — what do you find most frustrating about building LLM agents for research? Are there any features you’d like to see added to QitOS?
All feedback and contributions are very welcome!