[Project] QitOS: A research-first framework for building and evaluating LLM agents

Hey everyone,

I wanted to share QitOS, a new framework I’ve been working on that’s built specifically for LLM agent researchers.

After working on several agent projects, I found that most existing frameworks didn’t really fit the research workflow:

  • It was too hard to quickly iterate on new agent architectures without rewriting the entire execution stack
  • Strategy (how the agent thinks) and execution (tool calling, tracing, evaluation) were always tangled together
  • Getting set up to evaluate on standard benchmarks took way longer than the actual research
  • Debugging agent trajectories was a mess without proper tooling

QitOS was built to solve all these problems:

Key Features

  • Clean architecture: Strict separation between AgentModule (your strategy/innovation) and Engine (orchestration, tool execution, tracing). You focus on the research; the framework handles the rest.
  • Research-friendly: Common agent patterns (ReAct, Plan-Act, Tree-of-Thoughts, Reflexion) work out of the box, and custom scaffolds are easy to implement.
  • Benchmark-native: Built-in adapters for GAIA, Tau-Bench, and CyBench so you can get your evaluation up and running in minutes.
  • Great observability: The qita CLI lets you browse, inspect, replay, and export full agent trajectories — no more digging through raw log files.
  • Ecosystem compatible: Works naturally with any OpenAI-compatible model API, so you can use whatever models you prefer.
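
To make the AgentModule/Engine split concrete, here is a minimal stdlib-only sketch of the pattern. This is an illustration of the design idea, not QitOS's actual API (the class and method names here are invented): the strategy object only decides the next step, while the engine owns the loop, tool dispatch, and the trace.

```python
from dataclasses import dataclass

# Hypothetical sketch of the strategy/orchestration split; not QitOS's real API.

@dataclass
class Step:
    tool: str  # which tool to call, or "finish"
    arg: str   # tool argument (or the final answer)

class EchoAgent:
    """Strategy only: decide the next step from the trace so far."""
    def decide(self, task: str, trace: list[str]) -> Step:
        if not trace:
            return Step(tool="upper", arg=task)
        return Step(tool="finish", arg=trace[-1])

class MiniEngine:
    """Orchestration only: run the loop, dispatch tools, record a trace."""
    def __init__(self, agent, tools):
        self.agent, self.tools = agent, tools

    def run(self, task: str, max_steps: int = 8) -> tuple[str, list[str]]:
        trace: list[str] = []
        for _ in range(max_steps):
            step = self.agent.decide(task, trace)
            if step.tool == "finish":
                return step.arg, trace
            trace.append(self.tools[step.tool](step.arg))
        raise RuntimeError("step budget exhausted")

result, trace = MiniEngine(EchoAgent(), {"upper": str.upper}).run("hello")
print(result)  # HELLO
```

Because the engine never looks inside the agent's decision logic, you can swap in a new scaffold without touching the execution loop.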

Minimal Example

Here’s what a minimal SWE agent looks like in QitOS:

from dataclasses import dataclass, field
from qitos import AgentModule, Engine, StateSchema, Task, ToolRegistry
from qitos.kit.parser import ReActTextParser
from qitos.kit.tool import EditorToolSet, RunCommand

@dataclass
class SWEState(StateSchema):
    scratchpad: list[str] = field(default_factory=list)

class MySWEAgent(AgentModule[SWEState, ...]):
    def __init__(self, llm, workspace_root):
        reg = ToolRegistry()
        reg.include(EditorToolSet(workspace_root))
        reg.register(RunCommand(workspace_root))
        super().__init__(
            tool_registry=reg,
            llm=llm,
            model_parser=ReActTextParser()
        )
    
    # Implement your strategy logic here...

# Run it (my_llm is your model client; my_task is a Task instance)
agent = MySWEAgent(llm=my_llm, workspace_root="./playground")
result = Engine(agent=agent).run(my_task)
print(result.state.final_result)

Get Started

I’m really interested to hear what the community thinks — what do you find most frustrating about building LLM agents for research? Are there any features you’d like to see added to QitOS?

All feedback and contributions are very welcome!
