Instructions for using luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit with libraries and local apps.

Libraries

How to use luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit

Run Hermes

hermes

MLX LM

How to use luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit"
# In a second terminal, call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
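
The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server is listening on port 8080 (the mlx_lm.server default, and the port used in the Pi and Hermes configuration above). The helper names `build_chat_payload` and `chat`, and the `max_tokens` value, are illustrative choices rather than part of any library.

```python
import json
import urllib.request

MODEL_ID = "luoyike2003/LongShuGameDev-Qwen3.5-122B-REAP-Architect-MLX-4bit"
# Assumption: the server runs locally with its default port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_payload(user_text, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for a /chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }


def chat(user_text, base_url=BASE_URL):
    """Send one user message and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(user_text)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # OpenAI-style responses carry the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` prints the model's reply. Nothing is sent over the network until `chat` is called.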

LongShu · Architect-V2.2

The core model of a multi-agent collaboration system built for game development. It is based on the Qwen3.5-122B-A10B-MoE architecture, heavily optimized, and runs on consumer-grade hardware.

GitHub: https://github.com/luoyike2003ls/LongShuGameDev

🎮 Model Positioning

General-purpose LLMs often hallucinate or give superficial suggestions when faced with complex system architectures, deep engine APIs (Unreal/Unity), and state machine logic in real game development. LongShu targets this pain point. Starting from the Qwen3.5-122B-A10B-MoE foundation, we applied targeted, domain-specific fine-tuning to create the core brain of this Multi-Agent System: the Commander Model (TIANCE). It understands game engineering, not just code. It is less a chatbot than a "Technical Director + Lead Architect".

🏗️ REAP Ecosystem

LongShu is the central brain of a complete game development agent network:

| Role | Codename | Positioning | Core Capabilities |
|------|----------|-------------|-------------------|
| Commander | Tiance | Core brain, logic reasoning hub | Global planning, system decomposition, task dispatch |
| Architect | Xuangou | Code architecture expert | Tech structure analysis, architecture optimization, tech debt identification |
| Executor | Moxing | Task execution specialist | Coding, debugging, test case generation |
| Watcher | Zhuzhao | Monitoring & alerting expert | Log analysis, anomaly detection, risk early warning |
| Scholar | Wenyuan | Knowledge management expert | Documentation understanding, knowledge graphs, intelligent retrieval |
| Coordinator | Hengshu | Team collaboration expert | Intelligent task allocation, cross-functional coordination |
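
To make the role table easier to work with, the sketch below encodes it as a small registry and looks up roles by capability keyword. The `AgentRole` class and `find_roles` helper are hypothetical illustrations; they are not part of the LongShu codebase, and the real dispatch logic is not described in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    role: str
    codename: str
    capabilities: tuple


# Rows copied from the role table in this section.
ROLES = (
    AgentRole("Commander", "Tiance",
              ("global planning", "system decomposition", "task dispatch")),
    AgentRole("Architect", "Xuangou",
              ("tech structure analysis", "architecture optimization",
               "tech debt identification")),
    AgentRole("Executor", "Moxing",
              ("coding", "debugging", "test case generation")),
    AgentRole("Watcher", "Zhuzhao",
              ("log analysis", "anomaly detection", "risk early warning")),
    AgentRole("Scholar", "Wenyuan",
              ("documentation understanding", "knowledge graphs",
               "intelligent retrieval")),
    AgentRole("Coordinator", "Hengshu",
              ("intelligent task allocation", "cross-functional coordination")),
)


def find_roles(keyword):
    """Return roles whose listed capabilities mention the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in ROLES if any(needle in c for c in r.capabilities)]


for match in find_roles("task"):
    print(match.role, match.codename)
```

A plain substring match is the simplest possible lookup; a real dispatcher would presumably let the Commander model decide which agent receives a task.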

⚡ Core Technical Highlights

Hybrid Attention Architecture

60-layer network with 3:1 alternating Linear + Full Attention
Balances lightning-fast linear inference with full attention precision
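
As a rough illustration of what a 3:1 ratio means across 60 layers, the sketch below builds a per-layer schedule. It assumes each group of four layers is three linear-attention layers followed by one full-attention layer; the actual ordering inside the model is not stated here and may differ.

```python
NUM_LAYERS = 60        # from the text above
LINEAR_PER_FULL = 3    # 3:1 ratio from the text above


def layer_schedule(num_layers=NUM_LAYERS, linear_per_full=LINEAR_PER_FULL):
    """Label each layer index as 'linear' or 'full'.

    Assumption: every group of (linear_per_full + 1) layers ends with the
    full-attention layer.
    """
    period = linear_per_full + 1
    return ["full" if (i + 1) % period == 0 else "linear"
            for i in range(num_layers)]


schedule = layer_schedule()
print(schedule[:8])
print(schedule.count("linear"), "linear,", schedule.count("full"), "full")
# prints: 45 linear, 15 full
```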

Long Context Support (256K)

Native support for 262,144 token context
Can ingest entire project headers, design docs, and API documentation in one shot
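
Before loading a whole project into the prompt, it helps to check that it fits. The sketch below is a back-of-the-envelope budget check; the four-characters-per-token ratio and the 4,096-token output reserve are assumptions for illustration, not measured properties of this model's tokenizer.

```python
CONTEXT_TOKENS = 256 * 1024  # 262,144 tokens, as stated above
CHARS_PER_TOKEN = 4          # Assumption: crude heuristic, not a measured value


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // chars_per_token)


def fits_in_context(documents, reserve_for_output=4096, limit=CONTEXT_TOKENS):
    """Return (fits, estimated_tokens) for a list of document strings."""
    used = sum(estimate_tokens(doc) for doc in documents)
    return used + reserve_for_output <= limit, used


headers = ["#pragma once\nclass AMyActor;\n" * 1000]  # stand-in for real files
print(fits_in_context(headers))
```

For exact counts, tokenize the text with the model's own tokenizer instead of relying on the character heuristic.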

Extreme MoE Sparsity

105 experts, only 10 activated per token
High compute efficiency, inference speed of ~36 tokens/s (Mac mini M4 Pro 64GB)
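
The sparsity figures above can be illustrated with a generic top-k router. This is a textbook-style sketch, not the model's actual routing code: the random logits stand in for a learned router, and normalizing with a softmax over the selected experts is an assumption.

```python
import math
import random

NUM_EXPERTS = 105  # from the text above
TOP_K = 10         # experts activated per token, from the text above


def route(logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
# Stand-in for the router's output for a single token.
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(router_logits)

print(len(selected), "of", NUM_EXPERTS, "experts active")
print(f"active fraction: {TOP_K / NUM_EXPERTS:.1%}")
```

Only the selected experts run their feed-forward pass for that token, so about one in ten experts contributes compute per token.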

Game Engine-Aware Hybrid Quantization

Core 20 experts: high-precision IQ4_NL/Q5_K
Non-core 85 experts: extreme compression IQ2_XXS
Massive model size reduction while retaining 98.6% of core reasoning capability
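
To see how such a split affects size, the sketch below computes the average bits per weight across the expert weights. The bits-per-weight values are the nominal llama.cpp figures for these formats (an assumption brought in from outside this card), and the calculation assumes all 105 experts are the same size. It ignores attention, embedding, and router weights.

```python
# Assumption: nominal llama.cpp bits per weight for each quantization type.
BITS_PER_WEIGHT = {"IQ4_NL": 4.5, "Q5_K": 5.5, "IQ2_XXS": 2.0625}

CORE_EXPERTS = 20      # from the text above
NON_CORE_EXPERTS = 85  # from the text above


def average_bits(core_type, non_core_type="IQ2_XXS",
                 core=CORE_EXPERTS, non_core=NON_CORE_EXPERTS):
    """Average bits per expert weight, assuming equally sized experts."""
    total_bits = (core * BITS_PER_WEIGHT[core_type]
                  + non_core * BITS_PER_WEIGHT[non_core_type])
    return total_bits / (core + non_core)


for core_type in ("IQ4_NL", "Q5_K"):
    print(f"core={core_type}: {average_bits(core_type):.2f} bits per weight")
```

Under these assumptions the expert weights average between roughly 2.5 and 2.7 bits each, depending on which high-precision format the core experts use.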

🎯 Specialized Training Data

Data Type	Scale	Description
Real Game Projects	52+	MMO, FPS, ARPG, Roguelike genres
Core Source Code	2.1B+ Tokens	UE (C++), Unity (C#), Godot, Lua hot-reload frameworks
Engineering Docs	305K+ Pages	GDDs, system breakdowns, game design logic, performance analysis
High-Quality Online Data	10.2B+	StackOverflow gamedev, GitHub Issues, graphics papers

💻 Hardware Requirements

Configuration	Recommendation
Mac	M2/M3/M4 series, 64GB Unified Memory
PC	Dual RTX 3090/4090 (24GB each)
Format	llama.cpp IQ3S extreme compression
Speed	~36 tokens/s (Mac mini M4 Pro)

🚀 Use Cases

Automated Test Case Generation - Auto-generate test plans based on code logic
Daily Build Error Diagnosis - Analyze compile/runtime errors with fix suggestions
Level Toolchain Dispatch - Understand design requirements, dispatch executors
System Architecture Design - Decompose complex requirements into modular architectures
Code Review - Review code quality, identify potential issues

🎮 Real-World Application

Sakura Dream Sea (樱梦海) (Steam Page): An Eastern Fantasy Open-World MMO Adventure

The Sakura Dream Sea development team is one of the core pilot users of the LongShu model and has integrated the full set of LongShu agent capabilities into its game development pipeline:

Leveraging the Tiance/Commander Model for global task planning and module decomposition
Utilizing Xuangou/Architect for game system architecture optimization and code review
Accelerating core gameplay development (combat systems, AI behavior trees) through Moxing/Executor
Monitoring server performance and anomaly logs with Zhuzhao/Watcher

This open-world MMO, set in the "Eternal Sakura Continent", serves as a best-practice validation of the LongShu models in industrial game production. As you explore a fantasy continent of falling cherry blossoms alongside its "breathing AI companions", the LongShu agent network is what powers the intelligent operation of the entire game world.

📄 License

Apache 2.0 License

LongShu · AI-Powered Partner for Game Development

Downloads last month: 1,231

Safetensors
Model size: 97B params
Tensor type: BF16, U32
Quantization: 4-bit (MLX)