GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Paper • 2510.19430 • Published
How to use open-gigaai/GigaBrain-0.1-3.5B-Base with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
    "open-gigaai/GigaBrain-0.1-3.5B-Base",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")  # switch to "mps" on Apple devices
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

GigaBrain-0 is a world model-powered Vision-Language-Action (VLA) foundation model designed for robots. It leverages diverse, scalable data generated by world models, reducing reliance on costly real-world robot data while enhancing cross-task generalization. With innovations like RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, GigaBrain-0 excels in spatial reasoning, object state understanding, and long-horizon task execution. It supports dexterous manipulation, mobile tasks, and long-horizon planning, offering robust performance across diverse environments and conditions.
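As a minimal follow-up to the snippet above, the generated sample can be written to disk. This is a sketch assuming the standard Diffusers behavior of returning PIL images; the filename is illustrative.

# Save the first generated image (filename chosen here for illustration)
image.save("gigabrain_sample.png")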
@article{team2025gigabrain,
  title={GigaBrain-0: A World Model-Powered Vision-Language-Action Model},
  author={GigaAI},
  year={2025},
  eprint={2510.19430},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.19430},
}