---
library_name: lerobot
pipeline_tag: robotics
tags:
- lerobot
- diffusion-policy
- flow-matching
- robotics
- manipulation
---

# Hyundai Uiwang: Diffusion Policy (FlowMatch)

LeRobot Diffusion Policy trained on the Hyundai Uiwang **left**-arm manipulation dataset (131 episodes / 110,568 frames, 30 Hz) using the **FlowMatch** rectified-flow scheduler (`num_inference_steps=1`).

## Inputs / Outputs

| | key | shape | notes |
|---|---|---|---|
| in | `observation.images.front_rgb` | (H, W, 3) uint8 | scene (zivid) view; resized internally to 240×320 |
| in | `observation.images.wrist_rgb` | (H, W, 3) uint8 | wrist view; resized internally |
| in | `observation.state` | (26,) float32 | arm 6 + hand 20 joints |
| out | `action` | (26,) float32 | target arm 6 + hand 20 joints, 30 Hz |

## Quick start

Runs from this model id alone: no dataset, no robot, no local checkpoint needed (normalization stats are bundled in the repo). The runnable script is included in this repo: [`inference_example.py`](./inference_example.py).

```bash
pip install lerobot
# download the example from this repo and run it:
huggingface-cli download Ngseo/hyundai-uiwang-left-flowmatch inference_example.py --local-dir .
python inference_example.py --model-id Ngseo/hyundai-uiwang-left-flowmatch --device cuda   # or cpu / mps
```

## Porting to your robot

```python
import numpy as np
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors

model_id = "Ngseo/hyundai-uiwang-left-flowmatch"
device = "cuda"

policy = DiffusionPolicy.from_pretrained(model_id)
policy.config.device = device  # saved config pins cuda; align to runtime device
policy.to(device).eval()
policy.reset()
preprocess, postprocess = make_pre_post_processors(
    policy.config, model_id,
    preprocessor_overrides={"device_processor": {"device": device}}
)

@torch.no_grad()
def predict(front_rgb, wrist_rgb, state):  # uint8 (H,W,3), uint8 (H,W,3), float32 (26,)
    obs = {
        "observation.images.front_rgb": torch.from_numpy(front_rgb).float().div(255).permute(2,0,1)[None].to(device),
        "observation.images.wrist_rgb": torch.from_numpy(wrist_rgb).float().div(255).permute(2,0,1)[None].to(device),
        "observation.state": torch.from_numpy(state).float()[None].to(device),
        "task": "",
        "robot_type": "",
    }
    action = postprocess(policy.select_action(preprocess(obs)))
    return action.squeeze(0).float().cpu().numpy()  # (26,)
```

The policy keeps an internal queue (`n_obs_steps=2`, `n_action_steps=8`); call `predict` at your control rate and `policy.reset()` between episodes. Camera input resolution need not match training (the policy resizes/crops internally), but the two views must be the right cameras (front vs wrist).

## Training

- Policy: `diffusion`, `noise_scheduler_type=FlowMatch`, `num_inference_steps=1`
- Backbone: resnet18, 200k steps, batch 64, images resized 240×320 + crop 216×288
- See `train_config.json` for the full configuration.
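
Returning to the control-loop note in "Porting to your robot": the sketch below shows one way to call `predict` at a fixed 30 Hz and reset the policy at the start of each episode. It is a minimal illustration and not part of this repo. `run_episode`, `read_observation`, `send_action`, and the `fake_*` stubs are hypothetical names standing in for your own robot I/O; in real use you would pass the `predict` function defined above and `policy.reset` as `reset_policy`.

```python
import time
from typing import Callable, Tuple

import numpy as np

Observation = Tuple[np.ndarray, np.ndarray, np.ndarray]


def run_episode(
    predict: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray],
    read_observation: Callable[[], Observation],  # hypothetical: your camera/joint reader
    send_action: Callable[[np.ndarray], None],    # hypothetical: your robot command sink
    reset_policy: Callable[[], None],             # e.g. policy.reset
    control_hz: float = 30.0,                     # dataset rate stated in this card
    max_steps: int = 300,
) -> int:
    """Run one episode at a fixed rate and return the number of steps executed."""
    reset_policy()  # clear queued observations/actions from the previous episode
    period = 1.0 / control_hz
    next_tick = time.perf_counter()
    for _ in range(max_steps):
        front_rgb, wrist_rgb, state = read_observation()
        send_action(predict(front_rgb, wrist_rgb, state))
        next_tick += period
        delay = next_tick - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        else:
            next_tick = time.perf_counter()  # fell behind schedule; resynchronise
    return max_steps


# Stand-ins so the sketch runs without a robot or the model (assumptions, not repo code).
def fake_observation() -> Observation:
    front = np.zeros((480, 640, 3), dtype=np.uint8)
    wrist = np.zeros((480, 640, 3), dtype=np.uint8)
    state = np.zeros(26, dtype=np.float32)
    return front, wrist, state


def fake_predict(front_rgb, wrist_rgb, state):
    return state.copy()  # "hold current pose" placeholder for the real policy


sent, resets = [], []
steps = run_episode(
    fake_predict, fake_observation, sent.append, lambda: resets.append(1), max_steps=5
)
```

Assuming the usual LeRobot queue behaviour, most `predict` calls only pop an already-planned action, and the network runs when the queue of `n_action_steps=8` actions is empty, so per-call latency may be uneven; the resynchronisation branch is there to absorb the slower calls.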