Hyundai Uiwang — Diffusion Policy (FlowMatch)

LeRobot Diffusion Policy trained on the Hyundai Uiwang left-arm manipulation dataset (131 episodes / 110,568 frames, 30 Hz) using the FlowMatch rectified-flow scheduler (num_inference_steps=1).
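
With num_inference_steps=1, a rectified-flow sampler amounts to a single Euler step along a learned velocity field. The following is a minimal sketch of that idea only, not the scheduler shipped with this model: `euler_sample` and `toy_velocity` are hypothetical names, the toy field stands in for the network, and the time convention (t=0 is noise, t=1 is the sample) is an assumption that may differ from the FlowMatch scheduler used here.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_inference_steps=1):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = np.asarray(x0, dtype=np.float64)
    dt = 1.0 / num_inference_steps
    for k in range(num_inference_steps):
        t = k * dt                      # last evaluated t is 1 - dt, never 1
        x = x + dt * velocity_fn(x, t)
    return x

# Toy stand-in for the network (assumption): the velocity of a straight
# path that ends at `target`, i.e. v(x, t) = (target - x) / (1 - t).
target = np.array([0.5, -1.0, 2.0])

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

noise = np.ones(3)
one_step = euler_sample(toy_velocity, noise, num_inference_steps=1)
print(one_step)  # lands on `target` up to floating-point error
```

On a perfectly straight path a single step is exact, which is the usual motivation for running rectified flows with very few steps; a trained network only approximates such a field.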

Inputs / Outputs

direction	key	shape	notes
in	`observation.images.front_rgb`	(H, W, 3) uint8	scene (Zivid) view; resized internally to 240×320
in	`observation.images.wrist_rgb`	(H, W, 3) uint8	wrist view; resized internally
in	`observation.state`	(26,) float32	arm 6 + hand 20 joints
out	`action`	(26,) float32	target arm 6 + hand 20 joints, 30 Hz
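
Before wiring a robot to the policy, it can help to check that your arrays match the table above. This is a small sketch under stated assumptions: `check_observation` and `split_joints` are hypothetical helpers, not part of LeRobot, and the arm-first ordering inside the 26-vector is assumed from the order in which the table lists the joints; confirm it against the dataset metadata.

```python
import numpy as np

STATE_DIM = 26   # arm 6 + hand 20 joints, per the table above
ARM_DIM = 6

def check_observation(front_rgb, wrist_rgb, state):
    """Raise ValueError if the inputs do not match the documented layout."""
    for name, img in (("front_rgb", front_rgb), ("wrist_rgb", wrist_rgb)):
        if img.dtype != np.uint8 or img.ndim != 3 or img.shape[2] != 3:
            raise ValueError(
                f"{name}: expected (H, W, 3) uint8, got {img.shape} {img.dtype}"
            )
    if state.shape != (STATE_DIM,) or state.dtype != np.float32:
        raise ValueError(
            f"state: expected ({STATE_DIM},) float32, got {state.shape} {state.dtype}"
        )

def split_joints(vec):
    """Split a 26-vector into (arm, hand). ASSUMPTION: arm joints come first."""
    return vec[:ARM_DIM], vec[ARM_DIM:]

# Dummy inputs at an arbitrary camera resolution.
front = np.zeros((480, 640, 3), dtype=np.uint8)
wrist = np.zeros((480, 640, 3), dtype=np.uint8)
state = np.zeros(STATE_DIM, dtype=np.float32)

check_observation(front, wrist, state)   # passes silently
arm, hand = split_joints(state)
print(arm.shape, hand.shape)
```

The same `split_joints` call applies to the returned `action`, since it shares the 26-dimensional layout.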

Quick start

Runs from this model id alone — no dataset, no robot, no local checkpoint needed (normalization stats are bundled in the repo). The runnable script is included in this repo: inference_example.py.

pip install lerobot
# download the example from this repo and run it:
huggingface-cli download Ngseo/hyundai-uiwang-left-flowmatch inference_example.py --local-dir .
python inference_example.py --model-id Ngseo/hyundai-uiwang-left-flowmatch --device cuda  # or cpu / mps

Porting to your robot

import numpy as np
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors

model_id = "Ngseo/hyundai-uiwang-left-flowmatch"
device = "cuda"

policy = DiffusionPolicy.from_pretrained(model_id)
policy.config.device = device           # saved config pins cuda; align to runtime device
policy.to(device).eval()                # move weights to the runtime device, inference mode
policy.reset()                          # clear the observation/action queues
preprocess, postprocess = make_pre_post_processors(
    policy.config, model_id, preprocessor_overrides={"device_processor": {"device": device}}
)

@torch.no_grad()
def predict(front_rgb, wrist_rgb, state):  # uint8 (H,W,3), uint8 (H,W,3), float32 (26,)
    obs = {
        "observation.images.front_rgb": torch.from_numpy(front_rgb).float().div(255).permute(2,0,1)[None].to(device),
        "observation.images.wrist_rgb": torch.from_numpy(wrist_rgb).float().div(255).permute(2,0,1)[None].to(device),
        "observation.state": torch.from_numpy(state).float()[None].to(device),
        "task": "", "robot_type": "",
    }
    action = postprocess(policy.select_action(preprocess(obs)))
    return action.squeeze(0).float().cpu().numpy()   # (26,)

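The policy keeps internal queues (n_obs_steps=2, n_action_steps=8): each predict call returns one action, and the model is only re-run once the queued actions are used up. Call predict once per control tick (the dataset was recorded at 30 Hz) and call policy.reset() between episodes to clear the queues. Camera input resolution need not match training (the policy resizes/crops internally), but each view must come from the matching camera (front vs wrist).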
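
To make the queueing concrete, here is a toy simulation of it. It is a sketch, not the LeRobot implementation: `ToyQueuedPolicy` is a hypothetical stand-in that mimics only the bookkeeping (no model), and padding the history with copies of the first observation is an assumption about how the first call is handled.

```python
from collections import deque

N_OBS_STEPS = 2      # from the model card
N_ACTION_STEPS = 8   # from the model card

class ToyQueuedPolicy:
    """Mimics the observation/action queues only; no network involved."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Call between episodes so stale observations/actions are dropped.
        self.obs_history = deque(maxlen=N_OBS_STEPS)
        self.action_queue = deque()
        self.inference_calls = 0

    def select_action(self, obs):
        self.obs_history.append(obs)
        while len(self.obs_history) < N_OBS_STEPS:
            self.obs_history.append(obs)   # ASSUMPTION: pad on first call
        if not self.action_queue:
            # Placeholder for the expensive model call: plan a whole chunk.
            self.inference_calls += 1
            self.action_queue.extend(
                (self.inference_calls, i) for i in range(N_ACTION_STEPS)
            )
        return self.action_queue.popleft()  # one action per control tick

policy = ToyQueuedPolicy()
actions = [policy.select_action(t) for t in range(20)]
print(policy.inference_calls)  # 3: planning happens at ticks 0, 8 and 16
```

At 30 Hz, a chunk of 8 actions covers roughly 0.27 s, so under this scheme the model would run about 3.75 times per second rather than on every tick.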

Training

Policy: diffusion, noise_scheduler_type=FlowMatch, num_inference_steps=1
Backbone: resnet18, 200k steps, batch 64, images resized 240×320 + crop 216×288
See train_config.json for the full configuration.
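
The resize and crop sizes above imply that the crop keeps 90% of each image dimension. The sketch below works through that arithmetic with a center crop in NumPy. `center_crop` is a hypothetical helper, and the use of a center crop at inference time is an assumption (random cropping is common during training); train_config.json is the authoritative source.

```python
import numpy as np

RESIZE_HW = (240, 320)   # from the training summary above
CROP_HW = (216, 288)     # from the training summary above

def center_crop(img, crop_hw=CROP_HW):
    """Center-crop an (H, W, C) array to crop_hw = (height, width)."""
    h, w = img.shape[:2]
    ch, cw = crop_hw
    if ch > h or cw > w:
        raise ValueError(f"crop {crop_hw} is larger than image {(h, w)}")
    top, left = (h - ch) // 2, (w - cw) // 2   # 12 and 16 px for these sizes
    return img[top:top + ch, left:left + cw]

frame = np.zeros((*RESIZE_HW, 3), dtype=np.uint8)   # an already-resized frame
cropped = center_crop(frame)
print(cropped.shape)  # (216, 288, 3)
```

With these sizes the crop discards a 12-pixel band at the top and bottom and a 16-pixel band at the left and right, so objects at the very edge of the camera view may fall outside what the policy sees.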

Weights: Safetensors, 0.3B params, F32
