---
library_name: lerobot
pipeline_tag: robotics
tags:
  - lerobot
  - diffusion-policy
  - flow-matching
  - robotics
  - manipulation
---

# Hyundai Uiwang — Diffusion Policy (FlowMatch)

LeRobot Diffusion Policy trained on the Hyundai Uiwang **left**-arm manipulation dataset
(131 episodes / 110,568 frames, 30 Hz) using the **FlowMatch** rectified-flow scheduler
(`num_inference_steps=1`).

## Inputs / Outputs

| direction | key | shape / dtype | notes |
|---|---|---|---|
| in | `observation.images.front_rgb` | (H, W, 3) uint8 | scene view (Zivid camera); resized internally to 240×320 |
| in | `observation.images.wrist_rgb` | (H, W, 3) uint8 | wrist view; resized internally |
| in | `observation.state` | (26,) float32 | arm 6 + hand 20 joints |
| out | `action` | (26,) float32 | target arm 6 + hand 20 joints, 30 Hz |
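
Shape and dtype mismatches (for example a channel-first image or a float64 state vector) are easy to miss, so a small pre-flight check can help. The sketch below is not part of this repo or of LeRobot: `find_input_problems` is a hypothetical helper, and it is duck-typed on `.shape` and `.dtype` so it works with NumPy arrays without importing NumPy.

```python
def find_input_problems(front_rgb, wrist_rgb, state, state_dim=26):
    """Return a list of human-readable problems; an empty list means the inputs look valid.

    Hypothetical helper (not a LeRobot API). Each argument only needs
    `.shape` and `.dtype` attributes, as NumPy arrays have.
    """
    problems = []
    for name, image in (("front_rgb", front_rgb), ("wrist_rgb", wrist_rgb)):
        shape = tuple(image.shape)
        # Images must be channel-last (H, W, 3); passing (3, H, W) is a common mistake.
        if len(shape) != 3 or shape[2] != 3:
            problems.append(f"{name}: expected shape (H, W, 3), got {shape}")
        if str(image.dtype) != "uint8":
            problems.append(f"{name}: expected dtype uint8, got {image.dtype}")
    # The state vector is arm 6 + hand 20 joints = 26 values.
    if tuple(state.shape) != (state_dim,):
        problems.append(f"state: expected shape ({state_dim},), got {tuple(state.shape)}")
    if str(state.dtype) != "float32":
        problems.append(f"state: expected dtype float32, got {state.dtype}")
    return problems
```

Running this once on the first frame from each camera before starting the control loop is usually enough; an empty list means the inputs match the table above.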

## Quick start

The example runs from this model id alone: no dataset, robot, or local checkpoint is needed
(normalization stats are bundled in the repo). The runnable script is included in this
repo: [`inference_example.py`](./inference_example.py).

```bash
pip install lerobot
# download the example from this repo and run it:
huggingface-cli download Ngseo/hyundai-uiwang-left-flowmatch inference_example.py --local-dir .
python inference_example.py --model-id Ngseo/hyundai-uiwang-left-flowmatch --device cuda  # or cpu / mps
```

## Porting to your robot

```python
import numpy as np
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors

model_id = "Ngseo/hyundai-uiwang-left-flowmatch"
device = "cuda"  # or "cpu" / "mps"

policy = DiffusionPolicy.from_pretrained(model_id)
policy.config.device = device           # saved config pins cuda; align to runtime device
policy.to(device).eval()
policy.reset()
preprocess, postprocess = make_pre_post_processors(
    policy.config, model_id, preprocessor_overrides={"device_processor": {"device": device}}
)

@torch.no_grad()
def predict(front_rgb, wrist_rgb, state):  # uint8 (H,W,3), uint8 (H,W,3), float32 (26,)
    obs = {
        "observation.images.front_rgb": torch.from_numpy(front_rgb).float().div(255).permute(2,0,1)[None].to(device),
        "observation.images.wrist_rgb": torch.from_numpy(wrist_rgb).float().div(255).permute(2,0,1)[None].to(device),
        "observation.state": torch.from_numpy(state).float()[None].to(device),
        "task": "", "robot_type": "",
    }
    action = postprocess(policy.select_action(preprocess(obs)))
    return action.squeeze(0).float().cpu().numpy()   # (26,)
```

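The policy keeps internal observation and action queues (`n_obs_steps=2`, `n_action_steps=8`).
Call `predict` once per control step at your control rate (the training data was recorded at
30 Hz), and call `policy.reset()` between episodes to clear the queues. Camera resolution need
not match training because the policy resizes and crops internally, but each view must come
from the matching camera: the scene camera for `front_rgb` and the wrist camera for `wrist_rgb`.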
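
The calling pattern described above can be sketched as a fixed-rate loop. This is a minimal illustration, not code from this repo: `run_episode`, `read_observation`, and `send_action` are hypothetical names standing in for your own robot interface, and `predict` and `reset_policy` are expected to be the `predict` function and `policy.reset` from the snippet above.

```python
import time


def run_episode(predict, read_observation, send_action, reset_policy,
                control_hz=30.0, max_steps=1000,
                sleep=time.sleep, clock=time.monotonic):
    """Run one episode at a fixed control rate and return the number of steps executed.

    All callables are placeholders for your own code:
      read_observation() -> (front_rgb, wrist_rgb, state)
      send_action(action) -> None
    """
    reset_policy()  # clear the policy's queues before every episode
    period = 1.0 / control_hz
    for _ in range(max_steps):
        t_start = clock()
        front_rgb, wrist_rgb, state = read_observation()
        action = predict(front_rgb, wrist_rgb, state)  # called once per control step
        send_action(action)
        # Sleep for whatever is left of this control period, if anything.
        remaining = period - (clock() - t_start)
        if remaining > 0:
            sleep(remaining)
    return max_steps
```

A call might look like `run_episode(predict, read_observation, send_action, policy.reset)`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; a real deployment would likely also want a termination condition and safety limits on the commanded joints.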

## Training

- Policy: `diffusion`, `noise_scheduler_type=FlowMatch`, `num_inference_steps=1`
- Backbone: ResNet-18
- Schedule: 200k steps, batch size 64
- Images: resized to 240×320, then cropped to 216×288
- See `train_config.json` for the full configuration.
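
To give some intuition for `num_inference_steps=1`: in a rectified-flow setup, sampling amounts to integrating a learned velocity field from noise to data, and a single inference step corresponds to one Euler step over the whole interval. The sketch below illustrates that idea in plain Python with a toy velocity field. It is not the scheduler's actual code; `euler_sample`, `toy_velocity`, and the time convention (noise at t=0, data at t=1) are assumptions made for illustration.

```python
def euler_sample(velocity_fn, x0, num_steps=1):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the trained network: the velocity of a straight-line path
# toward a fixed target. On a perfectly straight path one Euler step is exact.
target = [0.5, -1.0, 2.0]


def toy_velocity(x, t):
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


sample = euler_sample(toy_velocity, x0=[0.0, 0.0, 0.0], num_steps=1)
```

In the toy case the single step lands on `target` because the path is exactly straight. A trained model only approximates straight paths, so one-step sampling trades some accuracy for lower inference latency.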