FlowHeads-DiffusionPolicy-PickOrange (DP-FlowHead)

DP-FlowHead is a LeRobot Diffusion-Policy conv-UNet (ResNet-18 + SpatialSoftmax + FiLM 1D-conv UNet, ~267M parameters) whose DDPM denoising head is replaced by a rectified-flow head: it predicts the straight-line velocity v = x1 − x0 and samples by Euler ODE integration (NFE=10). It is trained from scratch on LeIsaac SO-101 PickOrange (single arm, 2 RGB cameras, 60 demos). This checkpoint is step 9800 ≈ 4.3 epochs (best).

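In short: a LeRobot policy trained from scratch for LeIsaac SO-101 PickOrange, keeping the Diffusion-Policy conv-UNet backbone and swapping the DDPM denoising head for a rectified-flow head (velocity field v = x1 − x0 plus 10-step Euler integration). This checkpoint is **step 9800 ≈ 4.3 epochs (best)**.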
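The rectified-flow idea described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in flowdp/: the helper names (`interpolate`, `velocity_target`, `euler_sample`) are hypothetical, and it assumes the common convention that x0 is the noise sample and x1 the clean action chunk, which the card does not state explicitly.

```python
def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the flow head: v = x1 - x0, constant along the path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler.

    num_steps is the number of function evaluations (NFE); velocity_fn stands in
    for the trained network (hypothetical signature: (x, t) -> list of floats).
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.0, 1.0, -2.0]      # x0 (assumed: noise)
    action = [2.0, -1.0, 0.5]     # x1 (assumed: clean action)
    v = velocity_target(noise, action)
    # With the exact (constant) velocity field, Euler recovers x1 up to rounding.
    print(euler_sample(lambda x, t: v, noise, num_steps=10))
```

With a perfectly straight velocity field a single Euler step would already be exact; a learned field is only approximately straight, which is one plausible reason to spend several steps (here NFE=10) at inference.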

Closed-loop demo in Isaac Sim — SO-101 picking oranges into the plate.

🔗 Project repos

vitorcen/FlowHeads — the flow-matching action-head umbrella (this model's code: flowdp/)
vitorcen/LeIsaac-Training — the LeIsaac PickOrange benchmark + eval harness
vitorcen/isaaclab-experience — Isaac Lab multi-policy umbrella (parent project)

Results — strict 20-round (PickOrange, closed-loop Isaac)

Headed, 20 rounds (3 oranges per round, 60 in total), EPISODE_LENGTH_S=120, MAX_ROUND_WALL_S=180, h=8.

| metric | value |
|---|---|
| E(🍊)/ep | 1.35 / 3 = 45.0 % (27/60) |
| P(3): full round (all 3) | 20 % (4/20) |
| P(≥2) | 40 % |
| avg episode | 171 s |
| 20-ep raw | `[3,1,2,3,1,2,1,0,3,1,0,2,2,1,1,1,3,0,0,0]` |
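The summary rows can be recomputed from the raw per-round counts. The sketch below is a sanity check written for this card, not part of the benchmark harness; the function name `summarize` and the dictionary keys are illustrative.

```python
# Oranges placed in each of the 20 rounds, copied from the raw row of the table.
RAW = [3, 1, 2, 3, 1, 2, 1, 0, 3, 1, 0, 2, 2, 1, 1, 1, 3, 0, 0, 0]


def summarize(counts, per_round=3):
    """Recompute the headline metrics from per-round orange counts."""
    n = len(counts)
    total = sum(counts)
    return {
        "mean_per_round": total / n,                              # E(oranges)/ep
        "success_rate": total / (n * per_round),                  # fraction of all oranges
        "p_all": sum(c == per_round for c in counts) / n,         # P(3)
        "p_at_least_2": sum(c >= 2 for c in counts) / n,          # P(>=2)
    }


if __name__ == "__main__":
    for name, value in summarize(RAW).items():
        print(f"{name}: {value:.3f}")
```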

45.0 % is a statistical tie with the strongest baseline, ACT (43.3 %), on this 60-demo task: a single 20-round run carries an uncertainty of roughly ±9 percentage points, so this is a tie, not a win. Notably, an earlier 5-round quick eval with a tight 90 s wall-clock cap mis-scored this policy near zero. DP-FlowHead is slow (successful episodes take ~136 s), so the 90 s cap cut off its completions before they could be counted. The strict 180 s eval is what reveals the real 45 %. See the FlowHeads architecture × objective study for the full {conv-UNet, DiT} × {DDPM, flow} matrix.
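One way to see why a 1.7-point gap over ACT is not meaningful is to estimate the run-to-run uncertainty from the 20 per-round counts. The sketch below treats each round as one independent sample and computes the standard error of the mean success fraction. That is an assumption of this example, not necessarily how the quoted ±~9 % was derived. On this run it comes out at roughly 8 percentage points, the same order as the quoted figure.

```python
import math
import statistics

# Oranges placed in each of the 20 rounds (raw list from the results above).
RAW = [3, 1, 2, 3, 1, 2, 1, 0, 3, 1, 0, 2, 2, 1, 1, 1, 3, 0, 0, 0]


def round_level_standard_error(counts, per_round=3):
    """Standard error of the mean per-round success fraction.

    Uses the sample standard deviation across rounds, so correlation between
    the oranges inside one round is accounted for.
    """
    fractions = [c / per_round for c in counts]
    return statistics.stdev(fractions) / math.sqrt(len(fractions))


if __name__ == "__main__":
    mean = sum(RAW) / (3 * len(RAW))
    se = round_level_standard_error(RAW)
    print(f"success rate {mean:.3f}, standard error {se:.3f}")
```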

Usage

from lerobot.policies.factory import get_policy_class
# requires the FlowHeads package on PYTHONPATH so "flowdp" is registered:
#   pip install -e .  # from https://github.com/vitorcen/FlowHeads
policy = get_policy_class("flowdp").from_pretrained("wsagi/FlowHeads-DiffusionPolicy-PickOrange")

Config: type=flowdp, n_action_steps=8, num_inference_steps=10 (Euler NFE). Closed-loop scoring uses the LeIsaac benchmark harness (scripts/benchmark/run_one.sh, eval-side lerobot 0.4.x).
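To make the role of n_action_steps=8 concrete, here is a minimal receding-horizon loop: the policy predicts a chunk of actions, the first 8 are executed, and only then is a new chunk requested. This is a generic sketch, not the LeIsaac harness or LeRobot's internal action queue; `predict_chunk`, `get_obs`, and `apply_action` are hypothetical callables supplied by the caller.

```python
from collections import deque


def run_closed_loop(predict_chunk, get_obs, apply_action, total_steps, n_action_steps=8):
    """Execute actions in chunks, replanning whenever the queue runs empty.

    Returns the number of times the policy was queried.
    """
    queue = deque()
    replans = 0
    for _ in range(total_steps):
        if not queue:
            # Query the policy once, keep only the first n_action_steps actions.
            chunk = predict_chunk(get_obs())
            queue.extend(chunk[:n_action_steps])
            replans += 1
        apply_action(queue.popleft())
    return replans


if __name__ == "__main__":
    executed = []
    n = run_closed_loop(
        predict_chunk=lambda obs: [obs + i for i in range(16)],  # dummy 16-step chunk
        get_obs=lambda: len(executed),                           # dummy observation
        apply_action=executed.append,
        total_steps=20,
    )
    print(n, len(executed))
```

Each replan costs one full sampling pass (10 velocity evaluations with num_inference_steps=10), so a larger n_action_steps lowers inference cost per control step at the price of reacting less often to new observations.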


Safetensors: model size 0.3B params, tensor type F32.
