pi0.5 LoRA - sim stack (merged flat + same, 60 traj), discretized state
LoRA SFT of π0.5 (`pi05_base` warm-start) on a merged simulated
block-stacking dataset: pick the bottom item out of a stack and move it into a
goal sphere without disturbing the items resting on top. Trained with the
SENTINEL-Lite `sentinel/openpi_sft` pipeline (openpi-native trainer) on a single
A100-40GB.
Dataset
This model is trained on the concatenation of two private LeRobot v2.1 datasets (60 trajectories / 48 208 frames total), built into one local merged dataset for training:
| Source dataset (private) | Traj | Frames | Task index | Language prompt |
|---|---|---|---|---|
| `IDEAS-Lab-Northwestern/sim-stack-flat-30-libero` | 30 | 25 341 | 0 | "Pick up the flat object from under the stack and move it into the green goal sphere. Take care that the items resting on top remain stable and undisturbed." |
| `IDEAS-Lab-Northwestern/sim-stack-same-30-libero` | 30 | 22 867 | 1 | "Pick up the bottom item from the stack and move it into the green goal sphere. Take care that the items above remain stable and undisturbed." |
| merged | 60 | 48 208 | 0 + 1 | (both, multi-task) |
- Robot: Franka Panda, 30 fps, LeRobot v2.1, simulated in OmniGibson / BEHAVIOR-1K.
- Two variants of the same skill family: `flat` = the target is a flat slab under a stack; `same` = the target is the bottom item of a same-shape stack. Merging exposes the policy to both during a single SFT.
- The merge concatenates episodes (flat → episodes 0–29, same → 30–59), re-indexes the global frame index, and keeps both task prompts (`task_index` 0 = flat, 1 = same), so `prompt_from_task` selects the right instruction per episode.
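
The merge described above can be sketched in a few lines of plain Python. This is only an illustration of the bookkeeping (episode order, global frame re-indexing, per-episode prompt lookup), not the pipeline's actual merge code: `Episode`, `merge_sources`, and `prompt_for_episode` are hypothetical names, and the episode lengths are toy placeholders rather than the real per-episode counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    episode_index: int  # position in the merged dataset
    task_index: int     # 0 = flat, 1 = same
    first_frame: int    # global index of the episode's first frame
    num_frames: int


def merge_sources(sources):
    """Concatenate source datasets in the order given.

    `sources` is a list of (prompt, frame_counts) pairs, one per source
    dataset; `frame_counts` holds the length of each episode.
    Returns (tasks, episodes, total_frames).
    """
    tasks = {}
    episodes = []
    next_frame = 0
    for task_index, (prompt, frame_counts) in enumerate(sources):
        tasks[task_index] = prompt
        for num_frames in frame_counts:
            # Episode indices and frame indices keep counting across sources.
            episodes.append(Episode(len(episodes), task_index, next_frame, num_frames))
            next_frame += num_frames
    return tasks, episodes, next_frame


def prompt_for_episode(tasks, episode):
    """Look up the language prompt through the episode's task index."""
    return tasks[episode.task_index]


# Toy episode lengths (placeholders, not the real datasets).
tasks, episodes, total_frames = merge_sources([
    ("flat prompt", [3, 2]),
    ("same prompt", [4]),
])
```

With the real data, the first source would contribute episodes 0–29 under task index 0 and the second episodes 30–59 under task index 1.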
Observation / action schema (LIBERO 2-cam)
| Field | Shape | Maps to |
|---|---|---|
| `image` | 256×256×3 | `base_0_rgb` (third-person) |
| `wrist_image` | 256×256×3 | `left_wrist_0_rgb` |
| (no right wrist) | - | `right_wrist_0_rgb` zero-padded, mask off |
| `state` | 8 | proprioception (discretized, see below) |
| `actions` | 7 | 6-DoF end-effector delta + gripper |
Training recipe
- Base / warm-start: `pi05_base` (`gs://openpi-assets/checkpoints/pi05_base`).
- LoRA: `gemma_2b_lora` backbone (rank 16) + `gemma_300m_lora` action expert (rank 32), α = rank, no dropout. EMA off.
- Discretized proprio state: `discrete_state_input = True`, so the model consumes the 8-dim state as discretized tokens (the π0.5 default). ⚠️ This differs from the single-task siblings `pi05-sim-stack-flat-30-libero-lora` and `pi05-sim-stack-same-30-libero-lora`, which were trained with continuous state (`discrete_state_input = False`).
- Optimizer / schedule: batch 4, 40 000 steps (~3.3 epochs over 48 208 frames). Cosine LR, peak `2.5e-5` → floor `2.5e-6`, 1 000-step warmup, decaying across the full run.
- Norm stats: computed on the merged dataset (π0.5-base ships none).
- Hardware: 1× A100-40GB, ~5.3 h. Final train loss ≈ 0.02.
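
The epoch count and the learning-rate schedule in the recipe can be checked with a small standalone sketch. It assumes a linear warmup starting from zero followed by a cosine decay to the floor at the final step; the trainer's exact warmup start value may differ, and `epochs_seen` and `cosine_lr` are illustrative names, not openpi functions.

```python
import math


def epochs_seen(batch_size, steps, num_frames):
    """Number of passes over the dataset, counting one frame per sample."""
    return batch_size * steps / num_frames


def cosine_lr(step, peak=2.5e-5, floor=2.5e-6, warmup_steps=1_000, total_steps=40_000):
    """Linear warmup from 0 to `peak` (assumed), then cosine decay to `floor`."""
    if step < warmup_steps:
        return peak * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


epochs = epochs_seen(batch_size=4, steps=40_000, num_frames=48_208)  # about 3.32
lr_at_peak = cosine_lr(1_000)
lr_at_end = cosine_lr(40_000)
```

Under these assumptions the schedule reaches the peak at step 1 000 and the floor exactly at step 40 000, which matches "decaying across the full run".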
Checkpoints
| Folder | Step | Note |
|---|---|---|
| `8000` | 8 000 | |
| `16000` | 16 000 | |
| `24000` | 24 000 | |
| `32000` | 32 000 | |
| `40000` | 40 000 | final (openpi 0-indexed 39999, relabeled) |
Single uninterrupted run: step numbers are absolute (not resumed/cumulative
across runs). Each folder holds the LoRA `params/` and the dataset `assets/`
(norm stats); optimizer `train_state/` is intentionally not uploaded.
Pick the eval checkpoint by eval metrics, not train loss: late checkpoints have near-identical train loss.
Usage
Load with openpi's pi05 config (LIBERO 2-cam transforms,
`discrete_state_input=True`), pointing the weight loader at one checkpoint's
`params/`. See the SENTINEL-Lite `sentinel/openpi_sft` pipeline for the exact
`TrainConfig` (`pi05_stack_merged60_discrete_libero_lora`).
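
One possible inference path is sketched below, under stated assumptions: the repository has been downloaded locally, openpi is installed, and the `pi05_stack_merged60_discrete_libero_lora` config has been registered with openpi's config registry (it comes from SENTINEL-Lite, not upstream openpi). `checkpoint_dir` and `load_policy` are hypothetical helpers. `load_policy` uses openpi's `get_config` and `create_trained_policy`, which are expected to read `params/` and `assets/` from the checkpoint folder; treat this as a starting point rather than the pipeline's own loading code.

```python
from pathlib import Path

# Checkpoint folders listed in the table above.
CHECKPOINT_STEPS = (8000, 16000, 24000, 32000, 40000)


def checkpoint_dir(repo_root, step):
    """Return the local folder for one uploaded checkpoint."""
    if step not in CHECKPOINT_STEPS:
        raise ValueError(f"no checkpoint for step {step}; choose from {CHECKPOINT_STEPS}")
    return Path(repo_root) / str(step)


def load_policy(repo_root, step=40000):
    """Build an openpi policy from one checkpoint (assumes the config is registered)."""
    # Imported lazily so the path helper above works without openpi installed.
    from openpi.policies import policy_config
    from openpi.training import config as openpi_config

    train_config = openpi_config.get_config("pi05_stack_merged60_discrete_libero_lora")
    return policy_config.create_trained_policy(train_config, checkpoint_dir(repo_root, step))
```

Swapping `step` makes it straightforward to evaluate several checkpoints before choosing one.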