pi0.5 LoRA: sim stack (merged flat + same, 60 traj), discretized state

LoRA SFT of π0.5 (pi05_base warm-start) on a merged simulated block-stacking dataset: pick the bottom item out of a stack and move it into a goal sphere without disturbing the items resting on top. Trained with the SENTINEL-Lite sentinel/openpi_sft pipeline (openpi-native trainer) on a single A100-40GB.

Dataset

This model is trained on the concatenation of two private LeRobot v2.1 datasets (60 trajectories / 48 208 frames total), built into one local merged dataset for training:

| Source dataset (private) | Traj | Frames | Task index | Language prompt |
| --- | --- | --- | --- | --- |
| IDEAS-Lab-Northwestern/sim-stack-flat-30-libero | 30 | 25 341 | 0 | "Pick up the flat object from under the stack and move it into the green goal sphere. Take care that the items resting on top remain stable and undisturbed." |
| IDEAS-Lab-Northwestern/sim-stack-same-30-libero | 30 | 22 867 | 1 | "Pick up the bottom item from the stack and move it into the green goal sphere. Take care that the items above remain stable and undisturbed." |
| merged | 60 | 48 208 | 0 + 1 | (both, multi-task) |

  • Robot: Franka Panda, 30 fps, LeRobot v2.1, simulated in OmniGibson / BEHAVIOR-1K.
  • Two variants of the same skill family: flat = the target is a flat slab under a stack; same = the target is the bottom item of a same-shape stack. Merging exposes the policy to both during a single SFT.
  • The merge concatenates episodes (flat → episodes 0–29, same → 30–59), re-indexes the global frame index, and keeps both task prompts (task_index 0 = flat, 1 = same), so prompt_from_task selects the right instruction per episode.
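
The merge bookkeeping described above can be sketched in a few lines. This is an illustration, not the pipeline's actual merge script: `Episode`, `even_split`, `merge`, and `prompt_for` are hypothetical names, and the per-episode lengths are placeholders chosen only so that they sum to the frame totals in the table (real episode lengths vary).

```python
from dataclasses import dataclass

# Prompts copied from the dataset table; task_index 0 = flat, 1 = same.
TASK_PROMPTS = {
    0: ("Pick up the flat object from under the stack and move it into the "
        "green goal sphere. Take care that the items resting on top remain "
        "stable and undisturbed."),
    1: ("Pick up the bottom item from the stack and move it into the green "
        "goal sphere. Take care that the items above remain stable and "
        "undisturbed."),
}


@dataclass(frozen=True)
class Episode:
    episode_index: int  # global episode index after the merge
    task_index: int     # 0 = flat, 1 = same
    first_frame: int    # global index of the episode's first frame
    num_frames: int


def even_split(total_frames, num_episodes):
    """Placeholder episode lengths that sum to total_frames (assumption)."""
    base, extra = divmod(total_frames, num_episodes)
    return [base + (1 if i < extra else 0) for i in range(num_episodes)]


def merge(sources):
    """Concatenate (task_index, episode_lengths) sources in the given order."""
    merged, next_frame = [], 0
    for task_index, lengths in sources:
        for num_frames in lengths:
            merged.append(Episode(len(merged), task_index, next_frame, num_frames))
            next_frame += num_frames  # re-index the global frame counter
    return merged, next_frame


def prompt_for(episode):
    """Look up the instruction by task_index, as prompt_from_task would."""
    return TASK_PROMPTS[episode.task_index]


episodes, total_frames = merge([
    (0, even_split(25_341, 30)),  # flat -> episodes 0-29
    (1, even_split(22_867, 30)),  # same -> episodes 30-59
])
```

With these inputs the first "same" episode gets global index 30 and starts at global frame 25 341, directly after the last "flat" frame.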

Observation / action schema (LIBERO 2-cam)

| Field | Shape | Maps to |
| --- | --- | --- |
| image | 256×256×3 | base_0_rgb (third-person) |
| wrist_image | 256×256×3 | left_wrist_0_rgb |
| (no right wrist) | - | right_wrist_0_rgb zero-padded, mask off |
| state | 8 | proprioception (discretized, see below) |
| actions | 7 | 6-DoF end-effector delta + gripper |
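
As a rough sketch of the mapping in this table, the function below turns one dataset sample into the three-camera layout, filling the missing right wrist camera with zeros and switching its mask off. It is a standalone illustration using only the standard library (images are raw `bytes` buffers instead of arrays); `to_model_inputs` and the exact dictionary layout are assumptions for illustration, not the pipeline's transform code.

```python
IMAGE_BYTES = 256 * 256 * 3  # one 256x256 RGB frame, one byte per channel
STATE_DIM = 8
ACTION_DIM = 7


def to_model_inputs(sample):
    """Map a 2-cam dataset sample onto the 3-camera slots from the table."""
    for key in ("image", "wrist_image"):
        if len(sample[key]) != IMAGE_BYTES:
            raise ValueError(f"{key} must be a 256x256x3 uint8 buffer")
    if len(sample["state"]) != STATE_DIM:
        raise ValueError("state must have 8 dimensions")

    inputs = {
        "state": list(sample["state"]),
        "image": {
            "base_0_rgb": sample["image"],              # third-person camera
            "left_wrist_0_rgb": sample["wrist_image"],  # the only wrist camera
            "right_wrist_0_rgb": bytes(IMAGE_BYTES),    # zero-padded placeholder
        },
        "image_mask": {
            "base_0_rgb": True,
            "left_wrist_0_rgb": True,
            "right_wrist_0_rgb": False,  # mask off: the model ignores the padding
        },
    }
    # Actions are present during training and absent at inference time.
    if "actions" in sample:
        if len(sample["actions"]) != ACTION_DIM:
            raise ValueError("actions must have 7 dimensions")
        inputs["actions"] = list(sample["actions"])
    return inputs
```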

Training recipe

  • Base / warm-start: pi05_base (gs://openpi-assets/checkpoints/pi05_base).
  • LoRA: gemma_2b_lora backbone (rank 16) + gemma_300m_lora action expert (rank 32), α = rank, no dropout. EMA off.
  • Discretized proprio state: discrete_state_input = True. The model consumes the 8-dim state as discretized tokens (the π0.5 default). ⚠️ This differs from the single-task siblings pi05-sim-stack-flat-30-libero-lora and pi05-sim-stack-same-30-libero-lora, which were trained with continuous state (discrete_state_input = False).
  • Optimizer / schedule: batch 4, 40 000 steps (~3.3 epochs over 48 208 frames). Cosine LR, peak 2.5e-5 → floor 2.5e-6, 1 000-step warmup, decaying across the full run.
  • Norm stats: computed on the merged dataset (π0.5-base ships none).
  • Hardware: 1× A100-40GB, ~5.3 h. Final train loss ≈ 0.02.
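
The schedule and epoch figures in the recipe can be checked with a short standalone sketch. It assumes a linear warmup that starts from zero and a cosine decay that reaches the floor exactly at the last step; the trainer's own schedule may differ in small details such as the initial warmup value. `learning_rate` is a hypothetical helper, not a pipeline function.

```python
import math

PEAK_LR = 2.5e-5
FLOOR_LR = 2.5e-6
WARMUP_STEPS = 1_000
TOTAL_STEPS = 40_000
BATCH_SIZE = 4
NUM_FRAMES = 48_208


def learning_rate(step):
    """Linear warmup to the peak, then cosine decay to the floor."""
    if step < WARMUP_STEPS:
        # Assumption: warmup starts at 0 and rises linearly.
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FLOOR_LR + (PEAK_LR - FLOOR_LR) * cosine


# 40 000 steps x 4 samples per step, divided by 48 208 frames.
epochs = TOTAL_STEPS * BATCH_SIZE / NUM_FRAMES

if __name__ == "__main__":
    print(f"epochs: {epochs:.2f}")
    for step in (0, 1_000, 20_500, 40_000):
        print(step, f"{learning_rate(step):.3e}")
```

The epoch count works out to 160 000 / 48 208, which rounds to the ~3.3 quoted above.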

Checkpoints

| Folder | Step | Note |
| --- | --- | --- |
| 8000 | 8 000 | |
| 16000 | 16 000 | |
| 24000 | 24 000 | |
| 32000 | 32 000 | |
| 40000 | 40 000 | final (openpi 0-indexed 39999, relabeled) |

Single uninterrupted run: step numbers are absolute (not resumed/cumulative across runs). Each folder holds the LoRA params/ and the dataset assets/ (norm stats); optimizer train_state/ is intentionally not uploaded.

Pick the eval checkpoint by eval metrics, not train loss, because late checkpoints have near-identical train loss.

Usage

Load with openpi's pi05 config (LIBERO 2-cam transforms, discrete_state_input=True), pointing the weight loader at one checkpoint's params/. See the SENTINEL-Lite sentinel/openpi_sft pipeline for the exact TrainConfig (pi05_stack_merged60_discrete_libero_lora).
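
A minimal loading sketch, under several assumptions: the checkpoint folders have been downloaded to a local directory (the `root` argument is hypothetical), the TrainConfig named above has been registered so that openpi's `get_config` can find it (stock openpi does not ship it), and openpi's `create_trained_policy` is given the step folder that contains params/ and assets/. The openpi imports sit inside the function so the path helper can be used without openpi installed.

```python
from pathlib import Path

# Config name from the SENTINEL-Lite pipeline; it must be registered with
# openpi's config registry before get_config can resolve it (assumption).
CONFIG_NAME = "pi05_stack_merged60_discrete_libero_lora"
UPLOADED_STEPS = (8000, 16000, 24000, 32000, 40000)


def checkpoint_dir(root, step):
    """Return the local folder for one uploaded checkpoint step."""
    if step not in UPLOADED_STEPS:
        raise ValueError(f"no checkpoint for step {step}; choose from {UPLOADED_STEPS}")
    return Path(root) / str(step)


def load_policy(root, step=40000):
    """Build an inference policy from one checkpoint folder (needs openpi)."""
    # Lazy imports: only required when a policy is actually created.
    from openpi.policies import policy_config
    from openpi.training import config as _config

    train_config = _config.get_config(CONFIG_NAME)
    return policy_config.create_trained_policy(train_config, checkpoint_dir(root, step))
```

For example, `load_policy("checkpoints", step=24000)` would load the 24 000-step weights from a hypothetical local `checkpoints/24000` folder, which is convenient when comparing checkpoints on eval metrics.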
