pi0.5 LoRA — sim stack (merged flat + same, 60 traj), discretized state

LoRA supervised fine-tuning (SFT) of π0.5, warm-started from pi05_base, on a merged simulated block-stacking dataset. The task is to pick the bottom item out of a stack and move it into a goal sphere without disturbing the items resting on top. Trained with the SENTINEL-Lite sentinel/openpi_sft pipeline (openpi-native trainer) on a single A100-40GB.

Dataset

This model is trained on the concatenation of two private LeRobot v2.1 datasets (60 trajectories / 48 208 frames total), built into one local merged dataset for training:

Source dataset (private)	Traj	Frames	Task index	Language prompt
`IDEAS-Lab-Northwestern/sim-stack-flat-30-libero`	30	25 341	0	"Pick up the flat object from under the stack and move it into the green goal sphere. Take care that the items resting on top remain stable and undisturbed."
`IDEAS-Lab-Northwestern/sim-stack-same-30-libero`	30	22 867	1	"Pick up the bottom item from the stack and move it into the green goal sphere. Take care that the items above remain stable and undisturbed."
merged	60	48 208	0 + 1	(both, multi-task)

Robot: Franka Panda, 30 fps, LeRobot v2.1, simulated in OmniGibson / BEHAVIOR-1K.
Two variants of the same skill family: flat = the target is a flat slab under a stack; same = the target is the bottom item of a same-shape stack. Merging exposes the policy to both during a single SFT.
The merge concatenates episodes (flat → episodes 0–29, same → 30–59), re-indexes the global frame index, and keeps both task prompts (task_index 0 = flat, 1 = same), so prompt_from_task selects the right instruction per episode.
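
The merge described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual merge script: the `Frame` class, `merge_datasets`, and `toy_dataset` are hypothetical names, the column names are assumed to follow the usual LeRobot layout (`episode_index`, `frame_index`, `index`, `task_index`), and each source is assumed to number its episodes contiguously from 0.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    episode_index: int  # episode id within the dataset
    frame_index: int    # position of the frame inside its episode
    index: int          # global frame index within the dataset
    task_index: int     # selects the language prompt


def merge_datasets(sources):
    """Concatenate (frames, task_index) pairs in order.

    Episode ids are shifted by the number of episodes already merged,
    the global frame index is rebuilt from scratch, and every frame of
    a source is tagged with that source's task_index.
    """
    merged = []
    episode_offset = 0
    for frames, task_index in sources:
        n_episodes = max(f.episode_index for f in frames) + 1
        for f in frames:
            merged.append(Frame(
                episode_index=f.episode_index + episode_offset,
                frame_index=f.frame_index,
                index=len(merged),
                task_index=task_index,
            ))
        episode_offset += n_episodes
    return merged


def toy_dataset(n_episodes, frames_per_episode):
    """Build a tiny stand-in dataset (hypothetical, for illustration only)."""
    frames = []
    for ep in range(n_episodes):
        for t in range(frames_per_episode):
            frames.append(Frame(ep, t, len(frames), 0))
    return frames


flat = toy_dataset(30, 2)   # stands in for the 30 flat episodes
same = toy_dataset(30, 3)   # stands in for the 30 same episodes
merged = merge_datasets([(flat, 0), (same, 1)])
```

With this ordering, flat episodes keep ids 0–29 and same episodes land on 30–59, which is what lets a per-episode task lookup return the matching prompt.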

Observation / action schema (LIBERO 2-cam)

Field	Shape	Maps to
`image`	256×256×3	`base_0_rgb` (third-person)
`wrist_image`	256×256×3	`left_wrist_0_rgb`
(no right wrist)	—	`right_wrist_0_rgb` zero-padded, mask off
`state`	8	proprioception (discretized — see below)
`actions`	7	6-DoF end-effector delta + gripper
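
As an illustration of the camera mapping in the table above, the sketch below fills three model camera slots from the two dataset cameras. The function name `to_model_inputs` and the exact dictionary layout are assumptions made for this example; the real transform lives in the training pipeline and may differ in detail.

```python
import numpy as np


def to_model_inputs(image, wrist_image, state, prompt):
    """Map a 2-camera dataset frame onto three model camera slots.

    The missing right-wrist camera is filled with zeros and masked off,
    so the model can ignore it.
    """
    return {
        "image": {
            "base_0_rgb": image,
            "left_wrist_0_rgb": wrist_image,
            "right_wrist_0_rgb": np.zeros_like(image),  # zero padding
        },
        "image_mask": {
            "base_0_rgb": True,
            "left_wrist_0_rgb": True,
            "right_wrist_0_rgb": False,  # mask off the padded slot
        },
        "state": state,
        "prompt": prompt,
    }


# Dummy frame with the shapes listed in the schema table.
example = to_model_inputs(
    image=np.full((256, 256, 3), 127, dtype=np.uint8),
    wrist_image=np.full((256, 256, 3), 127, dtype=np.uint8),
    state=np.zeros(8, dtype=np.float32),
    prompt="Pick up the bottom item from the stack and move it into "
           "the green goal sphere. Take care that the items above "
           "remain stable and undisturbed.",
)
```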

Training recipe

Base / warm-start: pi05_base (gs://openpi-assets/checkpoints/pi05_base).
LoRA: gemma_2b_lora backbone (rank 16) + gemma_300m_lora action expert (rank 32), α = rank, no dropout. EMA off.
Discretized proprio state: discrete_state_input = True — the model consumes the 8-dim state as discretized tokens (the π0.5 default). ⚠️ This differs from the single-task siblings pi05-sim-stack-flat-30-libero-lora and pi05-sim-stack-same-30-libero-lora, which were trained with continuous state (discrete_state_input = False).
Optimizer / schedule: batch 4, 40 000 steps (~3.3 epochs over 48 208 frames). Cosine LR, peak 2.5e-5 → floor 2.5e-6, 1 000-step warmup, decaying across the full run.
Norm stats: computed on the merged dataset, since the pi05_base checkpoint ships none.
Hardware: 1× A100-40GB, ~5.3 h. Final train loss ≈ 0.02.
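
The schedule and epoch count in the recipe can be checked with a short sketch. It assumes a linear warmup starting from zero and a cosine decay over the steps that remain after warmup; the trainer's exact warmup start value and step accounting are not stated here, so treat the function as an approximation of the shape rather than the trainer's implementation.

```python
import math

PEAK_LR = 2.5e-5
FLOOR_LR = 2.5e-6
WARMUP_STEPS = 1_000
TOTAL_STEPS = 40_000
BATCH_SIZE = 4
NUM_FRAMES = 48_208


def learning_rate(step):
    """Linear warmup to the peak, then cosine decay to the floor."""
    if step < WARMUP_STEPS:
        # Assumption: warmup ramps linearly from 0 to the peak.
        return PEAK_LR * step / WARMUP_STEPS
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FLOOR_LR + (PEAK_LR - FLOOR_LR) * cosine


# Number of passes over the data: samples seen divided by dataset size.
epochs = BATCH_SIZE * TOTAL_STEPS / NUM_FRAMES
```

Here `epochs` evaluates to 160 000 / 48 208, which rounds to the 3.3 quoted above, and the learning rate reaches the floor exactly at step 40 000.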

Checkpoints

Folder	Step	Note
`8000`	8 000
`16000`	16 000
`24000`	24 000
`32000`	32 000
`40000`	40 000	final (openpi saves it under the 0-indexed step `39999`; relabeled to `40000` here)

Single uninterrupted run — step numbers are absolute (not resumed/cumulative across runs). Each folder holds the LoRA params/ and the dataset assets/ (norm stats); optimizer train_state/ is intentionally not uploaded.

Choose which checkpoint to deploy by eval metrics, not train loss: the late checkpoints have near-identical train loss, so train loss cannot separate them.
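
One way to make that selection explicit is sketched below. `pick_checkpoint` is a hypothetical helper, and the scores in the example are placeholders rather than measured results for this model; substitute whatever eval metric you collect (for example, rollout success rate).

```python
def pick_checkpoint(eval_scores):
    """Return the step with the highest eval score.

    eval_scores maps checkpoint step -> eval metric (higher is better).
    Ties go to the earliest step, which has had the least training.
    """
    if not eval_scores:
        raise ValueError("no eval scores provided")
    # max() keeps the first maximum it sees, so iterating steps in
    # ascending order resolves ties toward the earliest checkpoint.
    return max(sorted(eval_scores), key=lambda step: eval_scores[step])


# Placeholder values for illustration only, NOT results for this model.
placeholder_scores = {8000: 0.0, 16000: 0.5, 24000: 0.5, 32000: 0.25}
best_step = pick_checkpoint(placeholder_scores)
```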

Usage

Load with openpi's pi05 config (LIBERO 2-cam transforms, discrete_state_input=True), pointing the weight loader at one checkpoint's params/. See the SENTINEL-Lite sentinel/openpi_sft pipeline for the exact TrainConfig (pi05_stack_merged60_discrete_libero_lora).
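
For intuition about what discrete_state_input=True implies, the sketch below bins a normalized state vector into integer tokens. It assumes 256 uniform bins over a normalized range of [-1, 1]; both the bin count and the range are assumptions that should be checked against openpi's tokenizer, and `discretize_state` is a hypothetical helper, not part of the openpi API. In practice this step happens inside the model's input pipeline, not in user code.

```python
def discretize_state(normalized_state, num_bins=256):
    """Map each normalized state value to a bin index in [0, num_bins - 1].

    Values are clipped to [-1, 1] first, then assigned to one of
    num_bins equal-width bins (assumed binning scheme, see lead-in).
    """
    bins = []
    for x in normalized_state:
        x = min(max(x, -1.0), 1.0)             # clip to the assumed range
        idx = int((x + 1.0) / 2.0 * num_bins)  # equal-width binning
        bins.append(min(idx, num_bins - 1))    # x == 1.0 goes to the last bin
    return bins


# An 8-dim normalized state, matching the schema's state shape.
tokens = discretize_state([-1.0, -0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0])
```

The practical consequence is that the state must be normalized with the checkpoint's own norm stats (the assets/ folder) before binning; a continuous-state sibling checkpoint would consume the same vector without this step.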

