ReplicaLab Scientist — GRPO LoRA Adapter

A LoRA adapter fine-tuned on unsloth/Qwen3.5-0.8B using Group Relative Policy Optimization (GRPO) for multi-agent scientific negotiation.

What is ReplicaLab?

ReplicaLab is a multi-agent constraint-aware planning environment that trains an AI Scientist agent to negotiate feasible scientific replication plans under realistic resource constraints. A Lab Manager enforces budgets, schedules, and equipment limits while a deterministic Judge scores every plan on rigor, feasibility, and fidelity.

Live demo: ayushozha-replicalab.hf.space

Training Details

  • Method: GRPO (Group Relative Policy Optimization) via TRL
  • Base model: unsloth/Qwen3.5-0.8B
  • LoRA config: rank=16, alpha=32, dropout=0.0
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Hardware: NVIDIA H100 80GB HBM3 (Northflank)
  • Steps: 200 (checkpoints at 100, 150, 200)
  • Training framework: Unsloth + TRL 0.24.0 + PEFT 0.18.1
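To make the LoRA settings above concrete, here is a minimal sketch that stores them under key names mirroring the corresponding `peft.LoraConfig` arguments and derives two quantities from them: the standard LoRA scaling factor (alpha divided by rank) and the number of trainable parameters added per adapted linear layer. The helper names and the 1024 x 1024 layer size are hypothetical and chosen only for illustration; they are not taken from the training code.

```python
# Hyperparameters copied from the list above. The key names mirror
# peft.LoraConfig arguments (r, lora_alpha, lora_dropout, target_modules).
lora_hparams = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return lora_alpha / r


def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


scaling = lora_scaling(lora_hparams["r"], lora_hparams["lora_alpha"])

# Hypothetical 1024 x 1024 projection, used only as an example size.
example_params = lora_params_per_layer(1024, 1024, lora_hparams["r"])
```

With rank 16 and alpha 32, the low-rank update is scaled by 2.0 before being added to the frozen weight.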

Reward Formula

total_reward = 10 × rigor × feasibility × fidelity × parsimony
             + efficiency_bonus + communication_bonus − penalties

The multiplicative core prevents fake wins: a theoretically strong but impossible plan scores low.
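The formula above can be sketched as a small function to show why the multiplicative core matters. This is an illustrative sketch, not the environment's actual Judge implementation: the function name, the argument defaults, and the assumption that each component score lies in [0, 1] are all hypothetical.

```python
def total_reward(
    rigor: float,
    feasibility: float,
    fidelity: float,
    parsimony: float,
    efficiency_bonus: float = 0.0,
    communication_bonus: float = 0.0,
    penalties: float = 0.0,
) -> float:
    """Reward formula from the model card.

    Assumption: each of the four core scores lies in [0, 1].
    """
    core = 10.0 * rigor * feasibility * fidelity * parsimony
    return core + efficiency_bonus + communication_bonus - penalties


# A rigorous but infeasible plan: the zero feasibility score wipes out the core.
impossible_plan = total_reward(rigor=0.9, feasibility=0.0, fidelity=0.9, parsimony=1.0)

# A balanced plan with moderate scores on every axis.
balanced_plan = total_reward(rigor=0.5, feasibility=0.5, fidelity=0.5, parsimony=1.0)
```

Because the core is a product, a single near-zero component cannot be offset by high scores on the others, whereas an additive core would still reward the infeasible plan for its rigor and fidelity.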

Training Curves

Overview

[Figure: Training Overview]

Reward Over Training

[Figure: Reward Curve]

Training Loss

[Figure: Loss Curve]

KL Divergence

[Figure: KL Divergence]

Completion Length

[Figure: Completion Length]

Evaluation Results

Improvement Over Baseline

[Figure: Improvements]

Side-by-Side Comparison

[Figure: Eval Comparison]

| Metric | Baseline Scientist | Trained Scientist | Relative change |
| --- | --- | --- | --- |
| Average reward | 4.25 | 7.10 | +67% |
| Rounds to agreement | 4.1 | 2.8 | −32% |
| Invalid action rate | 15% | 4% | −73% |
| Agreement rate | 50% | 80% | +60% |
| Avg rigor score | 0.55 | 0.72 | +31% |
| Avg feasibility score | 0.52 | 0.78 | +50% |
| Avg fidelity score | 0.58 | 0.71 | +22% |
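The last column is consistent with change relative to the baseline value, which matters for the two rate metrics: the agreement rate rises by 30 percentage points, which is +60% relative to 50%. A minimal sketch that recomputes the column from the baseline and trained values in the table (the function and variable names are illustrative only):

```python
def relative_change(baseline: float, trained: float) -> float:
    """Percent change of the trained value relative to the baseline."""
    return (trained - baseline) / baseline * 100.0


# (baseline, trained) pairs copied from the evaluation table.
# Rates are written as percentages, e.g. 15 means 15%.
rows = {
    "Average reward": (4.25, 7.10),
    "Rounds to agreement": (4.1, 2.8),
    "Invalid action rate": (15, 4),
    "Agreement rate": (50, 80),
    "Avg rigor score": (0.55, 0.72),
    "Avg feasibility score": (0.52, 0.78),
    "Avg fidelity score": (0.58, 0.71),
}

# Round to whole percents, as the table does.
changes = {name: round(relative_change(b, t)) for name, (b, t) in rows.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+d}%")
```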

Scenario Families

| Template | Domain | Example Task |
| --- | --- | --- |
| math_reasoning | Mathematics | Proof planning under deadline and review constraints |
| ml_benchmark | Machine Learning | Model replication with compute and time budgets |
| finance_trading | Finance | Backtest design under capital and risk limits |

Quick Start

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the GRPO-trained LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.5-0.8B")
model = PeftModel.from_pretrained(base_model, "openenv-community/replicalab-scientist-grpo-lora")

# Load the tokenizer from the adapter repository
tokenizer = AutoTokenizer.from_pretrained("openenv-community/replicalab-scientist-grpo-lora")

# Use within the ReplicaLab environment for scientific negotiation

Framework Versions

  • PEFT: 0.18.1
  • TRL: 0.24.0
  • Transformers: 5.2.0
  • PyTorch: 2.8.0+cu128
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

@misc{replicalab2026,
    title        = {ReplicaLab: Multi-Agent Constraint-Aware Planning for Scientific Replication},
    author       = {Ayush Ojha and Kian and Peixi and Kush},
    year         = 2026,
    url          = {https://github.com/Ayush10/replicalab-ai}
}

License

MIT
