MVP Lab

community

https://mvp-lab.ai

AI & ML interests

multi-modal foundation models

Recent Activity

oliveryanzuolu updated a collection about 1 hour ago

oliveryanzuolu submitted a paper about 2 hours ago

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

oliveryanzuolu updated a model about 2 hours ago


Papers

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO


posted an update about 1 hour ago


Excited to share RAVEN, my first PhD project. Paper, code, and models are all released.

RAVEN targets real-time autoregressive video generation. Rather than simply appending future chunks, we train the model to remember and use its own generated history, which leads to more realistic and natural long-horizon videos.

Technically, RAVEN repacks self-rollouts into interleaved clean historical endpoints and noisy denoising states, aligning training-time attention with inference-time extrapolation.
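The post does not give the exact layout. The following is a minimal sketch of one way such interleaving could be expressed as a chunk-level attention mask. It assumes a sequence ordered as [clean_0, noisy_0, clean_1, noisy_1, ...], where each noisy chunk may attend only to strictly earlier clean endpoints and to itself, mirroring inference, where a new chunk is denoised given already-generated history. The function name and the masking rules are hypothetical illustrations, not taken from the RAVEN code.

```python
from typing import List


def interleaved_chunk_mask(num_chunks: int) -> List[List[bool]]:
    """Chunk-level attention mask for an assumed interleaved training layout.

    Assumed layout: [clean_0, noisy_0, clean_1, noisy_1, ...], i.e. clean
    history chunk i sits at index 2*i and noisy chunk i at index 2*i + 1.
    mask[q][k] is True when query chunk q may attend to key chunk k.
    """
    size = 2 * num_chunks
    mask = [[False] * size for _ in range(size)]
    for i in range(num_chunks):
        clean_q, noisy_q = 2 * i, 2 * i + 1
        for j in range(num_chunks):
            clean_k = 2 * j
            # Clean endpoints form a causal history: clean_i sees clean_0..clean_i.
            if j <= i:
                mask[clean_q][clean_k] = True
            # Noisy chunk i sees only strictly earlier clean endpoints, as at
            # inference, where chunk i is denoised given the generated history.
            if j < i:
                mask[noisy_q][clean_k] = True
        # Noisy chunk i also attends to itself; it never sees other noisy chunks.
        mask[noisy_q][noisy_q] = True
    return mask


if __name__ == "__main__":
    labels = ["c0", "n0", "c1", "n1", "c2", "n2"]
    for name, row in zip(labels, interleaved_chunk_mask(3)):
        visible = [labels[k] for k, ok in enumerate(row) if ok]
        print(name, "->", visible)
```

With three chunks, n1 sees only c0 and itself, while c2 sees c0, c1, and c2. In a real model this chunk mask would be expanded to token level (for example as a boolean attention mask), with full attention inside each chunk.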

We also introduce CM-GRPO, which reformulates consistency-model sampling as a conditional Gaussian transition kernel so that online RL can directly optimize the sampler transition used at inference.
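The post does not spell out the kernel. Below is a minimal sketch of the general idea, assuming a rectified-flow style multistep consistency sampler where each step predicts a clean sample x0_hat = f_theta(x_t, t) and re-noises it as x_{t'} = (1 - t') * x0_hat + t' * eps with eps ~ N(0, I). Under that assumption the step is a Gaussian with mean (1 - t') * x0_hat and standard deviation t', so its log-probability has a closed form and can feed a GRPO-style clipped objective with group-standardized rewards. All function names are hypothetical, not from the RAVEN code, and the paper's actual schedule and loss may differ.

```python
import math
from typing import List, Sequence


def cm_transition_mean(x0_pred: Sequence[float], t_next: float) -> List[float]:
    # Assumed re-noising step: x_{t'} = (1 - t') * x0_hat + t' * eps, eps ~ N(0, I).
    # Given x_t, this is Gaussian with mean (1 - t') * x0_hat and std t'.
    return [(1.0 - t_next) * v for v in x0_pred]


def gaussian_log_prob(x: Sequence[float], mean: Sequence[float], std: float) -> float:
    """log N(x; mean, std^2 I) for an isotropic Gaussian."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / std**2 - d * math.log(std) - 0.5 * d * math.log(2 * math.pi)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: rewards standardized within one prompt's group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                           advantages: Sequence[float], clip: float = 0.2) -> float:
    """PPO/GRPO clipped objective over per-sample transition log-probs (to minimize)."""
    terms = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # likelihood ratio of the sampled transition
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    x0_hat = [0.2, -0.4]  # toy stand-in for f_theta(x_t, t)
    t_next = 0.5
    mean = cm_transition_mean(x0_hat, t_next)
    samples = [[0.3, -0.1], [0.0, -0.3], [0.5, 0.2]]  # toy draws of x_{t'}
    logp = [gaussian_log_prob(s, mean, t_next) for s in samples]
    adv = group_relative_advantages([0.1, 0.7, 0.4])  # toy rewards
    print(clipped_surrogate_loss(logp, logp, adv))  # ratio is 1 here
```

Because the transition log-probability is explicit, gradients of the clipped objective flow into the network through the Gaussian mean, which is what lets online RL act on the same sampler step used at inference. In this toy parameterization the final step (t' = 0) is deterministic and has no density, so only stochastic steps would contribute.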

Project Page: https://yanzuo.lu/raven
Paper: https://arxiv.org/abs/2605.15190
Code: https://github.com/mvp-ai-lab/RAVEN
Model: mvp-lab/RAVEN

updated a collection about 1 hour ago

RAVEN

Real-time Autoregressive Video Extrapolation with Consistency-model GRPO • 2 items • Updated about 1 hour ago

submitted a paper to Daily Papers about 2 hours ago

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Paper • 2605.15190 • Published 1 day ago • 4

updated a model about 2 hours ago

mvp-lab/RAVEN

Text-to-Video • Updated about 2 hours ago


updated a dataset 4 days ago

mvp-lab/LLaVA-OneVision-2-Data

Updated 4 days ago • 24 • 92.3k • 12

updated a collection 9 days ago

LLaVA-OneVision-2

1 item • Updated 9 days ago • 2

geoffreychen777

updated a dataset about 1 month ago

mvp-lab/mvp-engine-minimal-vlm-data

Updated Apr 1 • 503 • 90

geoffreychen777

published a dataset about 1 month ago

mvp-lab/mvp-engine-minimal-vlm-data

Updated Apr 1 • 503 • 90

published 2 datasets about 1 month ago

mvp-lab/OpenVidHD-0.4M-720p-48fps

Updated Dec 5, 2025 • 433k • 2.5k

mvp-lab/Sekai

Updated Nov 26, 2025 • 1.83M • 70

published a dataset about 2 months ago

mvp-lab/LLaVA-OneVision-2-Data

Updated 4 days ago • 24 • 92.3k • 12

updated a collection 2 months ago

Vton

0 items • Updated 1 day ago

authored a paper 3 months ago

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

submitted a paper to Daily Papers 3 months ago

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

authored 2 papers 3 months ago

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

Paper • 2510.18795 • Published Oct 21, 2025 • 11

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Paper • 2601.10305 • Published Jan 15 • 36

authored 2 papers 7 months ago

ForCenNet: Foreground-Centric Network for Document Image Rectification

Paper • 2507.19804 • Published Jul 26, 2025 • 12

Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval

Paper • 2509.09118 • Published Sep 11, 2025 • 8