How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning Paper • 2605.27310 • Published 6 days ago • 18
RiT: Vanilla Diffusion Transformers Suffice in Representation Space Paper • 2605.21981 • Published 11 days ago • 10
Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion Paper • 2603.14645 • Published Mar 15 • 5
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published Mar 13 • 149
Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models Paper • 2505.20152 • Published May 26, 2025 • 11
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning Paper • 2505.20046 • Published May 26, 2025 • 18
view article Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? danaaubakirova, andito • Jul 25, 2024 • 17
Improving Text-to-Image Consistency via Automatic Prompt Optimization Paper • 2403.17804 • Published Mar 26, 2024 • 19
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding Paper • 2306.02858 • Published Jun 5, 2023 • 20