le.zhang

le723z

Magiccircuit

AI & ML interests

None yet

Recent Activity

new activity 9 days ago

le723z/RiT: Improve model card: add paper link and sample usage

Organizations

None yet

upvoted a paper 3 days ago

How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning

Paper • 2605.27310 • Published 6 days ago • 18

upvoted a paper 10 days ago

RiT: Vanilla Diffusion Transformers Suffice in Representation Space

Paper • 2605.21981 • Published 11 days ago • 10

upvoted 2 papers 3 months ago

Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion

Paper • 2603.14645 • Published Mar 15 • 5

EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

Paper • 2603.13594 • Published Mar 13 • 149

upvoted a paper 8 months ago

It Takes Two: Your GRPO Is Secretly DPO

Paper • 2510.00977 • Published Oct 1, 2025 • 32

upvoted 4 papers about 1 year ago

Learning to Reason without External Rewards

Paper • 2505.19590 • Published May 26, 2025 • 31

Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models

Paper • 2505.20152 • Published May 26, 2025 • 11

Discrete Markov Bridge

Paper • 2505.19752 • Published May 26, 2025 • 16

REARANK: Reasoning Re-ranking Agent via Reinforcement Learning

Paper • 2505.20046 • Published May 26, 2025 • 18

upvoted a collection about 1 year ago

Qwen3

Collection • 84 items • Updated Dec 31, 2025 • 1.8k

upvoted an article almost 2 years ago

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Article • danaaubakirova, andito • Jul 25, 2024 • 17

upvoted a paper about 2 years ago

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Paper • 2403.17804 • Published Mar 26, 2024 • 19

upvoted a paper over 2 years ago

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Paper • 2306.02858 • Published Jun 5, 2023 • 20
