Some Random Stuff 👋
I am Mingyuan Wu, a part-time researcher at Meta. I am currently a final-year PhD candidate in Computer Science at the University of Illinois Urbana-Champaign. I am fortunate to be advised by Prof. Klara Nahrstedt, and to work with Prof. Minjia Zhang and Prof. Chengxiang Zhai. Before starting my PhD, I spent wonderful years at UIUC and Shanghai Jiao Tong University.
I work on vision-language model agents and multimodal reasoning. In 2025, I have been cooking VLMs and LLMs with recipes of multi-turn reinforcement-learning fine-tuning and inference-time scaling to boost self-improvement, reasoning, and memory use. I also investigate VLM interpretability through circuit tracing.
My research goal is to build human-level agents that partner with people: humans set the intent with prompts; agents reason, execute, and suggest new possibilities via recommendations.
When I’m not teaching large models, I prototype augmented-reality systems for fun, hoping that, one day, agentic models will power these interactive, human-centered interfaces.
Industry Experience
- MRS AI, Meta
- Video Rec, Meta
- LLM Research Team, Capital Today
- Xin's Group, Adobe
- IOTG, Intel
Research
- Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
- VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
- Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning
- Spatio-Temporal LLM: Reasoning about Environments and Actions
- RecoWorld: Building Simulated Environments for Agentic Recommender Systems
- UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
- TraceNet: Segment One Thing Efficiently
- Anywhere Avatar: 3D Telepresence with Just a Phone and a Laptop
- Scene Graph Driven Hybrid Interactive VR Teleconferencing
- miVirtualSeat: A Next Generation Hybrid Telepresence System
- Seaware: Semantic-aware View Prediction System for 360-degree Video Streaming
- AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models
- ImmerScope: Multi-view Video Aggregation at Edge towards Immersive Content Services
- GaugeTracker: AI-Powered Cost-Effective Analog Gauge Monitoring System
- Vesper: Learning to Manage Uncertainty in Video Streaming
- I-Matting: Improved Trimap-Free Image Matting
- Interactive Scene Graph Analysis for Future Intelligent Teleconferencing Systems
- 360TripleView: 360-Degree Video View Management System Driven by Convergence Value of Viewing Preferences
- SAVG360: Saliency-aware Viewport-guidance-enabled 360-video Streaming System
- Video 360 Content Navigation for Mobile HMD Devices
- AnyLoc: Energy-efficient Visual Localization in Dynamic and Large-Scale Scenes without Pain
Events
- Date — Invited Talk at TBD Institute.
Acknowledgements
This simple website is adapted from Jiayi Pan's website, with help from Codex.