Some Random Stuff 👋

I am Mingyuan Wu, a part-time researcher at Meta. I am currently a final-year PhD candidate in Computer Science at the University of Illinois Urbana-Champaign. I am fortunate to be advised by Prof. Klara Nahrstedt, and to work with Prof. Minjia Zhang and Prof. Chengxiang Zhai. Before starting my PhD, I spent a wonderful time at UIUC and Shanghai Jiao Tong University.

I work on vision language model agents and multimodal reasoning. In 2025, I have been cooking VLMs and LLMs with recipes of multi-turn reinforcement learning fine-tuning and inference-time scaling, to boost self-improvement, reasoning, and memory use. I also investigate VLM interpretability through circuit tracing.

My research goal is to build human-level agents that partner with people: humans set the intent with prompts; agents reason, execute and suggest new possibilities (via recommendations).

When I’m not teaching large models, I prototype augmented-reality systems for fun, hoping that, one day, agentic models will power these interactive, human-centered interfaces.

Industry Experience

  • MRS AI, Meta
    May 2025 – Dec 2025 • Research Intern, with Shengyi Qian, Xudong Wang, Hanchao Yu
    Bootstrapping Multimodal Reasoning with Self-verification in RL
  • Video Rec, Meta
    May 2024 – Dec 2024 • Research Intern, with Meng Wu, Daming Li, Arthur Zhang
    VLM/LLM for Short Video Recommendations and Retrieval
  • LLM Research Team, Capital Today
    May 2023 – Aug 2023 • Research Intern, with partners
    Distinct from academic research work, but with researchers from Tsinghua, ETH and Princeton
  • Xin's Group, Adobe
    May 2022 – Aug 2022 • Research Intern, with Zichuan Liu, Xin Lu (both at ByteDance now)
    Efficient Interactive Segmentation and Matting for Mobile
  • IOTG, Intel
    May 2017 – Dec 2017 • Machine Vision Software Intern, with Wenjie Wang
    On-Camera Object Detection

Research

(* denotes equal contribution or main contribution)
  • Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
    Mingyuan Wu*, Meitang Li*, Jingcheng Yang*, Jize Jiang, Kaizhuo Yan, Zhaoheng Li, Hanchao Yu, Minjia Zhang, Klara Nahrstedt
    NeurIPS 2025 Multimodal Algorithmic Reasoning Workshop, Oral Presentation.
  • VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
    Mingyuan Wu*, Jingcheng Yang*, Jize Jiang, Meitang Li, Kaizhuo Yan, Zhaoheng Li, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt
    Preprint.
  • Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning
    Mingyuan Wu*, Jize Jiang*, Haozhen Zheng*, Meitang Li, Zhaoheng Li, Beitong Tian, Bo Chen, Yongjoo Park, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt
    EMNLP 2025 Main.
  • Spatio-Temporal LLM: Reasoning about Environments and Actions
    Haozhen Zheng, Beitong Tian, Mingyuan Wu, Zhenggang Tang, Klara Nahrstedt, Alex Schwing
    Preprint.
  • RecoWorld: Building Simulated Environments for Agentic Recommender Systems
    MRS AI Teams at Meta
    Preprint.
  • UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
    Xinyu Pi*, Mingyuan Wu*, Jize Jiang*, Haozhen Zheng*, Beitong Tian, Chengxiang Zhai, Klara Nahrstedt, Zhiting Hu
    EMNLP 2024 Main.
  • TraceNet: Segment One Thing Efficiently
    Mingyuan Wu, Zichuan Liu, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt
    IEEE MIPR 2025 — 🏆 Best Student Paper Award.
  • Anywhere Avatar: 3D Telepresence with Just a Phone and a Laptop
    Ruifan Ji, Mingyuan Wu, Bo Chen, Michael Zink, Ramesh Sitaraman, Jacob Chakareski, Klara Nahrstedt
    ACM Multimedia 2025, Demo Track.
  • Scene Graph Driven Hybrid Interactive VR Teleconferencing
    Mingyuan Wu*, Ruifan Ji*, Haozhen Zheng, Jiaxi Li, Beitong Tian, Bo Chen, Ruixiao Zhang, Jacob Chakareski, Michael Zink, Ramesh Sitaraman, Klara Nahrstedt
    ACM Multimedia 2024, Demo Track.
  • miVirtualSeat: A Next Generation Hybrid Telepresence System
    Klara Nahrstedt, Ramesh Sitaraman, Jacob Chakareski, Michael Zink, Mingyuan Wu, Lingdong Wang, Bo Chen, Ruifan Ji, Kuny Lee, John Murray, Simran Singh, et al.
    ACM SIGCOMM 2025 EMS Workshop, Oral Presentation.
  • Seaware: Semantic-aware View Prediction System for 360-degree Video Streaming
    Jounsuk Park, Mingyuan Wu, Kuan-Ying Lee, Bo Chen, Klara Nahrstedt, Michael Zink, Ramesh Sitaraman
    IEEE ISM 2020 — 🏆 Best Paper Award.
  • AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models
    Beitong Tian*, Lingzhi Zhao*, Bo Chen, Haozhen Zheng, Jingcheng Yang, Mingyuan Wu, Deepak Vasishth, Klara Nahrstedt
    Preprint.
  • ImmerScope: Multi-view Video Aggregation at Edge towards Immersive Content Services
    Bo Chen, Hongpeng Guo, Mingyuan Wu, Zhe Yang, Zhisheng Yan, Klara Nahrstedt
    ACM SenSys 2024.
  • GaugeTracker: AI-Powered Cost-Effective Analog Gauge Monitoring System
    Beitong Tian, Mingyuan Wu, Ruixiao Zhang, Haozhen Zheng, Bo Chen, Yaohui Wang, Shiv Trivedi, Shanbo Zhang, et al., Klara Nahrstedt
    IEEE MIPR 2024.
  • Vesper: Learning to Manage Uncertainty in Video Streaming
    Bo Chen, Mingyuan Wu, Hongpeng Guo, Zhisheng Yan, Klara Nahrstedt
    ACM MMSys 2024.
  • I-Matting: Improved Trimap-Free Image Matting
    Zichuan Liu*, Ke Wang*, Mingyuan Wu*, Lantao Yu, Klara Nahrstedt, Xin Lu
    IEEE ICME 2024.
  • Interactive Scene Graph Analysis for Future Intelligent Teleconferencing Systems
    Mingyuan Wu, Yuhan Lu, Shiv Trivedi, Bo Chen, Qian Zhou, Lingdong Wang, Simran Singh, Michael Zink, Ramesh Sitaraman, Jacob Chakareski, Klara Nahrstedt
    IEEE ISM 2023.
  • 360TripleView: 360-Degree Video View Management System Driven by Convergence Value of Viewing Preferences
    Qian Zhou, Mingyuan Wu, Yinjie Zhang, Michael Zink, Ramesh Sitaraman, Klara Nahrstedt
    IEEE ISM 2023.
  • SAVG360: Saliency-aware Viewport-guidance-enabled 360-video Streaming System
    Yinjie Zhang, Mingyuan Wu, Beitong Tian, Jiaxi Li, Bo Chen, Qian Zhou, Klara Nahrstedt
    IEEE ISM 2023.
  • Video 360 Content Navigation for Mobile HMD Devices
    Jounsuk Park, Mingyuan Wu, Eric Lee, Klara Nahrstedt, Yash Shah, Arielle Rosenthal, John Murray, Kevin Spiteri, Michael Zink, Ramesh Sitaraman
    ACM Multimedia 2021.
  • AnyLoc: Energy-efficient Visual Localization in Dynamic and Large-Scale Scenes without Pain
    Beitong Tian, Jingcheng Yang, Jionghao Wei, Zhenggang Tang, Yuheng Wang, Hongwen Xiao, Haozhen Zheng, Mingyuan Wu, Shiv Trivedi, Klara Nahrstedt
    Preprint.

Events

Earlier events…

Acknowledgements

This simple website is built from Jiayi Pan's website and Codex.