Yifan Wu

Hi there! I am currently an AI Research Scientist at Meta, working on LLM post-training (Llama 3 & 4, Meta AI). Recently, I have been interested in LLM coding agents and automated research agents, in particular harness design and advancing reinforcement learning for long-horizon, multi-turn agentic tasks.

I did my PhD at the University of Pennsylvania, working at the intersection of AI, computer vision, and healthcare, where I was affiliated with the GRASP Lab and the Penn Image Computing and Science Laboratory. I am grateful to have been advised by Prof. James C. Gee, and to have worked closely with Prof. Jianbo Shi and Prof. Mark Yatskar.

Email: yifannn.wu [AT] gmail.com / Google Scholar / LinkedIn / GitHub / X

Selected Publications/Blog

Selected publications are listed newest first; the full list lives on Google Scholar. Some works are highlighted.

	SWE-Together: Evaluating Coding Agents in Interactive User Sessions LLM / Agent Yifan Wu, Zhuokai Zhao, Songlin Li, Ho Hin Lee, Jiacheng Zhu, Shirley Wu, Tianhe Yu, Serena Li, Lizhu Zhang, Xiangjun Fan, Shengzhi Li Arxiv, 2026. Paper / Code / Website / Post We introduce SWE-Together, a multi-turn coding benchmark of 109 repository-level tasks from real user–agent sessions, replayed by a reactive LLM user simulator and scored on final repository correctness and the number of interventions required.
	Remember When It Matters: Proactive Memory Agent for Long-Horizon Agents LLM / Agent Yifan Wu, Lizhu Zhang, Yuhang Zhou, Mingyi Wang, Bo Peng, Serena Li, Xiangjun Fan, Zhuokai Zhao Arxiv, 2026. Paper / Code / Post We propose a proactive memory agent that runs alongside the action agent, maintaining a structured memory and selectively surfacing reminders only when beneficial, improving Terminal-Bench by +8.3 pp and τ²-Bench by +6.8 pp over passive memory-bank exposure.
	SAGE-OPD: Selective Agent-Guided Intervention for Multi-Turn On-Policy Distillation LLM / Agent Yuhang Zhou, Lizhu Zhang, Yifan Wu, Mingyi Wang, Bo Peng, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao Arxiv, 2026. Paper We propose SAGE-OPD, a verifier-free selective intervention framework for multi-turn on-policy distillation that uses teacher judgment to decide which student turns to skip or intervene on, weights distillation by teacher confidence, and normalizes the loss.
	OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification LLM Yuhang Zhou, Lizhu Zhang, Yifan Wu, Mingyi Wang, Peng Bo, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao Arxiv, 2026. Paper We introduce OmniOPD, which replaces token-logit matching with chunk-level semantic verification to enable dense on-policy distillation from black-box teachers, with stabilizers that help prevent collapse.
	Synthetic Sandbox for Training Machine Learning Engineering Agents LLM / Agent Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao†, Hong Yan† COLM, 2026. Paper We introduce SandMLE, a multi-agent framework that generates diverse, verifiable synthetic MLE environments at micro-scale (50–200 training samples per task), cutting training time of one round evolution by 13× and enabling large-scale on-policy trajectory-wise RL.
	Scaling Agent Learning via Experience Synthesis LLM / Agent Zhaorun Chen, Zhuokai Zhao, Kai Zhang, Bo Liu, Qi Qi, Yifan Wu, Tarun Kalluri, Sara Cao, Yuanhao Xiong, Haibo Tong, Huaxiu Yao, Hengduo Li, Jiacheng Zhu, Xian Li, Dawn Song, Bo Li, Jason Weston†, Dat Huynh† ICLR, 2026. Paper We introduce DreamGym, a unified framework that synthesizes diverse agent experiences via a reasoning-based experience model, enabling scalable online RL without costly real-environment rollouts and providing a strong warm-start for sim-to-real transfer.
	Agent Learning via Early Experience LLM / Agent Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang, Sara Cao, Lawrence Jang, Shuyan Zhou, Jiacheng Zhu, Huan Sun, Jason Weston, Yu Su†, Yifan Wu† ICML, 2026. Paper We introduce "early experience", a paradigm in which agents learn from interactions generated by their own actions, using the resulting future states as supervision in place of reward signals, bridging imitation learning and reinforcement learning.
	The Llama 4 Herd: The Beginning of a New Era of Natively Multimodal AI Innovation LLM Llama team Meta AI Blog, 2025. Blog / Paper
	A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data Medical AI / LLM / Multimodal Yifan Wu, Yang Liu, Yue Yang, Michael S. Yao, Wenli Yang, Xuehui Shi, Lihong Yang, Dongjun Li, Yueming Liu, James C. Gee, Xuan Yang, Wenbin Wei, Shi Gu Nature Communications, 2025. Paper / Demo We demonstrated how to encode the expertise of specialized clinicians into AI to build an interpretable machine learning model that produces outputs understandable by humans.
	A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis Medical AI / LLM / Multimodal Yue Yang, Mona Gandhi, Yufei Wang, Yifan Wu, Michael S. Yao, Chris Callison-Burch, James C. Gee, Mark Yatskar NeurIPS 2024 (Spotlight). Paper / Website We introduced KnoBo, which incorporates medical knowledge priors into interpretable models to enhance robustness against distribution shifts across factors such as hospitals, demographics, sex, and race.
	The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task LLM / Multimodal Yifan Wu, Pengchuan Zhang, Wenhan Xiong, Barlas Oguz, James C. Gee, Yixin Nie Arxiv, 2023. Paper We found that GPT-4V can benefit significantly from the Chain-of-Thought prompt. We present the "Description then Decision" strategy, which improves Winoground task performance by 50%.
	Towards Establishing Dense Correspondence on Multiview Coronary Angiography: From Point-to-Point to Curve-to-Curve Query Matching Medical AI / Computer Vision Yifan Wu, Rohit Jena, Mehmet Gulsun, Vivek Singh, Puneet Sharma, James C. Gee Arxiv, 2023, under review. Paper We established dense correspondence in multi-view angiography by formulating it as a query matching problem and extending point matching to curve matching for enhanced topological awareness.
	NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration Medical AI / Computer Vision Yifan Wu, Tom Z Jiahao, Jiancong Wang, Paul A Yushkevich, M Ani Hsieh, James C. Gee CVPR, 2022. Project Page / Paper / Supplementary / Sequential Registration Extension Work We model each voxel as a moving particle and consider the set of all voxels in a 3D image as a high-dimensional dynamical system whose trajectory determines the targeted deformation field.
	Interpretable Identification of Interstitial Lung Disease (ILD) Associated Findings from CT Medical AI / Computer Vision Yifan Wu, Jiancong Wang, William D. Lindsay, Tarmily Wen, Jianbo Shi, and James C. Gee MICCAI, 2020. Paper We formulated the identification of radiologic ILD findings as a multi-class classification problem on a raw thoracic CT dataset.
	From Image to Video Face Inpainting: Spatial-Temporal Nested GAN (STN-GAN) for Usability Recovery Computer Vision Yifan Wu, Vivek Singh, Ankur Kapoor WACV, 2020. Paper / Video Result We propose constrained inpainting methods to recover the usability of corrupted images that were masked for privacy protection, in cases where complete images are required for further algorithm development.
	Towards Generating Personalized Volumetric Phantom from Patient's Surface Geometry Medical AI / Computer Vision Yifan Wu, Vivek Singh, Brian Teixeira, Kai Ma, Birgi Tamersoy, Andreas Krauss, and Terrence Chen MICCAI, 2019. Paper This paper presents a method to generate a volumetric phantom with internal anatomical structures from the patient's skin surface geometry.
	Privacy-Protective-GAN for Face De-identification Computer Vision Yifan Wu, Fan Yang, and Haibin Ling Arxiv, 2018. Paper We defined the face de-identification task by establishing an effective de-identification measurement: achieving privacy protection while simultaneously preserving data utility. We proposed an end-to-end trainable framework to synthesize de-identified facial images.

Experiences

	Research Scientist, GenAI / Meta Superintelligence Labs, 2024 - present, New York & Menlo Park
	Research Scientist Intern, GenAI, summer 2023, Menlo Park, CA

	Research Scientist, Computer Vision Team, 2018, Princeton, NJ
	Research Scientist Intern, Computer Vision Team, summer 2017, Princeton, NJ

Teaching

	Fall 2022: CIS537 Biomedical Image Analysis, Teaching Assistant.
	Fall 2021: CIS581 Computer Vision and Computational Photography, Teaching Assistant.

Website template from Jon Barron.