🗨 About Me
Hi, I'm Boyi Li. I am pursuing my B.S. degree in Computer Engineering at the ZJU-UIUC Institute. I am currently a research intern in the Rehg Lab at UIUC. Previously, I was fortunate to be a research assistant in the CVNext Lab, advised by Prof. Gaoang Wang.
Research Interests:
- Embodied AI
- Multimodal LLM
- Multi-Agent Systems
- Reinforcement Learning
- Healthcare CV
🌎 Service
- Conference Reviewer: KDD (AI4Science Track), ICLR, CVPR
- Workshop Organizer: CVPR 2026 Workshop on Computer Vision for Children (CV4CHL)
📝 Selected Publications:
* Equal contribution. ♡ Project lead. ✉ corresponding/co-corresponding author.
Also see Publications Page and Google Scholar.
- See and think: Embodied agent in virtual environment
Zhonghan Zhao*, Wenhao Chai*,♡, Xuan Wang*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang✉
[Paper] [Code] [Website]
A comprehensive and visionary embodied agent in the Minecraft virtual environment, comprising three key components: vision perception, language instruction, and code action.
ECCV 2024
- Adaptive Graph Pruning for Multi-Agent Communication
Boyi Li*, Zhonghan Zhao*,♡, Der-Horng Lee✉, Gaoang Wang✉
[Paper] [Code] [Website]
A novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard pruning) and communication topology (soft pruning), dynamically constructing communication topologies tailored to individual tasks.
ECAI 2025
- Toward Cognitive Supersensing in Multimodal Large Language Model
Boyi Li*, Yifan Shen*,♡, Yuanzhe Liu*, Yifan Xu, Jiateng Liu, Xinzhuo Li, Zhengyuan Li, Jingyuan Zhu, Yunhan Zhong, Fangzhou Lan, Jianguo Cao, James M. Rehg, Heng Ji, Ismini Lourentzou✉, Xu Cao✉
[Paper] [Code] [Website]
A novel training paradigm that endows MLLMs with human-like visual imagery capabilities via a Latent Visual Imagery Prediction (LVIP) head, which jointly learns sequences of visual cognitive latent embeddings and aligns them with the answer, thereby forming vision-based internal reasoning chains. To evaluate the cognitive capabilities of MLLMs, we also present CogSense-Bench, a comprehensive visual question answering (VQA) benchmark assessing five cognitive dimensions.
Preprint
News:
- Jul. 2024: Our paper "See and think: Embodied agent in virtual environment" was accepted at ECCV 2024.
- Jul. 2025: Our paper "Adaptive Graph Pruning for Multi-Agent Communication" was accepted at ECAI 2025.
- Jan. 2026: It is a great honor to serve as one of the organizers of the CVPR 2026 Workshop on Computer Vision for Children (CV4CHL).