* Equal contribution. ♡ Project lead. ✉ Corresponding/co-corresponding author.
Also see my Google Scholar profile for the most up-to-date list.
2026
-
The 1st AI Children Challenge
Boyi Li, Yifan Shen, Houze Yang, Xu Cao, Guojun Yun, Li Gao, Turong Chen, Long Xu, Jianguo Cao, Meihuan Huang✉
The 1st AI Children Challenge aims to advance real-world applications of computer vision and AI in child healthcare, child education, and pediatrics. The 2026 CV4CHL edition featured the first track in this domain: Children Gait Visual Analysis. The main goal of Children Gait Visual Analysis is the fine-grained analysis of children's gait behaviors from keypoint sequences.
CVPR 2026 Workshop on Computer Vision for Children
-
Toward Cognitive Supersensing in Multimodal Large Language Model
Boyi Li*, Yifan Shen*,♡, Yuanzhe Liu*, Yifan Xu, Jiateng Liu, Xinzhuo Li, Zhengyuan Li, Jingyuan Zhu, Yunhan Zhong, Fangzhou Lan, Jianguo Cao, James M. Rehg, Heng Ji, Ismini Lourentzou✉, Xu Cao✉
A novel training paradigm that endows MLLMs with human-like visual imagery capabilities by integrating a Latent Visual Imagery Prediction (LVIP) head that jointly learns sequences of visual cognitive latent embeddings and aligns them with the answer, thereby forming vision-based internal reasoning chains. To evaluate the cognitive capabilities of MLLMs, we present CogSense-Bench, a comprehensive visual question answering (VQA) benchmark assessing five cognitive dimensions.
arXiv Preprint
2025
-
Adaptive Graph Pruning for Multi-Agent Communication
Boyi Li*, Zhonghan Zhao*,♡, Der-Horng Lee✉, Gaoang Wang✉
A novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning), dynamically constructing optimized communication topologies tailored specifically to individual tasks.
ECAI 2025
2024
-
See and think: Embodied agent in virtual environment
Zhonghan Zhao*, Wenhao Chai*,♡, Xuan Wang*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang✉
A comprehensive and visionary embodied agent in the Minecraft virtual environment comprises three key components: vision perception, language instruction, and code action.
ECCV 2024