Publications | Boyi Li

^* Equal contribution. ^♡ Project lead. ^✉ Corresponding/co-corresponding author.
Also see my Google Scholar profile for the most up-to-date list.

2026

The 1st AI Children Challenge
Boyi Li, Yifan Shen, Houze Yang, Xu Cao, Guojun Yun, Li Gao, Turong Chen, Long Xu, Jianguo Cao, Meihuan Huang^✉

Paper

The 1st AI Children Challenge aims to advance real-world applications of computer vision and AI in child healthcare, child education, and pediatrics. The 2026 CV4CHL edition featured the first track in this domain, Children Gait Visual Analysis, whose main goal is the fine-grained analysis of children's gait behaviors from keypoint sequences.
CVPR 2026 Workshop on Computer Vision for Children

Toward Cognitive Supersensing in Multimodal Large Language Model
Boyi Li^*, Yifan Shen^*,♡, Yuanzhe Liu^*, Yifan Xu, Jiateng Liu, Xinzhuo Li, Zhengyuan Li, Jingyuan Zhu, Yunhan Zhong, Fangzhou Lan, Jianguo Cao, James M. Rehg, Heng Ji, Ismini Lourentzou^✉, Xu Cao^✉

Paper Code Website

A novel training paradigm that endows MLLMs with human-like visual imagery capabilities. It integrates a Latent Visual Imagery Prediction (LVIP) head that jointly learns sequences of visual cognitive latent embeddings and aligns them with the answer, thereby forming vision-based internal reasoning chains. To evaluate the cognitive capabilities of MLLMs, we present CogSense-Bench, a comprehensive visual question answering (VQA) benchmark assessing five cognitive dimensions.
arXiv Preprint

Adaptive Graph Pruning for Multi-Agent Communication
Boyi Li^*, Zhonghan Zhao^*,♡, Der-Horng Lee^✉, Gaoang Wang^✉

Paper Code Website

A novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning), dynamically constructing optimized communication topologies tailored specifically to individual tasks.
ECAI 2025

See and think: Embodied agent in virtual environment
Zhonghan Zhao^*, Wenhao Chai^*,♡, Xuan Wang^*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang^✉

Paper Code Website

A comprehensive and visionary embodied agent in the Minecraft virtual environment, comprising three key components: vision perception, language instruction, and code action.
ECCV 2024