MMath (Master's) in Computer Science at the University of Waterloo
Supervisor: Prof. Yuntian Deng

I wanna make something general.


My research interests lie at the intersection of Natural Language Processing, Computer Vision, and Machine Learning. Specifically, my focus is on improving the reasoning abilities of large generative models, such as large language models and vision-language models. My long-term objective is to enable these models to reason and generate new knowledge from their observations and interactions with users.

Currently, I am working on multi-agent systems and large (vision) language models. These focus areas may change, since the field is moving quickly.


Everybody Rap Now: Coherent Vocals and Whole-body Motions Generations from Text

Under review, 2024
Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan
[project] [paper] [code] [dataset] [BibTeX]

3D-VLA: A 3D Vision-Language-Action Generative World Model

ICML, 2024
Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan
[project] [paper] [code] [twitter] [BibTeX]

ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

ICML, 2024
Zhicheng Zheng*, Xin Yan*, Zhenfang Chen*, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, Chuang Gan
[project] [paper] [code] [dataset] [BibTeX]

Centroid-centered Modeling for Efficient Vision Transformer Pre-training

Preprint, 2023
Xin Yan, Zuchao Li, Lefei Zhang, Bo Du, Dacheng Tao
[paper] [code] [BibTeX]


MIT-IBM Watson AI Lab (Mar 23 - Now)
Supervisors: Prof. Chuang Gan & Dr. Zhenfang Chen
Research: Reasoning in Vision and Language

Wuhan University (Mar 22 - Mar 23)
Supervisors: Prof. Lefei Zhang & Prof. Zuchao Li
Research: Vision and Language



My life’s creed is to prioritize experiential living. While I rejoice in positive outcomes, I believe that the experience of doing a job well is its own reward.