MLLM Tutorial

About

Welcome to the MLLM Tutorial at CVPR 2024!

Artificial intelligence (AI) encompasses knowledge acquisition and real-world grounding across various modalities. Multimodal large language models (MLLMs), a multidisciplinary research area, have recently attracted growing interest in both academia and industry and are widely seen as a promising path toward human-level AI. By integrating and modeling diverse information modalities, including language, vision, audio, and other sensory data, these large models offer an effective vehicle for understanding, reasoning, and planning. This tutorial aims to deliver a comprehensive review of cutting-edge MLLM research, focusing on four key areas: MLLM architecture design, instruction learning and hallucination, multimodal reasoning, and efficient learning. We will explore technical advancements, synthesize key challenges, and discuss potential avenues for future research.

Seattle local time (UTC-7): Tuesday, June 18, 1:30 PM - 6:00 PM

Beijing time (UTC+8): Wednesday, June 19, 4:30 AM - 9:00 AM

🔔News

🔥[2024-06-19]: The video recording of the tutorial is now available on YouTube!
🔥[2024-06-19]: We have released all the slides!
🔥[2024-06-18]: Our tutorial is about to start in room Summit 446 for in-person attendees!
🔥[2024-06-18]: You can also join the tutorial online via this ~~Zoom link~~!

Organizers

Presenters

Hao Fei

National University of Singapore

Yuan Yao

National University of Singapore

Ao Zhang

National University of Singapore

Haotian Liu

University of Wisconsin-Madison

Fuxiao Liu

University of Maryland, College Park

Zhuosheng Zhang

Shanghai Jiao Tong University

Hanwang Zhang

Nanyang Technological University

Shuicheng Yan

Kunlun 2050 Research, Skywork AI

Schedule

PROGRAM

Our tutorial will be held on Tuesday, June 18, 2024 (all times are in UTC-7, Seattle local time).

Time	Section	Presenter
13:30-13:35	Part 1: Background and Introduction [Slides]	Hao Fei
13:35-14:05	Part 2: MLLM Architecture [Slides]	Yuan Yao
14:05-14:35	Part 3: MLLM Modality & Functionality [Slides]	Hao Fei
14:35-15:05	Part 4: MLLM Instruction Tuning [Slides]	Haotian Liu
15:05-16:00	Coffee Break, Q&A Session
16:00-16:30	Part 5: MLLM Hallucination [Slides]	Fuxiao Liu
16:30-17:00	Part 6: MM Reasoning [Slides]	Zhuosheng Zhang
17:00-17:30	Part 7: MLLM Efficiency [Slides]	Ao Zhang
17:30-18:00	Part 8: Panel Discussion - From MM Generalist to Human-level AI	All + Hanwang Zhang + Shuicheng Yan

Tutorial Record

Video

Literature

Reading List

Citation

@inproceedings{fei2024multimodal,
  title={From Multimodal {LLM} to Human-level {AI}: Modality, Instruction, Reasoning and Beyond},
  author={Fei, Hao and Li, Xiangtai and Liu, Haotian and Liu, Fuxiao and Zhang, Zhuosheng and Zhang, Hanwang and Yan, Shuicheng},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={11289--11291},
  year={2024}
}

MLLM Tutorial @ CVPR 2024

From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning, Efficiency and Beyond

Contact

Join and post at our Google Group!

Email the organizers at mllm24@googlegroups.com.