MLLM Tutorial@ACM-MM2024

About

MLLM Tutorial

Welcome to the MLLM Tutorial series at ACM MM 2024!

Artificial intelligence (AI) encompasses knowledge acquisition and real-world grounding across various modalities. As a multidisciplinary research field, multimodal large language models (MLLMs) have recently attracted growing interest in both academia and industry, along with an unprecedented push to achieve human-level AI through MLLMs. These large models offer an effective vehicle for understanding, reasoning, and planning by integrating and modeling diverse information modalities, including linguistic, visual, auditory, and sensory data. This tutorial aims to deliver a comprehensive review of cutting-edge research in MLLMs, focusing on the following key areas: MLLM architecture, modality, functionality, instruction tuning, multimodal hallucination, MLLM evaluation, and multimodal reasoning. We will explore technical advancements, synthesize key challenges, and discuss potential avenues for future research.

🔔News

🔥[2024-11-03]: You can now watch the video recording of the tutorial on YouTube!
🔥[2024-11-02]: We have released all the slides!
🔥[2024-10-22]: You can also join the tutorial online via this ~~Zoom link~~!
🔥[2024-10-20]: For in-person attendance, please come to Meeting Room 210 at the Melbourne Convention and Exhibition Centre.
🔥[2024-10-10]: This tutorial will be held on Monday, 28 October 2024.

Organizer

Presenters

Hao Fei

National University of Singapore

Xiangtai Li

ByteDance/TikTok

Haotian Liu

xAI

Fuxiao Liu

University of Maryland, College Park

Zhuosheng Zhang

Shanghai Jiao Tong University

Hanwang Zhang

Nanyang Technological University

Kaipeng Zhang

Shanghai AI Lab

Shuicheng Yan

Kunlun 2050 Research, Skywork AI

Schedule

PROGRAM

The tutorial will be held on Monday, 28 October 2024 (all times are Melbourne, VIC local time, UTC+11).

~~You can also join online via Zoom Meeting~~

Time	Section	Presenter
09:00-09:05	Part 1: Background and Introduction [Slides]	Hao Fei
09:05-09:35	Part 2: MLLM Architecture & Modality [Slides]	Hao Fei
09:35-10:00	Part 3: MLLM Functionality & Advances [Slides]	Xiangtai Li
10:00-10:30	Part 4: MLLM Instruction Tuning [Slides]	Haotian Liu
10:30-11:00	Coffee Break, Q&A Session	
11:00-11:25	Part 5: MLLM Hallucination [Slides]	Fuxiao Liu
11:25-11:50	Part 6: MLLM Evaluation & Generalist [Slides]	Hanwang Zhang
11:50-12:10	Part 7: MM Reasoning [Slides]	Zhuosheng Zhang
12:10-12:30	Part 8: Panel Discussion - From MM Generalist to Human-level AI	All + Kaipeng Zhang + Shuicheng Yan

Tutorial Record

Video

Literature

Reading List

Citation

@inproceedings{fei2024multimodal,
title={From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning and Beyond},
author={Fei, Hao and Li, Xiangtai and Liu, Haotian and Liu, Fuxiao and Zhang, Zhuosheng and Zhang, Hanwang and Yan, Shuicheng},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
pages={11289--11291},
year={2024}
}

MLLM Tutorial @ ACM MM 2024

From Multimodal LLM to Human-level AI:

Architecture, Modality, Function, Instruction, Hallucination, Evaluation, Reasoning and Beyond

28 October - 1 November 2024, Melbourne, Australia

Literature

Architecture and Modality of LLMs and MLLMs

Functionality and Recent Advances in MLLMs

Instruction Tuning & Hallucination

MLLM Evaluation and Benchmarks

Multimodal Reasoning and Agent

Contact

Join and post in our Google Group!

Email the organizers at mllm24@googlegroups.com.