SMMU: Benchmarking Social Intelligence of Multimodal Large Language Models

Under Review

TL;DR

SMMU is a video benchmark for evaluating social intelligence in multimodal large language models. It tests whether models can infer relationships, intent, emotion, perspective, and knowledge state from timestamped real-world video moments, then answer comprehension, reasoning, and prediction questions.

Social Dimension Examples