TL;DR
SMMU is a video benchmark for evaluating social intelligence in multimodal large language models. It tests whether models can infer relationships, intent, emotion, perspective, and knowledge state from timestamped real-world video moments, and whether they can then answer comprehension, reasoning, and prediction questions about those moments.
Social Dimension Examples
Use the carousel to move across the five SMMU social dimensions. Each slide pairs one video with timestamped comprehension, reasoning, and prediction questions.
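For readers who want to picture how one such example might be represented in code, here is a minimal sketch. It assumes nothing about the actual SMMU release format: the class names, field names (`video_id`, `timestamp_s`), and the enum labels are hypothetical, chosen only to mirror the wording above (one video, one social dimension, and timestamped questions of three types).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Dimension(Enum):
    # Hypothetical labels derived from the TL;DR wording;
    # the benchmark's own identifiers may differ.
    RELATIONSHIP = "relationship"
    INTENT = "intent"
    EMOTION = "emotion"
    PERSPECTIVE = "perspective"
    KNOWLEDGE_STATE = "knowledge_state"


class QuestionType(Enum):
    COMPREHENSION = "comprehension"
    REASONING = "reasoning"
    PREDICTION = "prediction"


@dataclass(frozen=True)
class TimestampedQuestion:
    qtype: QuestionType
    timestamp_s: float  # assumed unit: seconds from the start of the video
    text: str


@dataclass
class VideoExample:
    video_id: str  # hypothetical identifier
    dimension: Dimension
    questions: List[TimestampedQuestion] = field(default_factory=list)

    def by_type(self, qtype: QuestionType) -> List[TimestampedQuestion]:
        """Return the questions of one type, ordered by timestamp."""
        matching = (q for q in self.questions if q.qtype is qtype)
        return sorted(matching, key=lambda q: q.timestamp_s)


# Placeholder content for illustration only; not taken from the benchmark.
example = VideoExample(
    video_id="demo-clip",
    dimension=Dimension.EMOTION,
    questions=[
        TimestampedQuestion(QuestionType.PREDICTION, 12.0, "placeholder prediction question"),
        TimestampedQuestion(QuestionType.COMPREHENSION, 4.5, "placeholder comprehension question"),
    ],
)

for q in example.by_type(QuestionType.COMPREHENSION):
    print(f"[{q.timestamp_s:.1f}s] {q.qtype.value}: {q.text}")
```

Keeping the dimension on the example and the question type on each question reflects the description above, where a slide is tied to one dimension but carries all three question types.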