ICML 2026 Oral · FMs for Science, ICLR 2026 Workshop
CausalGame: Benchmarking Causal Thinking of LLM Agents in Games
Zhenhao Chen*, Yongqiang Chen*, Chenxi Liu*, Junchi Yu, Xiangchen Song, Zijian Li, Jialin Li, Philip Torr, Bo Han, Kun Zhang
An interactive benchmark for evaluating how LLM agents design experiments, reason from biased evidence, and recover hidden mechanisms.