For the second paragraph, we’re assuming this AI has not made a mistake in predicting human behavior yet after many, many trials in different scenarios. No exact probability. We’re also assuming perfect levels of observation, so we know that they pressed a button, bombs are heading over, and any observable context behind the decision (like false information).
The first paragraph contains an idea I hadn’t considered, and it might be central to the whole thing. I’ll ponder it more.
Suppose we don’t have any prior information about the dataset, only our observations. Is any metric more accurate than assuming our dataset is the exact distribution and calculating mutual information? Kind of like bootstrapping.