Well, the model needs to figure out if the test would be cost prohibitive to make make when the test was made.
But the model doesn’t know if its being tested in 2026. It could the year 2032, with programming resources advanced substantially, and the model was trained on data truncated to 2026 (or a faked 2026). There are many reasons to do this, e.g. to validate the model’s predictions about the future to what actually happened in its future.
Agree with the general point but:
Well, no, evaluating the perplexity of an offline transcript doesn’t look the same as taking real actions. One time I found myself at school after class when one of my molars came loose and literally fell out of my mouth. I started frantically trying to figure out how to set up an emergency dental appointment. But then I realized “Wait, why am I at school? I graduated from school years ago. Oh! Duh! I’m dreaming!”. I decided to shrug off my teeth falling out, despite not being the sort of person who would realistically shrug off an emergency medical situation.