It’s a good unhobbling eval, because it’s a task that should be easy for current frontier LLMs at the System 1 level; they fail only because some basic memory/adaptation faculties that humans have are outright missing from AIs right now. No longer failing would mark a milestone: such features would no longer be obviously absent (assuming the improvements come from general management of very long context, and not something overly eval-specific).