Daniel Tan comments on Thomas Kwa’s Shortform

Daniel Tan 23 Apr 2025 2:31 UTC
Interesting paper. Quick thoughts:
- I agree the benchmark seems saturated. It’s interesting that the authors frame it the other way: Section 4.1 focuses on how models are not maximally goal-directed.
- It’s unclear to me how they calculate goal-directedness for ‘information gathering’, since that task appears to consist of only one subtask.