shash42

Karma: 432

Humans can post on moltbook

shash42 · 31 Jan 2026 21:06 UTC
24 points
2 comments · 1 min read · LW link

OpenForecaster: How to train language models for open-ended forecasting?

7 Jan 2026 11:03 UTC
9 points
0 comments · 7 min read · LW link

How to game the METR plot

shash42 · 20 Dec 2025 13:46 UTC
236 points
29 comments · 5 min read · LW link

New Paper: It is time to move on from MCQs for LLM Evaluations

shash42 · 6 Jul 2025 11:48 UTC
9 points
0 comments · 2 min read · LW link

An Alternative Way to Forecast AGI: Counting Down Capabilities

shash42 · 29 Jun 2025 19:52 UTC
3 points
0 comments · 3 min read · LW link
(open.substack.com)

Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims

shash42 · 29 May 2025 18:40 UTC
66 points
7 comments · 1 min read · LW link
(safe-lip-9a8.notion.site)

Log-linear Scaling is Worth the Cost due to Gains in Long-Horizon Tasks

shash42 · 7 Apr 2025 21:50 UTC
16 points
2 comments · 1 min read · LW link

shash42's Shortform

shash42 · 15 Dec 2024 18:49 UTC
2 points
0 comments · 1 min read · LW link

Evaluating hidden directions on the utility dataset: classification, steering and removal

25 Sep 2023 17:19 UTC
25 points
3 comments · 7 min read · LW link