shash42

Karma: 440

Humans can post on moltbook

shash42

31 Jan 2026 21:06 UTC

24 points

3 comments, 1 min read, LW link

OpenForecaster: How to train language models for open-ended forecasting?

nikhilchandak, shash42 and bayesian_kitten

7 Jan 2026 11:03 UTC

10 points

1 comment, 7 min read, LW link

How to game the METR plot

shash42

20 Dec 2025 13:46 UTC

243 points

32 comments, 5 min read, LW link

New Paper: It is time to move on from MCQs for LLM Evaluations

shash42

6 Jul 2025 11:48 UTC

9 points

0 comments, 2 min read, LW link

An Alternative Way to Forecast AGI: Counting Down Capabilities

shash42

29 Jun 2025 19:52 UTC

3 points

0 comments, 3 min read, LW link

(open.substack.com)

Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims

shash42

29 May 2025 18:40 UTC

66 points

7 comments, 1 min read, LW link

(safe-lip-9a8.notion.site)

Log-linear Scaling is Worth the Cost due to Gains in Long-Horizon Tasks

shash42

7 Apr 2025 21:50 UTC

16 points

2 comments, 1 min read, LW link

shash42's Shortform

shash42

15 Dec 2024 18:49 UTC

2 points

0 comments, 1 min read, LW link

Evaluating hidden directions on the utility dataset: classification, steering and removal

Annah and shash42

25 Sep 2023 17:19 UTC

25 points

3 comments, 7 min read, LW link