shash42
Karma: 129
New Paper: It is time to move on from MCQs for LLM Evaluations
shash42, 6 Jul 2025 11:48 UTC, 9 points, 0 comments, 2 min read
An Alternative Way to Forecast AGI: Counting Down Capabilities
shash42, 29 Jun 2025 19:52 UTC, 3 points, 0 comments, 3 min read (open.substack.com)
Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims
shash42, 29 May 2025 18:40 UTC, 65 points, 7 comments, 1 min read (safe-lip-9a8.notion.site)
Log-linear Scaling is Worth the Cost due to Gains in Long-Horizon Tasks
shash42, 7 Apr 2025 21:50 UTC, 16 points, 2 comments, 1 min read
shash42's Shortform
shash42, 15 Dec 2024 18:49 UTC, 2 points, 0 comments
Evaluating hidden directions on the utility dataset: classification, steering and removal
Annah and shash42, 25 Sep 2023 17:19 UTC, 25 points, 3 comments, 7 min read