janos
Karma: 239
On scalable oversight with weak LLMs judging strong LLMs
zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner and Rohin Shah
8 Jul 2024 8:59 UTC, 49 points, 18 comments, 7 min read, LW, link (arxiv.org)
Power-seeking can be probable and predictive for trained agents
Vika and janos
28 Feb 2023 21:10 UTC, 56 points, 22 comments, 9 min read, LW, link (arxiv.org)