
janos

Karma: 239

On scalable oversight with weak LLMs judging strong LLMs

8 Jul 2024 8:59 UTC
49 points
18 comments, 7 min read
(arxiv.org)

Power-seeking can be probable and predictive for trained agents

28 Feb 2023 21:10 UTC
56 points
22 comments, 9 min read
(arxiv.org)