jacek

Karma: 189

Characterizing stable regions in the residual stream of LLMs

26 Sep 2024 13:44 UTC
37 points
4 comments, 1 min read, LW link
(arxiv.org)

Goodhart’s Law in Reinforcement Learning

16 Oct 2023 0:54 UTC
125 points
22 comments, 7 min read, LW link

A warm-up for the AI governance project

jacek, 17 Feb 2023 18:06 UTC
10 points
2 comments, 3 min read, LW link

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek, 12 Jan 2023 0:32 UTC
31 points
3 comments, 6 min read, LW link