
lukemarks

Karma: 351

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?

lukemarks · 8 Jul 2023 11:42 UTC
84 points
28 comments · 2 min read · LW link

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research

lukemarks · 22 Apr 2023 14:36 UTC
39 points
7 comments · 6 min read · LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks · 11 Jun 2023 0:13 UTC
22 points
0 comments · 5 min read · LW link

Direct Preference Optimization in One Minute

lukemarks · 26 Jun 2023 11:52 UTC
21 points
3 comments · 1 min read · LW link

Select Agent Specifications as Natural Abstractions

lukemarks · 7 Apr 2023 23:16 UTC
19 points
3 comments · 5 min read · LW link

The Löbian Obstacle, And Why You Should Care

lukemarks · 7 Sep 2023 23:59 UTC
18 points
6 comments · 2 min read · LW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks · 17 Jun 2023 13:55 UTC
16 points
0 comments · 10 min read · LW link

[Question] Shouldn’t we ‘Just’ Superimitate Low-Res Uploads?

lukemarks · 3 Nov 2023 7:42 UTC
15 points
2 comments · 2 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
11 points
0 comments · 5 min read · LW link

A Mathematical Model for Simulators

lukemarks · 2 Oct 2023 6:46 UTC
11 points
0 comments · 2 min read · LW link