
Charlie Steiner

Karma: 6,678

If you want to chat, message me!

LW1.0 username Manfred. PhD in condensed matter physics. I am independently thinking and writing about value learning.

Humans aren’t fleeb.

Charlie Steiner, 24 Jan 2024 5:31 UTC
32 points
5 comments, 2 min read

Neural uncertainty estimation review article (for alignment)

Charlie Steiner, 5 Dec 2023 8:01 UTC
69 points
1 comment, 11 min read

How to solve deception and still fail.

Charlie Steiner, 4 Oct 2023 19:56 UTC
36 points
7 comments, 6 min read

Two Hot Takes about Quine

Charlie Steiner, 11 Jul 2023 6:42 UTC
15 points
0 comments, 2 min read

Some background for reasoning about dual-use alignment research

Charlie Steiner, 18 May 2023 14:50 UTC
119 points
19 comments, 9 min read

[Simulators seminar sequence] #2 Semiotic physics—revamped

27 Feb 2023 0:25 UTC
23 points
23 comments, 13 min read

Shard theory alignment has important, often-overlooked free parameters.

Charlie Steiner, 20 Jan 2023 9:30 UTC
33 points
10 comments, 3 min read

[Simulators seminar sequence] #1 Background & shared assumptions

2 Jan 2023 23:48 UTC
49 points
4 comments, 3 min read

Take 14: Corrigibility isn’t that great.

Charlie Steiner, 25 Dec 2022 13:04 UTC
15 points
3 comments, 3 min read

Take 13: RLHF bad, conditioning good.

Charlie Steiner, 22 Dec 2022 10:44 UTC
53 points
4 comments, 2 min read

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner, 20 Dec 2022 5:01 UTC
25 points
1 comment, 3 min read

Take 11: “Aligning language models” should be weirder.

Charlie Steiner, 18 Dec 2022 14:14 UTC
32 points
0 comments, 2 min read

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner, 13 Dec 2022 7:04 UTC
37 points
3 comments, 2 min read

Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Charlie Steiner, 12 Dec 2022 11:51 UTC
33 points
14 comments, 2 min read

Take 8: Queer the inner/outer alignment dichotomy.

Charlie Steiner, 9 Dec 2022 17:46 UTC
28 points
2 comments, 2 min read

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner, 8 Dec 2022 8:14 UTC
50 points
22 comments, 2 min read

Take 6: CAIS is actually Orwellian.

Charlie Steiner, 7 Dec 2022 13:50 UTC
14 points
8 comments, 2 min read

Take 5: Another problem for natural abstractions is laziness.

Charlie Steiner, 6 Dec 2022 7:00 UTC
30 points
4 comments, 3 min read

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner, 5 Dec 2022 10:39 UTC
36 points
4 comments, 1 min read

Take 3: No indescribable heavenworlds.

Charlie Steiner, 4 Dec 2022 2:48 UTC
23 points
12 comments, 2 min read