
Charlie Steiner

Karma: 5,085

LW1.0 username Manfred. PhD in condensed matter physics. I am independently thinking and writing about value learning.

[Simulators seminar sequence] #2 Semiotic physics—revamped

27 Feb 2023 0:25 UTC
20 points
22 comments · 13 min read · LW link

Shard theory alignment has important, often-overlooked free parameters.

Charlie Steiner · 20 Jan 2023 9:30 UTC
32 points
10 comments · 3 min read · LW link

[Simulators seminar sequence] #1 Background & shared assumptions

2 Jan 2023 23:48 UTC
46 points
4 comments · 3 min read · LW link

Take 14: Corrigibility isn’t that great.

Charlie Steiner · 25 Dec 2022 13:04 UTC
15 points
3 comments · 3 min read · LW link

Take 13: RLHF bad, conditioning good.

Charlie Steiner · 22 Dec 2022 10:44 UTC
53 points
4 comments · 2 min read · LW link

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner · 20 Dec 2022 5:01 UTC
25 points
1 comment · 3 min read · LW link

Take 11: “Aligning language models” should be weirder.

Charlie Steiner · 18 Dec 2022 14:14 UTC
31 points
0 comments · 2 min read · LW link

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner · 13 Dec 2022 7:04 UTC
36 points
3 comments · 2 min read · LW link

Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Charlie Steiner · 12 Dec 2022 11:51 UTC
33 points
14 comments · 2 min read · LW link

Take 8: Queer the inner/outer alignment dichotomy.

Charlie Steiner · 9 Dec 2022 17:46 UTC
28 points
3 comments · 2 min read · LW link

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner · 8 Dec 2022 8:14 UTC
47 points
22 comments · 2 min read · LW link

Take 6: CAIS is actually Orwellian.

Charlie Steiner · 7 Dec 2022 13:50 UTC
14 points
5 comments · 2 min read · LW link

Take 5: Another problem for natural abstractions is laziness.

Charlie Steiner · 6 Dec 2022 7:00 UTC
30 points
4 comments · 3 min read · LW link

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner · 5 Dec 2022 10:39 UTC
36 points
4 comments · 1 min read · LW link

Take 3: No indescribable heavenworlds.

Charlie Steiner · 4 Dec 2022 2:48 UTC
22 points
12 comments · 2 min read · LW link

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Charlie Steiner · 3 Dec 2022 0:54 UTC
17 points
1 comment · 2 min read · LW link

Take 1: We’re not going to reverse-engineer the AI.

Charlie Steiner · 1 Dec 2022 22:41 UTC
38 points
4 comments · 4 min read · LW link

Some ideas for epistles to the AI ethicists

Charlie Steiner · 14 Sep 2022 9:07 UTC
19 points
0 comments · 4 min read · LW link

The Solomonoff prior is malign. It’s not a big deal.

Charlie Steiner · 25 Aug 2022 8:25 UTC
38 points
9 comments · 7 min read · LW link

Reducing Goodhart: Announcement, Executive Summary

Charlie Steiner · 20 Aug 2022 9:49 UTC
14 points
0 comments · 1 min read · LW link