
Charlie Steiner

Karma: 5,451

LW1.0 username Manfred. PhD in condensed matter physics. I am independently thinking and writing about value learning.

Some background for reasoning about dual-use alignment research

Charlie Steiner · 18 May 2023 14:50 UTC
74 points
5 comments · 9 min read · LW link

How to turn money into AI safety?

Charlie Steiner · 25 Aug 2021 10:49 UTC
66 points
26 comments · 8 min read · LW link

Take 13: RLHF bad, conditioning good.

Charlie Steiner · 22 Dec 2022 10:44 UTC
53 points
4 comments · 2 min read · LW link

Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics

Charlie Steiner · 18 May 2022 20:52 UTC
50 points
8 comments · 14 min read · LW link

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

Charlie Steiner · 31 May 2020 16:35 UTC
49 points
28 comments · 3 min read · LW link

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner · 8 Dec 2022 8:14 UTC
47 points
22 comments · 2 min read · LW link

Book Review: Consciousness Explained

Charlie Steiner · 6 Mar 2018 3:32 UTC
45 points
15 comments · 21 min read · LW link

HCH Speculation Post #2A

Charlie Steiner · 17 Mar 2021 13:26 UTC
42 points
7 comments · 9 min read · LW link

Philosophy as low-energy approximation

Charlie Steiner · 5 Feb 2019 19:34 UTC
40 points
20 comments · 3 min read · LW link

Introduction to Reducing Goodhart

Charlie Steiner · 26 Aug 2021 18:38 UTC
40 points
10 comments · 4 min read · LW link

Take 1: We’re not going to reverse-engineer the AI.

Charlie Steiner · 1 Dec 2022 22:41 UTC
38 points
4 comments · 4 min read · LW link

The Solomonoff prior is malign. It’s not a big deal.

Charlie Steiner · 25 Aug 2022 8:25 UTC
38 points
9 comments · 7 min read · LW link

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner · 13 Dec 2022 7:04 UTC
37 points
3 comments · 2 min read · LW link

How to get value learning and reference wrong

Charlie Steiner · 26 Feb 2019 20:22 UTC
37 points
2 comments · 6 min read · LW link

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner · 5 Dec 2022 10:39 UTC
36 points
4 comments · 1 min read · LW link

Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Charlie Steiner · 12 Dec 2022 11:51 UTC
33 points
14 comments · 2 min read · LW link

Shard theory alignment has important, often-overlooked free parameters.

Charlie Steiner · 20 Jan 2023 9:30 UTC
32 points
10 comments · 3 min read · LW link

Take 11: “Aligning language models” should be weirder.

Charlie Steiner · 18 Dec 2022 14:14 UTC
31 points
0 comments · 2 min read · LW link

Take 5: Another problem for natural abstractions is laziness.

Charlie Steiner · 6 Dec 2022 7:00 UTC
30 points
4 comments · 3 min read · LW link

New year, new research agenda post

Charlie Steiner · 12 Jan 2022 17:58 UTC
29 points
4 comments · 16 min read · LW link