
Charlie Steiner

Karma: 6,678

If you want to chat, message me!

LW1.0 username Manfred. PhD in condensed matter physics. I think and write independently about value learning.

Some background for reasoning about dual-use alignment research

Charlie Steiner · 18 May 2023 14:50 UTC
119 points
19 comments · 9 min read

Neural uncertainty estimation review article (for alignment)

Charlie Steiner · 5 Dec 2023 8:01 UTC
69 points
1 comment · 11 min read

How to turn money into AI safety?

Charlie Steiner · 25 Aug 2021 10:49 UTC
66 points
26 comments · 8 min read

Take 13: RLHF bad, conditioning good.

Charlie Steiner · 22 Dec 2022 10:44 UTC
53 points
4 comments · 2 min read

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

Charlie Steiner · 31 May 2020 16:35 UTC
52 points
28 comments · 3 min read

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner · 8 Dec 2022 8:14 UTC
50 points
22 comments · 2 min read

Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics

Charlie Steiner · 18 May 2022 20:52 UTC
50 points
8 comments · 14 min read

Book Review: Consciousness Explained

Charlie Steiner · 6 Mar 2018 3:32 UTC
48 points
20 comments · 21 min read

Introduction to Reducing Goodhart

Charlie Steiner · 26 Aug 2021 18:38 UTC
47 points
10 comments · 4 min read

HCH Speculation Post #2A

Charlie Steiner · 17 Mar 2021 13:26 UTC
42 points
7 comments · 9 min read

The Solomonoff prior is malign. It’s not a big deal.

Charlie Steiner · 25 Aug 2022 8:25 UTC
41 points
9 comments · 7 min read

Philosophy as low-energy approximation

Charlie Steiner · 5 Feb 2019 19:34 UTC
40 points
20 comments · 3 min read

Take 1: We’re not going to reverse-engineer the AI.

Charlie Steiner · 1 Dec 2022 22:41 UTC
38 points
4 comments · 4 min read

How to get value learning and reference wrong

Charlie Steiner · 26 Feb 2019 20:22 UTC
37 points
2 comments · 6 min read

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner · 13 Dec 2022 7:04 UTC
37 points
3 comments · 2 min read

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner · 5 Dec 2022 10:39 UTC
36 points
4 comments · 1 min read

How to solve deception and still fail.

Charlie Steiner · 4 Oct 2023 19:56 UTC
36 points
7 comments · 6 min read

Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Charlie Steiner · 12 Dec 2022 11:51 UTC
33 points
14 comments · 2 min read

Shard theory alignment has important, often-overlooked free parameters.

Charlie Steiner · 20 Jan 2023 9:30 UTC
33 points
10 comments · 3 min read

Take 11: “Aligning language models” should be weirder.

Charlie Steiner · 18 Dec 2022 14:14 UTC
32 points
0 comments · 2 min read