
AI-assisted / AI-automated Alignment

Last edit: 1 Jan 2023 22:04 UTC by Noosphere89

Not obviously the best name for this tag, but maybe good to explore/rename. Wiki-tags are publicly editable!

Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie · 24 Aug 2022 18:37 UTC
98 points
4 comments · 7 min read · LW link

Cyborgism

10 Feb 2023 14:47 UTC
294 points
41 comments · 35 min read · LW link

Discussion on utilizing AI for alignment

elifland · 23 Aug 2022 2:36 UTC
16 points
3 comments · 1 min read · LW link
(www.foxy-scout.com)

Godzilla Strategies

johnswentworth · 11 Jun 2022 15:44 UTC
139 points
64 comments · 3 min read · LW link

Sufficiently many Godzillas as an alignment strategy

142857 · 28 Aug 2022 0:08 UTC
8 points
3 comments · 1 min read · LW link

AI-assisted list of ten concrete alignment things to do right now

lukehmiles · 7 Sep 2022 8:38 UTC
8 points
5 comments · 4 min read · LW link

Infinite Possibility Space and the Shutdown Problem

magfrump · 18 Oct 2022 5:37 UTC
6 points
0 comments · 2 min read · LW link
(www.magfrump.net)

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad · 21 Jun 2022 12:36 UTC
11 points
7 comments · 9 min read · LW link

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad · 9 Jul 2022 14:42 UTC
14 points
5 comments · 22 min read · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · 13 Dec 2022 2:17 UTC
7 points
5 comments · 45 min read · LW link

A survey of tool use and workflows in alignment research

23 Mar 2022 23:44 UTC
44 points
5 comments · 1 min read · LW link

My thoughts on OpenAI’s alignment plan

Akash · 30 Dec 2022 19:33 UTC
54 points
2 comments · 20 min read · LW link

[Linkpost] Jan Leike on three kinds of alignment taxes

Akash · 6 Jan 2023 23:57 UTC
27 points
2 comments · 3 min read · LW link
(aligned.substack.com)

[Question] What specific thing would you do with AI Alignment Research Assistant GPT?

quetzal_rainbow · 8 Jan 2023 19:24 UTC
45 points
9 comments · 1 min read · LW link

Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review)

Shoshannah Tekofsky · 28 Jan 2023 5:26 UTC
52 points
6 comments · 7 min read · LW link

Model-driven feedback could amplify alignment failures

aogara · 30 Jan 2023 0:00 UTC
17 points
1 comment · 2 min read · LW link

Eli Lifland on Navigating the AI Alignment Landscape

ozziegooen · 1 Feb 2023 21:17 UTC
9 points
1 comment · 31 min read · LW link
(quri.substack.com)

Cyborg Periods: There will be multiple AI transitions

22 Feb 2023 16:09 UTC
78 points
8 comments · 6 min read · LW link

We have to Upgrade

Jed McCaleb · 23 Mar 2023 17:53 UTC
54 points
9 comments · 2 min read · LW link

[Question] Would you ask a genie to give you the solution to alignment?

sudo -i · 24 Aug 2022 1:29 UTC
6 points
1 comment · 1 min read · LW link

Prize for Alignment Research Tasks

29 Apr 2022 8:57 UTC
63 points
36 comments · 10 min read · LW link

Ngo and Yudkowsky on alignment difficulty

15 Nov 2021 20:31 UTC
243 points
143 comments · 99 min read · LW link · 1 review

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo · 15 Sep 2022 17:54 UTC
35 points
12 comments · 13 min read · LW link

Conditioning Generative Models for Alignment

Jozdien · 18 Jul 2022 7:11 UTC
52 points
8 comments · 20 min read · LW link

Provably Honest - A First Step

Srijanak De · 5 Nov 2022 19:18 UTC
10 points
2 comments · 8 min read · LW link

Research request (alignment strategy): Deep dive on “making AI solve alignment for us”

JanBrauner · 1 Dec 2022 14:55 UTC
16 points
3 comments · 1 min read · LW link

[Link] A minimal viable product for alignment

janleike · 6 Apr 2022 15:38 UTC
51 points
38 comments · 1 min read · LW link

Results from a survey on tool use and workflows in alignment research

19 Dec 2022 15:19 UTC
69 points
2 comments · 19 min read · LW link

[Link] Why I’m optimistic about OpenAI’s alignment approach

janleike · 5 Dec 2022 22:51 UTC
96 points
13 comments · 1 min read · LW link
(aligned.substack.com)

Human Mimicry Mainly Works When We’re Already Close

johnswentworth · 17 Aug 2022 18:41 UTC
70 points
16 comments · 5 min read · LW link

Research Direction: Be the AGI you want to see in the world

5 Feb 2023 7:15 UTC
43 points
0 comments · 7 min read · LW link

Curiosity as a Solution to AGI Alignment

Harsha G. · 26 Feb 2023 23:36 UTC
8 points
7 comments · 3 min read · LW link

Introducing AI Alignment Inc., a California public benefit corporation...

TherapistAI · 7 Mar 2023 18:47 UTC
1 point
4 comments · 1 min read · LW link

Project “MIRI as a Service”

RomanS · 8 Mar 2023 19:22 UTC
40 points
4 comments · 1 min read · LW link

Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned

Christopher King · 21 Mar 2023 3:53 UTC
−1 points
1 comment · 9 min read · LW link