
Remmelt

Karma: 459

Research Coordinator of the “Do Not Build Uncontrollable AI” area at AI Safety Camp.

See this explainer on why AGI could not be controlled enough to stay safe:
https://www.lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable

The first AI Safety Camp & onwards

Remmelt · 7 Jun 2018 20:13 UTC
46 points
0 comments · 8 min read · LW link

The Values-to-Actions Decision Chain

Remmelt · 30 Jun 2018 21:52 UTC
29 points
6 comments · 10 min read · LW link

Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research

Remmelt · 26 Nov 2020 11:17 UTC
7 points
3 comments · 4 min read · LW link

Some blindspots in rationality and effective altruism

Remmelt · 19 Mar 2021 11:40 UTC
37 points
44 comments · 14 min read · LW link

A parable of brightspots and blindspots

Remmelt · 21 Mar 2021 18:18 UTC
4 points
0 comments · 3 min read · LW link

How teams went about their research at AI Safety Camp edition 5

Remmelt · 28 Jun 2021 15:15 UTC
24 points
0 comments · 6 min read · LW link

Exploring Democratic Dialogue between Rationality, Silicon Valley, and the Wider World

Remmelt · 20 Aug 2021 16:04 UTC
−5 points
19 comments · 13 min read · LW link

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt · 19 Dec 2022 12:02 UTC
−3 points
9 comments · 31 min read · LW link

List #1: Why stopping the development of AGI is hard but doable

Remmelt · 24 Dec 2022 9:52 UTC
6 points
11 comments · 5 min read · LW link

List #2: Why coordinating to align as humans to not develop AGI is a lot easier than, well… coordinating as humans with AGI coordinating to be aligned with humans

Remmelt · 24 Dec 2022 9:53 UTC
1 point
0 comments · 3 min read · LW link

List #3: Why not to assume on prior that AGI-alignment workarounds are available

Remmelt · 24 Dec 2022 9:54 UTC
4 points
1 comment · 3 min read · LW link

Nine Points of Collective Insanity

27 Dec 2022 3:14 UTC
−2 points
3 comments · 1 min read · LW link
(mflb.com)

How ‘Human-Human’ dynamics give way to ‘Human-AI’ and then ‘AI-AI’ dynamics

27 Dec 2022 3:16 UTC
−2 points
5 comments · 2 min read · LW link
(mflb.com)

Introduction: Bias in Evaluating AGI X-Risks

27 Dec 2022 10:27 UTC
1 point
0 comments · 3 min read · LW link

Institutions Cannot Restrain Dark-Triad AI Exploitation

27 Dec 2022 10:34 UTC
5 points
0 comments · 5 min read · LW link
(mflb.com)

Mere exposure effect: Bias in Evaluating AGI X-Risks

27 Dec 2022 14:05 UTC
0 points
2 comments · 1 min read · LW link

Presumptive Listening: sticking to familiar concepts and missing the outer reasoning paths

Remmelt · 27 Dec 2022 15:40 UTC
−14 points
8 comments · 2 min read · LW link
(mflb.com)

Bandwagon effect: Bias in Evaluating AGI X-Risks

28 Dec 2022 7:54 UTC
−1 points
0 comments · 1 min read · LW link

Reactive devaluation: Bias in Evaluating AGI X-Risks

30 Dec 2022 9:02 UTC
−15 points
9 comments · 1 min read · LW link

Curse of knowledge and Naive realism: Bias in Evaluating AGI X-Risks

31 Dec 2022 13:33 UTC
−7 points
1 comment · 1 min read · LW link
(www.lesswrong.com)