
Aligned AI Proposals

Last edit: 20 Dec 2022 15:20 UTC by particlemania

How to safely use an optimizer

Simon Fischer · 28 Mar 2024 16:11 UTC
47 points
21 comments · 7 min read · LW link

Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing?

Benjamin Bourlier · 15 Mar 2024 23:17 UTC
−4 points
3 comments · 16 min read · LW link

Update on Developing an Ethics Calculator to Align an AGI to

sweenesm · 12 Mar 2024 12:33 UTC
4 points
2 comments · 8 min read · LW link

Alignment in Thought Chains

Faust Nemesis · 4 Mar 2024 19:24 UTC
1 point
0 comments · 2 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · 14 Feb 2024 7:10 UTC
20 points
6 comments · 31 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · 1 Feb 2024 21:15 UTC
4 points
15 comments · 13 min read · LW link

Proposal for an AI Safety Prize

sweenesm · 31 Jan 2024 18:35 UTC
3 points
0 comments · 2 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
22 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley · 9 Jan 2024 20:42 UTC
46 points
8 comments · 36 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · 5 Jan 2024 8:46 UTC
35 points
4 comments · 2 min read · LW link

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chipmonk · 3 Jan 2024 17:55 UTC
47 points
3 comments · 3 min read · LW link

AI Alignment Metastrategy

Vanessa Kosoy · 31 Dec 2023 12:06 UTC
113 points
12 comments · 7 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · 18 Dec 2023 8:12 UTC
30 points
8 comments · 9 min read · LW link

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

16 Dec 2023 5:49 UTC
71 points
3 comments · 6 min read · LW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley · 7 Dec 2023 6:14 UTC
3 points
0 comments · 11 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · 28 Nov 2023 19:56 UTC
64 points
30 comments · 11 min read · LW link

Is Interpretability All We Need?

RogerDearnaley · 14 Nov 2023 5:31 UTC
1 point
1 comment · 1 min read · LW link

We have promising alignment plans with low taxes

Seth Herd · 10 Nov 2023 18:51 UTC
30 points
9 comments · 5 min read · LW link

AI Alignment: A Comprehensive Survey

Stephen McAleer · 1 Nov 2023 17:35 UTC
15 points
1 comment · 1 min read · LW link
(arxiv.org)

The (partial) fallacy of dumb superintelligence

Seth Herd · 18 Oct 2023 21:25 UTC
27 points
5 comments · 4 min read · LW link

A list of core AI safety problems and how I hope to solve them

davidad · 26 Aug 2023 15:12 UTC
161 points
23 comments · 5 min read · LW link

Enhancing Corrigibility in AI Systems through Robust Feedback Loops

Justausername · 24 Aug 2023 3:53 UTC
1 point
0 comments · 6 min read · LW link

[Question] Bostrom’s Solution

James Blackmon · 14 Aug 2023 17:09 UTC
1 point
0 comments · 1 min read · LW link

Reducing the risk of catastrophically misaligned AI by avoiding the Singleton scenario: the Manyton Variant

GravitasGradient · 6 Aug 2023 14:24 UTC
−6 points
0 comments · 3 min read · LW link

Embedding Ethical Priors into AI Systems: A Bayesian Approach

Justausername · 3 Aug 2023 15:31 UTC
−5 points
3 comments · 21 min read · LW link

Autonomous Alignment Oversight Framework (AAOF)

Justausername · 25 Jul 2023 10:25 UTC
−9 points
0 comments · 4 min read · LW link

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · 23 Jul 2023 16:08 UTC
4 points
1 comment · 3 min read · LW link

Desiderata for an AI

Nathan Helm-Burger · 19 Jul 2023 16:18 UTC
8 points
0 comments · 4 min read · LW link

Two paths to win the AGI transition

Nathan Helm-Burger · 6 Jul 2023 21:59 UTC
11 points
8 comments · 4 min read · LW link

an Evangelion dialogue explaining the QACI alignment plan

Tamsin Leake · 10 Jun 2023 3:28 UTC
45 points
15 comments · 43 min read · LW link
(carado.moe)

An LLM-based “exemplary actor”

Roman Leventov · 29 May 2023 11:12 UTC
16 points
0 comments · 12 min read · LW link

Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor”

Roman Leventov · 29 May 2023 11:08 UTC
12 points
10 comments · 30 min read · LW link

The Goal Misgeneralization Problem

Myspy · 18 May 2023 23:40 UTC
1 point
0 comments · 1 min read · LW link
(drive.google.com)

Proposal: Align Systems Earlier In Training

OneManyNone · 16 May 2023 16:24 UTC
18 points
0 comments · 11 min read · LW link

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov · 8 May 2023 21:26 UTC
18 points
2 comments · 7 min read · LW link
(yoshuabengio.org)

Against sacrificing AI transparency for generality gains

Ape in the coat · 7 May 2023 6:52 UTC
4 points
0 comments · 2 min read · LW link

A Proposal for AI Alignment: Using Directly Opposing Models

Arne B · 27 Apr 2023 18:05 UTC
0 points
5 comments · 3 min read · LW link

How to express this system for ethically aligned AGI as a Mathematical formula?

Oliver Siegel · 19 Apr 2023 20:13 UTC
−1 points
0 comments · 1 min read · LW link

Speculation on mapping the moral landscape for future Ai Alignment

Sven Heinz (Welwordion) · 16 Apr 2023 13:43 UTC
1 point
0 comments · 1 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · 20 Dec 2022 13:04 UTC
79 points
22 comments · 4 min read · LW link

Moral realism and AI alignment

Caspar Oesterheld · 3 Sep 2018 18:46 UTC
13 points
10 comments · 1 min read · LW link
(casparoesterheld.com)