Aligned AI Proposals

TagLast edit: 20 Dec 2022 15:20 UTC by particlemania

An LLM-based “ex­em­plary ac­tor”

Roman Leventov29 May 2023 11:12 UTC
15 points
0 comments12 min readLW link

Align­ing an H-JEPA agent via train­ing on the out­puts of an LLM-based “ex­em­plary ac­tor”

Roman Leventov29 May 2023 11:08 UTC
11 points
10 comments29 min readLW link

The Goal Mis­gen­er­al­iza­tion Problem

Myspy18 May 2023 23:40 UTC
1 point
0 comments1 min readLW link

Pro­posal: Align Sys­tems Ear­lier In Training

OneManyNone16 May 2023 16:24 UTC
18 points
0 comments11 min readLW link

An­no­tated re­ply to Ben­gio’s “AI Scien­tists: Safe and Use­ful AI?”

Roman Leventov8 May 2023 21:26 UTC
18 points
1 comment7 min readLW link

Against sac­ri­fic­ing AI trans­parency for gen­er­al­ity gains

Ape in the coat7 May 2023 6:52 UTC
−3 points
0 comments2 min readLW link

A Pro­posal for AI Align­ment: Us­ing Directly Op­pos­ing Models

Arne B27 Apr 2023 18:05 UTC
0 points
0 comments3 min readLW link

How to ex­press this sys­tem for eth­i­cally al­igned AGI as a Math­e­mat­i­cal for­mula?

Oliver Siegel19 Apr 2023 20:13 UTC
−1 points
0 comments1 min readLW link

Spec­u­la­tion on map­ping the moral land­scape for fu­ture Ai Alignment

Sven Heinz (Welwordion)16 Apr 2023 13:43 UTC
1 point
0 comments1 min readLW link

Re­search pro­posal: Lev­er­ag­ing Jun­gian archetypes to cre­ate val­ues-based models

MiguelDev5 Mar 2023 17:39 UTC
5 points
2 comments2 min readLW link

An Open Agency Ar­chi­tec­ture for Safe Trans­for­ma­tive AI

davidad20 Dec 2022 13:04 UTC
55 points
22 comments4 min readLW link
No comments.