Monitoring for deceptive alignment

evhub · 8 Sep 2022 23:07 UTC
135 points
8 comments · 9 min read · LW link

[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]

David Scott Krueger (formerly: capybaralet) · 8 Sep 2022 22:28 UTC
47 points
1 comment · 5 min read · LW link

Progress links & tweets, 2022-09-08

jasoncrawford · 8 Sep 2022 20:43 UTC
13 points
3 comments · 1 min read · LW link
(rootsofprogress.org)

Turning WhatsApp Chat Data into Prompt-Response Form for Fine-Tuning

hatta_afiq · 8 Sep 2022 20:05 UTC
1 point
0 comments · 1 min read · LW link

Postmortem: Trying out for Manifold Markets

8 Sep 2022 17:54 UTC
24 points
0 comments · 3 min read · LW link

Thoughts on AGI consciousness / sentience

Steven Byrnes · 8 Sep 2022 16:40 UTC
38 points
37 comments · 6 min read · LW link

A rough idea for solving ELK: An approach for training generalist agents like GATO to make plans and describe them to humans clearly and honestly.

Michael Soareverix · 8 Sep 2022 15:20 UTC
2 points
2 comments · 2 min read · LW link

What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment

xuan · 8 Sep 2022 15:04 UTC
32 points
15 comments · 25 min read · LW link

ACX Book Review Discussion

Screwtape · 8 Sep 2022 14:22 UTC
5 points
0 comments · 1 min read · LW link

Covid 9/8/22: Booster Boosting

Zvi · 8 Sep 2022 13:50 UTC
34 points
9 comments · 24 min read · LW link
(thezvi.wordpress.com)

Solar Blackout Resistance

jefftk · 8 Sep 2022 13:30 UTC
69 points
32 comments · 3 min read · LW link
(www.jefftk.com)

All AGI safety questions welcome (especially basic ones) [Sept 2022]

plex · 8 Sep 2022 11:56 UTC
22 points
48 comments · 2 min read · LW link

[Question] Sequences/Eliezer essays beyond those in AI to Zombies?

Domenic · 8 Sep 2022 5:05 UTC
4 points
4 comments · 1 min read · LW link

Linkpost: Github Copilot productivity experiment

Daniel Kokotajlo · 8 Sep 2022 4:41 UTC
88 points
4 comments · 1 min read · LW link
(github.blog)

OpenPrinciples Bootcamp (Free) -- Reflect & Act on your Rationality Principles.

ti_guo · 8 Sep 2022 3:06 UTC
6 points
3 comments · 4 min read · LW link

Searching for Modularity in Large Language Models

8 Sep 2022 2:25 UTC
44 points
3 comments · 14 min read · LW link

90% of anything should be bad (& the precision-recall tradeoff)

cartografie · 8 Sep 2022 1:20 UTC
33 points
22 comments · 6 min read · LW link

How to Do Research. v1

Pablo Repetto · 8 Sep 2022 1:08 UTC
29 points
4 comments · 41 min read · LW link
(pabloernesto.github.io)

Galaxy Trucker Needs a New Second Half

jefftk · 7 Sep 2022 20:10 UTC
13 points
7 comments · 1 min read · LW link
(www.jefftk.com)

[Question] In a lack of data, how should you weigh credences in theoretical physics’s Theories of Everything, or TOEs?

Noosphere89 · 7 Sep 2022 18:25 UTC
7 points
11 comments · 1 min read · LW link

Generators Of Disagreement With AI Alignment

George3d6 · 7 Sep 2022 18:15 UTC
27 points
9 comments · 9 min read · LW link
(www.epistem.ink)

Shrödinger’s lottery or: Why you are going to live forever

Chase Dowdell · 7 Sep 2022 18:13 UTC
1 point
2 comments · 4 min read · LW link

Is training data going to be diluted by AI-generated content?

Hannes Thurnherr · 7 Sep 2022 18:13 UTC
10 points
7 comments · 1 min read · LW link

It’s (not) how you use it

Eleni Angelou · 7 Sep 2022 17:15 UTC
8 points
1 comment · 2 min read · LW link

First we shape our social graph; then it shapes us

Henrik Karlsson · 7 Sep 2022 15:50 UTC
52 points
6 comments · 8 min read · LW link
(escapingflatland.substack.com)

AI-assisted list of ten concrete alignment things to do right now

lukehmiles · 7 Sep 2022 8:38 UTC
8 points
5 comments · 4 min read · LW link

Can “Reward Economics” solve AI Alignment?

Q Home · 7 Sep 2022 7:58 UTC
3 points
15 comments · 18 min read · LW link

Is there a list of projects to get started with Interpretability?

Franziska Fischer · 7 Sep 2022 4:27 UTC
8 points
2 comments · 1 min read · LW link

Progress Report 7: making GPT go hurrdurr instead of brrrrrrr

Nathan Helm-Burger · 7 Sep 2022 3:28 UTC
21 points
0 comments · 4 min read · LW link

Framing AI Childhoods

David Udell · 6 Sep 2022 23:40 UTC
37 points
8 comments · 4 min read · LW link

Deleted comments archive

Said Achmiz · 6 Sep 2022 21:54 UTC
9 points
3 comments · 1 min read · LW link

Guitar Pedals on Fiddle

jefftk · 6 Sep 2022 19:30 UTC
10 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Rejected Early Drafts of Newcomb’s Problem

zahmahkibo · 6 Sep 2022 19:04 UTC
112 points
5 comments · 3 min read · LW link

[Question] How can we secure more research positions at our universities for x-risk researchers?

Neil Crawford · 6 Sep 2022 17:17 UTC
11 points
0 comments · 1 min read · LW link

Community Building for Graduate Students: A Targeted Approach

Neil Crawford · 6 Sep 2022 17:17 UTC
6 points
0 comments · 4 min read · LW link

How Josiah became an AI safety researcher

Neil Crawford · 6 Sep 2022 17:17 UTC
4 points
0 comments · 1 min read · LW link

No, human brains are not (much) more efficient than computers

Jesse Hoogland · 6 Sep 2022 13:53 UTC
20 points
21 comments · 3 min read · LW link
(www.jessehoogland.com)

On oxytocin-sensitive neurons in auditory cortex

Steven Byrnes · 6 Sep 2022 12:54 UTC
32 points
6 comments · 12 min read · LW link

EA & LW Forums Weekly Summary (28 Aug − 3 Sep 22’)

Zoe Williams · 6 Sep 2022 11:06 UTC
51 points
2 comments · 14 min read · LW link

Alex Lawsen On Forecasting AI Progress

Michaël Trazzi · 6 Sep 2022 9:32 UTC
18 points
0 comments · 2 min read · LW link
(theinsideview.ai)

What are you for?

lsusr · 6 Sep 2022 3:32 UTC
42 points
5 comments · 1 min read · LW link

The Power (and limits?) of Chunking

NicholasKross · 6 Sep 2022 2:26 UTC
8 points
2 comments · 1 min read · LW link

Another Unphrased B-part

jefftk · 6 Sep 2022 1:30 UTC
10 points
0 comments · 2 min read · LW link
(www.jefftk.com)

[Exploratory] Becoming more Agentic

Johannes C. Mayer · 6 Sep 2022 0:45 UTC
6 points
1 comment · 1 min read · LW link

AI Governance Needs Technical Work

Mau · 5 Sep 2022 22:28 UTC
41 points
1 comment · 8 min read · LW link

program searches

Tamsin Leake · 5 Sep 2022 20:04 UTC
21 points
2 comments · 2 min read · LW link
(carado.moe)

Overton Gymnastics: An Exercise in Discomfort

5 Sep 2022 19:20 UTC
40 points
15 comments · 4 min read · LW link

The Good King

GregorDeVillain · 5 Sep 2022 19:17 UTC
−6 points
0 comments · 13 min read · LW link

Beta Readers are Great

HoldenKarnofsky · 5 Sep 2022 19:10 UTC
28 points
0 comments · 1 min read · LW link
(www.cold-takes.com)

Impact Shares For Speculative Projects

Elizabeth · 5 Sep 2022 18:00 UTC
30 points
8 comments · 7 min read · LW link
(acesounderglass.com)