AI Alignment Breakthroughs this week (10/08/23)

Logan Zoellner, Oct 8, 2023, 11:30 PM
30 points
14 comments, 6 min read, LW link

“The Heart of Gaming is the Power Fantasy”, and Cohabitive Games

Raemon, Oct 8, 2023, 9:02 PM
81 points
49 comments, 4 min read, LW link
(bottomfeeder.substack.com)

FAQ: What the heck is goal agnosticism?

porby, Oct 8, 2023, 7:11 PM
66 points
38 comments, 28 min read, LW link

Time is homogeneous sequentially-composable determination

TsviBT, Oct 8, 2023, 2:58 PM
15 points
0 comments, 21 min read, LW link

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Erich_Grunewald, Oct 8, 2023, 12:14 PM
12 points
7 comments, 2 min read, LW link
(arxiv.org)

Bird-eye view visualization of LLM activations

Sergii, Oct 8, 2023, 12:12 PM
11 points
2 comments, 1 min read, LW link
(grgv.xyz)

Perspective Based Reasoning Could Absolve CDT

dadadarren, Oct 8, 2023, 11:22 AM
4 points
5 comments, 5 min read, LW link

The Gradient – The Artificiality of Alignment

mic, Oct 8, 2023, 4:06 AM
12 points
1 comment, 5 min read, LW link
(thegradient.pub)

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI, Oct 7, 2023, 11:30 PM
137 points
8 comments, 4 min read, LW link

A thought about the constraints of debtlessness in online communities

mako yass, Oct 7, 2023, 9:26 PM
58 points
23 comments, 1 min read, LW link

Arguments for utilitarianism are impossibility arguments under unbounded prospects

MichaelStJules, Oct 7, 2023, 9:08 PM
7 points
7 comments, 21 min read, LW link

Sam Altman’s sister claims Sam sexually abused her—Part 1: Introduction, outline, author’s notes

pythagoras5015, Oct 7, 2023, 9:06 PM
95 points
108 comments, 8 min read, LW link

Griffin Island

jefftk, Oct 7, 2023, 6:40 PM
14 points
3 comments, 1 min read, LW link
(www.jefftk.com)

Every Mention of EA in “Going Infinite”

KirstenH, Oct 7, 2023, 2:42 PM
48 points
0 comments, 8 min read, LW link
(open.substack.com)

Fixing Insider Threats in the AI Supply Chain

Madhav Malhotra, Oct 7, 2023, 1:19 PM
20 points
2 comments, 5 min read, LW link

Contra Nora Belrose on Orthogonality Thesis Being Trivial

tailcalled, Oct 7, 2023, 11:47 AM
18 points
21 comments, 1 min read, LW link

Related Discussion from Thomas Kwa’s MIRI Research Experience

Raemon, Oct 7, 2023, 6:25 AM
71 points
140 comments, 1 min read, LW link

[Question] Current State of Probabilistic Logic

lunatic_at_large, Oct 7, 2023, 5:06 AM
3 points
2 comments, 1 min read, LW link

On the Relationship Between Variability and the Evolutionary Outcomes of Systems in Nature

Artyom Shaposhnikov, Oct 7, 2023, 3:06 AM
2 points
0 comments, 1 min read, LW link

Announcing Dialogues

Ben Pace, Oct 7, 2023, 2:57 AM
155 points
59 comments, 4 min read, LW link

Don’t Dismiss Simple Alignment Approaches

Chris_Leong, Oct 7, 2023, 12:35 AM
137 points
9 comments, 4 min read, LW link

Linking Alt Accounts

jefftk, Oct 6, 2023, 5:00 PM
70 points
33 comments, 1 min read, LW link
(www.jefftk.com)

Super-Exponential versus Exponential Growth in Compute Price-Performance

moridinamael, Oct 6, 2023, 4:23 PM
37 points
25 comments, 2 min read, LW link

A personal explanation of ELK concept and task.

Zeyu Qin, Oct 6, 2023, 3:55 AM
1 point
0 comments, 1 min read, LW link

The Long-Term Future Fund is looking for a full-time fund chair

Oct 5, 2023, 10:18 PM
52 points
0 comments, 7 min read, LW link
(forum.effectivealtruism.org)

Provably Safe AI

PeterMcCluskey, Oct 5, 2023, 10:18 PM
35 points
15 comments, 4 min read, LW link
(bayesianinvestor.com)

Stampy’s AI Safety Info soft launch

Oct 5, 2023, 10:13 PM
120 points
9 comments, 2 min read, LW link

Impacts of AI on the housing markets

PottedRosePetal, Oct 5, 2023, 9:24 PM
8 points
0 comments, 5 min read, LW link

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds, Oct 5, 2023, 9:01 PM
288 points
22 comments, 2 min read, LW link, 1 review
(transformer-circuits.pub)

Ideation and Trajectory Modelling in Language Models

NickyP, Oct 5, 2023, 7:21 PM
16 points
2 comments, 10 min read, LW link

A well-defined history in measurable factor spaces

Matthias G. Mayer, Oct 5, 2023, 6:36 PM
22 points
0 comments, 2 min read, LW link

Evaluating the historical value misspecification argument

Matthew Barnett, Oct 5, 2023, 6:34 PM
188 points
162 comments, 7 min read, LW link, 3 reviews

Translations Should Invert

abramdemski, Oct 5, 2023, 5:44 PM
48 points
19 comments, 3 min read, LW link

Censorship in LLMs is here to stay because it mirrors how our own intelligence is structured

mnvr, Oct 5, 2023, 5:37 PM
3 points
0 comments, 1 min read, LW link

Twin Cities ACX Meetup October 2023

Timothy M., Oct 5, 2023, 4:29 PM
1 point
2 comments, 1 min read, LW link

This anime storyboard doesn’t exist: a graphic novel written and illustrated by GPT4

RomanS, Oct 5, 2023, 2:01 PM
12 points
7 comments, 55 min read, LW link

AI #32: Lie Detector

Zvi, Oct 5, 2023, 1:50 PM
45 points
19 comments, 44 min read, LW link
(thezvi.wordpress.com)

Can the House Legislate?

jefftk, Oct 5, 2023, 1:40 PM
26 points
6 comments, 2 min read, LW link
(www.jefftk.com)

Making progress on the “what alignment target should be aimed at?” question, is urgent

ThomasCederborg, Oct 5, 2023, 12:55 PM
2 points
0 comments, 18 min read, LW link

Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn

Zvi, Oct 5, 2023, 11:39 AM
129 points
29 comments, 9 min read, LW link

How to Get Rationalist Feedback

Nicholas / Heather Kross, Oct 5, 2023, 2:03 AM
16 points
0 comments, 2 min read, LW link

On my AI Fable, and the importance of de re, de dicto, and de se reference for AI alignment

PhilGoetz, Oct 5, 2023, 12:50 AM
9 points
5 comments, 1 min read, LW link

Underspecified Probabilities: A Thought Experiment

lunatic_at_large, Oct 4, 2023, 10:25 PM
8 points
4 comments, 2 min read, LW link

Fraternal Birth Order Effect and the Maternal Immune Hypothesis

Bucky, Oct 4, 2023, 9:18 PM
20 points
1 comment, 2 min read, LW link

How to solve deception and still fail.

Charlie Steiner, Oct 4, 2023, 7:56 PM
40 points
7 comments, 6 min read, LW link

PortAudio M1 Latency

jefftk, Oct 4, 2023, 7:10 PM
8 points
5 comments, 1 min read, LW link
(www.jefftk.com)

Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams

aarongertler, Oct 4, 2023, 6:04 PM
6 points
0 comments, 3 min read, LW link
(forum.effectivealtruism.org)

Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master

kgldeshapriya, Oct 4, 2023, 5:52 PM
−20 points
2 comments, 2 min read, LW link

The 5 Pillars of Happiness

Gabi QUENE, Oct 4, 2023, 5:50 PM
−24 points
5 comments, 5 min read, LW link

[Question] Using Reinforcement Learning to try to control the heating of a building (district heating)

Tony Karlsson, Oct 4, 2023, 5:47 PM
3 points
5 comments, 1 min read, LW link