[Question] Who determines whether an alignment proposal is the definitive alignment solution?

MiguelDevOct 3, 2023, 10:39 PM
−1 points
6 comments1 min readLW link

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilanOct 3, 2023, 9:50 PM
43 points
0 comments92 min readLW link

When to Get the Booster?

jefftkOct 3, 2023, 9:00 PM
50 points
15 comments2 min readLW link
(www.jefftk.com)

OpenAI-Microsoft partnership

Zach Stein-PerlmanOct 3, 2023, 8:01 PM
51 points
19 comments1 min readLW link

[Question] Current AI safety techniques?

Zach Stein-PerlmanOct 3, 2023, 7:30 PM
30 points
2 comments2 min readLW link

Testing and Automation for Intelligent Systems.

Sai Kiran KammariOct 3, 2023, 5:51 PM
−13 points
0 comments1 min readLW link
(resource-cms.springernature.com)

Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists

ChristianWilliamsOct 3, 2023, 4:44 PM
13 points
0 commentsLW link
(www.metaculus.com)

What would it mean to understand how a large language model (LLM) works? Some quick notes.

Bill BenzonOct 3, 2023, 3:11 PM
20 points
4 comments8 min readLW link

[Question] Potential alignment targets for a sovereign superintelligent AI

Paul CologneseOct 3, 2023, 3:09 PM
29 points
4 comments1 min readLW link

Monthly Roundup #11: October 2023

ZviOct 3, 2023, 2:10 PM
42 points
12 comments35 min readLW link
(thezvi.wordpress.com)

Why We Use Money? - A Walrasian View

Savio CoelhoOct 3, 2023, 12:02 PM
4 points
3 comments8 min readLW link

Mech Interp Challenge: October - Deciphering the Sorted List Model

CallumMcDougallOct 3, 2023, 10:57 AM
23 points
0 comments3 min readLW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

Oct 3, 2023, 7:45 AM
17 points
0 comments5 min readLW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles TurpinOct 3, 2023, 2:22 AM
31 points
0 comments9 min readLW link

My Mid-Career Transition into Biosecurity

jefftkOct 2, 2023, 9:20 PM
26 points
4 comments2 min readLW link
(www.jefftk.com)

Dall-E 3

p.b.Oct 2, 2023, 8:33 PM
37 points
9 comments1 min readLW link
(openai.com)

Thomas Kwa’s MIRI research experience

Oct 2, 2023, 4:42 PM
173 points
53 comments1 min readLW link

Population After a Catastrophe

Stan PinsentOct 2, 2023, 4:06 PM
3 points
5 comments14 min readLW link

Expectations for Gemini: hopefully not a big deal

Maxime RichéOct 2, 2023, 3:38 PM
15 points
5 comments1 min readLW link

A counterexample for measurable factor spaces

Matthias G. MayerOct 2, 2023, 3:16 PM
14 points
0 comments3 min readLW link

Will early transformative AIs primarily use text? [Manifold question]

Fabien RogerOct 2, 2023, 3:05 PM
16 points
0 comments3 min readLW link

energy landscapes of experts

bhauthOct 2, 2023, 2:08 PM
45 points
2 comments3 min readLW link
(www.bhauth.com)

Direction of Fit

NicholasKeesOct 2, 2023, 12:34 PM
34 points
0 comments3 min readLW link

The 99% principle for personal problems

Kaj_SotalaOct 2, 2023, 8:20 AM
141 points
20 comments2 min readLW link
(kajsotala.fi)

Linkpost: They Studied Dishonesty. Was Their Work a Lie?

LinchOct 2, 2023, 8:10 AM
91 points
12 comments2 min readLW link
(www.newyorker.com)

Why I got the smallpox vaccine in 2023

joecOct 2, 2023, 5:11 AM
25 points
6 comments4 min readLW link

Instrumental Convergence and human extinction.

Spiritus DeiOct 2, 2023, 12:41 AM
−10 points
3 comments7 min readLW link

Revisiting the Manifold Hypothesis

Aidan RockeOct 1, 2023, 11:55 PM
13 points
19 comments4 min readLW link

AI Alignment Breakthroughs this Week [new substack]

Logan ZoellnerOct 1, 2023, 10:13 PM
0 points
8 comments2 min readLW link

[Question] Looking for study

Robert FeinsteinOct 1, 2023, 7:52 PM
4 points
0 comments1 min readLW link

Join AISafety.info’s Distillation Hackathon (Oct 6-9th)

smallsiloOct 1, 2023, 6:43 PM
21 points
0 comments2 min readLW link
(forum.effectivealtruism.org)

Fifty Flips

abstractapplicOct 1, 2023, 3:30 PM
33 points
15 comments1 min readLW link1 review
(h-b-p.github.io)

AI Safety Impact Markets: Your Charity Evaluator for AI Safety

Dawn DrescherOct 1, 2023, 10:47 AM
16 points
5 commentsLW link
(impactmarkets.substack.com)

“Absence of Evidence is Not Evidence of Absence” As a Limit

transhumanist_atom_understanderOct 1, 2023, 8:15 AM
16 points
1 comment2 min readLW link

New Tool: the Residual Stream Viewer

AdamYedidiaOct 1, 2023, 12:49 AM
32 points
7 comments4 min readLW link
(tinyurl.com)

My Effortless Weightloss Story: A Quick Runthrough

CuoreDiVetroSep 30, 2023, 11:02 PM
123 points
78 comments9 min readLW link

Arguments for moral indefinability

Richard_NgoSep 30, 2023, 10:40 PM
47 points
16 comments7 min readLW link
(www.thinkingcomplete.com)

Conditionals All The Way Down

lunatic_at_largeSep 30, 2023, 9:06 PM
33 points
2 comments3 min readLW link

Focusing your impact on short vs long TAI timelines

kuhanjSep 30, 2023, 7:34 PM
4 points
0 comments10 min readLW link

How model editing could help with the alignment problem

Michael RipaSep 30, 2023, 5:47 PM
12 points
1 comment15 min readLW link

My submission to the ALTER Prize

LorxusSep 30, 2023, 4:07 PM
6 points
0 comments1 min readLW link
(www.docdroid.net)

Anki deck for learning the main AI safety orgs, projects, and programs

Bryce RobertsonSep 30, 2023, 4:06 PM
2 points
0 comments1 min readLW link

The Lighthaven Campus is open for bookings

habrykaSep 30, 2023, 1:08 AM
209 points
18 comments4 min readLW link
(www.lighthaven.space)

Headphones hook

philhSep 29, 2023, 10:50 PM
21 points
1 comment3 min readLW link
(reasonableapproximation.net)

Paul Christiano’s views on “doom” (video explainer)

Michaël TrazziSep 29, 2023, 9:56 PM
15 points
0 comments1 min readLW link
(youtu.be)

The Retroactive Funding Landscape: Innovations for Donors and Grantmakers

Dawn DrescherSep 29, 2023, 5:39 PM
13 points
0 commentsLW link
(impactmarkets.substack.com)

Bids To Defer On Value Judgements

johnswentworthSep 29, 2023, 5:07 PM
58 points
6 comments3 min readLW link

Announcing FAR Labs, an AI safety coworking space

Ben GoldhaberSep 29, 2023, 4:52 PM
95 points
0 comments1 min readLW link

A tool for searching rationalist & EA webs

Daniel_FriedrichSep 29, 2023, 3:23 PM
4 points
0 comments1 min readLW link
(ratsearch.blogspot.com)

Basic Mathematics of Predictive Coding

Adam ShaiSep 29, 2023, 2:38 PM
49 points
6 comments9 min readLW link