Underspecified Probabilities: A Thought Experiment

lunatic_at_large · 4 Oct 2023 22:25 UTC
8 points
4 comments · 2 min read · LW link

Fraternal Birth Order Effect and the Maternal Immune Hypothesis

Bucky · 4 Oct 2023 21:18 UTC
19 points
0 comments · 2 min read · LW link

How to solve deception and still fail.

Charlie Steiner · 4 Oct 2023 19:56 UTC
36 points
7 comments · 6 min read · LW link

PortAudio M1 Latency

jefftk · 4 Oct 2023 19:10 UTC
8 points
5 comments · 1 min read · LW link
(www.jefftk.com)

Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams

aarongertler · 4 Oct 2023 18:04 UTC
6 points
0 comments · 3 min read · LW link
(forum.effectivealtruism.org)

Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master

kgldeshapriya · 4 Oct 2023 17:52 UTC
−20 points
2 comments · 2 min read · LW link

The 5 Pillars of Happiness

Gabi QUENE · 4 Oct 2023 17:50 UTC
−24 points
5 comments · 5 min read · LW link

[Question] Using Reinforcement Learning to try to control the heating of a building (district heating)

Tony Karlsson · 4 Oct 2023 17:47 UTC
3 points
5 comments · 1 min read · LW link

rationalistic probability(litterally just throwing shit out there)

NotaSprayer ASprayer · 4 Oct 2023 17:46 UTC
−30 points
8 comments · 2 min read · LW link

AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering

4 Oct 2023 17:37 UTC
15 points
2 comments · 5 min read · LW link
(newsletter.safe.ai)

I don’t find the lie detection results that surprising (by an author of the paper)

JanB · 4 Oct 2023 17:10 UTC
97 points
8 comments · 3 min read · LW link

[Question] What evidence is there of LLM’s containing world models?

Chris_Leong · 4 Oct 2023 14:33 UTC
17 points
17 comments · 1 min read · LW link

Entanglement and intuition about words and meaning

Bill Benzon · 4 Oct 2023 14:16 UTC
4 points
0 comments · 2 min read · LW link

Why a Mars colony would lead to a first strike situation

Remmelt · 4 Oct 2023 11:29 UTC
−57 points
8 comments · 1 min read · LW link
(mflb.com)

[Question] What are some examples of AIs instantiating the ‘nearest unblocked strategy problem’?

EJT · 4 Oct 2023 11:05 UTC
6 points
4 comments · 1 min read · LW link

Graphical tensor notation for interpretability

Jordan Taylor · 4 Oct 2023 8:04 UTC
129 points
11 comments · 19 min read · LW link

[Link] Bay Area Winter Solstice 2023

4 Oct 2023 2:19 UTC
18 points
3 comments · 1 min read · LW link
(fb.me)

[Question] Who determines whether an alignment proposal is the definitive alignment solution?

MiguelDev · 3 Oct 2023 22:39 UTC
−1 points
6 comments · 1 min read · LW link

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilan · 3 Oct 2023 21:50 UTC
43 points
0 comments · 92 min read · LW link

When to Get the Booster?

jefftk · 3 Oct 2023 21:00 UTC
50 points
15 comments · 2 min read · LW link
(www.jefftk.com)

OpenAI-Microsoft partnership

Zach Stein-Perlman · 3 Oct 2023 20:01 UTC
51 points
18 comments · 1 min read · LW link

[Question] Current AI safety techniques?

Zach Stein-Perlman · 3 Oct 2023 19:30 UTC
30 points
2 comments · 2 min read · LW link

Testing and Automation for Intelligent Systems.

Sai Kiran Kammari · 3 Oct 2023 17:51 UTC
−13 points
0 comments · 1 min read · LW link
(resource-cms.springernature.com)

Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists

ChristianWilliams · 3 Oct 2023 16:44 UTC
13 points
0 comments · 1 min read · LW link
(www.metaculus.com)

What would it mean to understand how a large language model (LLM) works? Some quick notes.

Bill Benzon · 3 Oct 2023 15:11 UTC
20 points
4 comments · 8 min read · LW link

[Question] Potential alignment targets for a sovereign superintelligent AI

Paul Colognese · 3 Oct 2023 15:09 UTC
29 points
4 comments · 1 min read · LW link

Monthly Roundup #11: October 2023

Zvi · 3 Oct 2023 14:10 UTC
42 points
12 comments · 35 min read · LW link
(thezvi.wordpress.com)

Why We Use Money? - A Walrasian View

Savio Coelho · 3 Oct 2023 12:02 UTC
4 points
3 comments · 8 min read · LW link

Mech Interp Challenge: October—Deciphering the Sorted List Model

CallumMcDougall · 3 Oct 2023 10:57 UTC
23 points
0 comments · 3 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
11 points
0 comments · 5 min read · LW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

miles · 3 Oct 2023 2:22 UTC
31 points
0 comments · 9 min read · LW link

My Mid-Career Transition into Biosecurity

jefftk · 2 Oct 2023 21:20 UTC
26 points
4 comments · 2 min read · LW link
(www.jefftk.com)

Dall-E 3

p.b. · 2 Oct 2023 20:33 UTC
37 points
9 comments · 1 min read · LW link
(openai.com)

Thomas Kwa’s MIRI research experience

2 Oct 2023 16:42 UTC
169 points
52 comments · 1 min read · LW link

Population After a Catastrophe

Stan Pinsent · 2 Oct 2023 16:06 UTC
3 points
5 comments · 14 min read · LW link

Expectations for Gemini: hopefully not a big deal

Maxime Riché · 2 Oct 2023 15:38 UTC
15 points
5 comments · 1 min read · LW link

A counterexample for measurable factor spaces

Matthias G. Mayer · 2 Oct 2023 15:16 UTC
14 points
0 comments · 3 min read · LW link

Will early transformative AIs primarily use text? [Manifold question]

Fabien Roger · 2 Oct 2023 15:05 UTC
16 points
0 comments · 3 min read · LW link

energy landscapes of experts

bhauth · 2 Oct 2023 14:08 UTC
41 points
2 comments · 3 min read · LW link
(www.bhauth.com)

Direction of Fit

NicholasKees · 2 Oct 2023 12:34 UTC
32 points
0 comments · 3 min read · LW link

The 99% principle for personal problems

Kaj_Sotala · 2 Oct 2023 8:20 UTC
127 points
20 comments · 2 min read · LW link
(kajsotala.fi)

Linkpost: They Studied Dishonesty. Was Their Work a Lie?

Linch · 2 Oct 2023 8:10 UTC
91 points
12 comments · 2 min read · LW link
(www.newyorker.com)

A Mathematical Model for Simulators

lukemarks · 2 Oct 2023 6:46 UTC
11 points
0 comments · 2 min read · LW link

Why I got the smallpox vaccine in 2023

joec · 2 Oct 2023 5:11 UTC
22 points
6 comments · 4 min read · LW link

Instrumental Convergence and human extinction.

Spiritus Dei · 2 Oct 2023 0:41 UTC
−10 points
3 comments · 7 min read · LW link

Revisiting the Manifold Hypothesis

Aidan Rocke · 1 Oct 2023 23:55 UTC
10 points
19 comments · 4 min read · LW link

AI Alignment Breakthroughs this Week [new substack]

Logan Zoellner · 1 Oct 2023 22:13 UTC
0 points
8 comments · 2 min read · LW link

[Question] Looking for study

Robert Feinstein · 1 Oct 2023 19:52 UTC
4 points
0 comments · 1 min read · LW link

Join AISafety.info’s Distillation Hackathon (Oct 6-9th)

smallsilo · 1 Oct 2023 18:43 UTC
21 points
0 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Fifty Flips

abstractapplic · 1 Oct 2023 15:30 UTC
32 points
14 comments · 1 min read · LW link
(h-b-p.github.io)