Thomas Kwa’s MIRI research experience

2 Oct 2023 16:42 UTC
142 points
39 comments · 1 min read · LW link

When to Get the Booster?

jefftk · 3 Oct 2023 21:00 UTC
32 points
2 comments · 2 min read · LW link
(www.jefftk.com)

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilan · 3 Oct 2023 21:50 UTC
27 points
0 comments · 92 min read · LW link

The 99% principle for personal problems

Kaj_Sotala · 2 Oct 2023 8:20 UTC
101 points
6 comments · 2 min read · LW link
(kajsotala.fi)

Linkpost: They Studied Dishonesty. Was Their Work a Lie?

Linch · 2 Oct 2023 8:10 UTC
88 points
7 comments · 2 min read · LW link
(www.newyorker.com)

[Question] Potential alignment targets for a sovereign superintelligent AI

Paul Colognese · 3 Oct 2023 15:09 UTC
26 points
4 comments · 1 min read · LW link

[Question] Current AI safety techniques?

Zach Stein-Perlman · 3 Oct 2023 19:30 UTC
13 points
1 comment · 2 min read · LW link

What would it mean to understand how a large language model (LLM) works? Some quick notes.

Bill Benzon · 3 Oct 2023 15:11 UTC
18 points
2 comments · 8 min read · LW link

Dall-E 3

p.b. · 2 Oct 2023 20:33 UTC
35 points
4 comments · 1 min read · LW link
(openai.com)

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

28 Sep 2023 18:53 UTC
162 points
28 comments · 3 min read · LW link

“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

titotal · 29 Sep 2023 14:01 UTC
135 points
49 comments · 1 min read · LW link
(titotal.substack.com)

My Effortless Weightloss Story: A Quick Runthrough

CuoreDiVetro · 30 Sep 2023 23:02 UTC
89 points
36 comments · 9 min read · LW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

miles · 3 Oct 2023 2:22 UTC
24 points
0 comments · 9 min read · LW link

energy landscapes of experts

bhauth · 2 Oct 2023 14:08 UTC
37 points
2 comments · 3 min read · LW link
(www.bhauth.com)

Inside Views, Impostor Syndrome, and the Great LARP

johnswentworth · 25 Sep 2023 16:08 UTC
239 points
40 comments · 5 min read · LW link

My Mid-Career Transition into Biosecurity

jefftk · 2 Oct 2023 21:20 UTC
25 points
4 comments · 2 min read · LW link
(www.jefftk.com)

The Talk: a brief explanation of sexual dimorphism

Malmesbury · 18 Sep 2023 16:23 UTC
423 points
69 comments · 16 min read · LW link

Mech Interp Challenge: October—Deciphering the Sorted List Model

TheMcDouglas · 3 Oct 2023 10:57 UTC
10 points
0 comments · 3 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
11 points
0 comments · 5 min read · LW link

Arguments for moral indefinability

Richard_Ngo · 30 Sep 2023 22:40 UTC
47 points
15 comments · 7 min read · LW link
(www.thinkingcomplete.com)