Starting in mechanistic interpretability

Jakub Smékal · 22 Jan 2024 23:40 UTC
1 point
0 comments · 3 min read · LW link
(jakubsmekal.com)

We need a Science of Evals

22 Jan 2024 20:30 UTC
66 points
13 comments · 9 min read · LW link

Announcing the SoS Research Collective for independent researchers (and academics thinking independently)

rogersbacon · 22 Jan 2024 20:13 UTC
15 points
0 comments · 8 min read · LW link
(www.theseedsofscience.pub)

A Brief Assessment of OpenAI’s Preparedness Framework & Some Suggestions for Improvement

simeon_c · 22 Jan 2024 20:08 UTC
14 points
0 comments · 6 min read · LW link
(uploads-ssl.webflow.com)

D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset]

abstractapplic · 22 Jan 2024 19:20 UTC
38 points
7 comments · 3 min read · LW link

′ petertodd’’s last stand: The final days of open GPT-3 research

mwatkins · 22 Jan 2024 18:47 UTC
108 points
16 comments · 45 min read · LW link

InterLab – a toolkit for experiments with multi-agent interactions

22 Jan 2024 18:23 UTC
69 points
0 comments · 8 min read · LW link
(acsresearch.org)

San Fernando Valley Rationalist Meetup

Thomas Broadley · 22 Jan 2024 16:49 UTC
3 points
1 comment · 1 min read · LW link

Who Organizes Dances?

jefftk · 22 Jan 2024 14:30 UTC
12 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Values Darwinism

pchvykov · 22 Jan 2024 10:44 UTC
11 points
13 comments · 3 min read · LW link

[Question] The akrasia doom loop and executive function disorders: a question

TeaTieAndHat · 22 Jan 2024 7:01 UTC
16 points
7 comments · 2 min read · LW link

Predicting AGI by the Turing Test

Yuxi_Liu · 22 Jan 2024 4:22 UTC
21 points
2 comments · 10 min read · LW link
(yuxi-liu-wired.github.io)

Incorporating Justice Theory into Decision Theory

StrivingForLegibility · 21 Jan 2024 19:17 UTC
13 points
20 comments · 5 min read · LW link

Deliberate Dysentery: Q&A about Human Challenge Trials

Niko_McCarty · 21 Jan 2024 19:05 UTC
16 points
1 comment · 18 min read · LW link
(www.asimov.press)

When Does Altruism Strengthen Altruism?

jefftk · 21 Jan 2024 18:50 UTC
44 points
2 comments · 3 min read · LW link
(www.jefftk.com)

A Shutdown Problem Proposal

21 Jan 2024 18:12 UTC
122 points
61 comments · 6 min read · LW link

Is principled mass-outreach possible, for AGI X-risk?

NicholasKross · 21 Jan 2024 17:45 UTC
9 points
5 comments · 3 min read · LW link

Vacuum: Theory and Technologies

ethanmorse · 21 Jan 2024 17:23 UTC
33 points
0 comments · 25 min read · LW link
(210ethan.github.io)

Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events

Ape in the coat · 21 Jan 2024 15:58 UTC
16 points
16 comments · 6 min read · LW link

Book review: Cuisine and Empire

eukaryote · 21 Jan 2024 6:15 UTC
40 points
2 comments · 12 min read · LW link
(eukaryotewritesblog.com)

Catalogue of POLITICO Reports and Other Cited Articles on Effective Altruism and AI Safety Connections in Washington, DC

Evan_Gaensbauer · 21 Jan 2024 2:15 UTC
4 points
0 comments · 1 min read · LW link
(docs.google.com)

You can rack up massive amounts of data quickly by asking questions to all your friends

Neil · 21 Jan 2024 1:27 UTC
14 points
2 comments · 2 min read · LW link

[Question] Party for biomedical rejuvenation research: European parliament elections

Iakov Dudinsky · 21 Jan 2024 0:35 UTC
1 point
0 comments · 1 min read · LW link

[Question] Why have insurance markets succeeded where prediction markets have not?

JNank · 21 Jan 2024 0:35 UTC
13 points
13 comments · 1 min read · LW link

[linkpost] Self-Rewarding Language Models

Jacob G-W · 21 Jan 2024 0:30 UTC
13 points
2 comments · 1 min read · LW link
(arxiv.org)

Why Improving Dialogue Feels So Hard

matto · 20 Jan 2024 21:26 UTC
21 points
8 comments · 3 min read · LW link

Research Log, RLLMv2: Phi-1.5, GPT2XL and Falcon-RW-1B as paperclip maximizers

MiguelDev · 20 Jan 2024 15:30 UTC
6 points
0 comments · 10 min read · LW link

Against the Burden of Knowledge

Maxwell Tabarrok · 20 Jan 2024 14:37 UTC
21 points
6 comments · 6 min read · LW link
(maximumprogress.substack.com)

legged robot scaling laws

bhauth · 20 Jan 2024 5:45 UTC
34 points
8 comments · 7 min read · LW link
(www.bhauth.com)

Legibility Makes Logical Line-Of-Sight Transitive

StrivingForLegibility · 19 Jan 2024 23:39 UTC
12 points
0 comments · 5 min read · LW link

Decent plan prize winner & highlights

lukehmiles · 19 Jan 2024 23:30 UTC
25 points
2 comments · 4 min read · LW link

A quick investigation of AI pro-AI bias

Fabien Roger · 19 Jan 2024 23:26 UTC
52 points
1 comment · 2 min read · LW link

[Question] What Software Should Exist?

Tomás B. · 19 Jan 2024 21:43 UTC
27 points
21 comments · 1 min read · LW link

On “Geeks, MOPs, and Sociopaths”

19 Jan 2024 21:04 UTC
31 points
35 comments · 8 min read · LW link

There is way too much serendipity

Malmesbury · 19 Jan 2024 19:37 UTC
351 points
56 comments · 7 min read · LW link

Estimating efficiency improvements in LLM pre-training

Daan · 19 Jan 2024 19:32 UTC
42 points
3 comments · 21 min read · LW link

Update: Orienting Ourselves in 2024 | Guild of the ROSE

moridinamael · 19 Jan 2024 16:48 UTC
14 points
0 comments · 1 min read · LW link
(guildoftherose.org)

I Want XMP But I Know Why I Can’t Have It

jefftk · 19 Jan 2024 15:30 UTC
23 points
0 comments · 3 min read · LW link
(www.jefftk.com)

Arguments for Robustness in AI Alignment

Fabian Schimpf · 19 Jan 2024 10:24 UTC
2 points
1 comment · 1 min read · LW link

[Question] What rationality failure modes are there?

Ulisse Mini · 19 Jan 2024 9:12 UTC
42 points
11 comments · 1 min read · LW link

[Question] What’s up with online media and our ability to get sh*t done?

TeaTieAndHat · 19 Jan 2024 9:12 UTC
2 points
0 comments · 6 min read · LW link

Logical Line-Of-Sight Makes Games Sequential or Loopy

StrivingForLegibility · 19 Jan 2024 4:05 UTC
38 points
0 comments · 7 min read · LW link

[Question] Are there high-quality surveys available detailing the rates of polyamory among Americans age 18-45 in metropolitan areas in the United States?

Evan_Gaensbauer · 18 Jan 2024 23:50 UTC
23 points
0 comments · 1 min read · LW link

Manifund: 2023 in Review

Austin Chen · 18 Jan 2024 23:50 UTC
32 points
0 comments · 1 min read · LW link
(manifund.substack.com)

The Underreaction to OpenAI

Sherrinford · 18 Jan 2024 22:08 UTC
21 points
0 comments · 6 min read · LW link

Against Nonlinear (Thing Of Things)

tailcalled · 18 Jan 2024 21:40 UTC
58 points
18 comments · 1 min read · LW link
(thingofthings.substack.com)

Toward A Mathematical Framework for Computation in Superposition

18 Jan 2024 21:06 UTC
184 points
17 comments · 73 min read · LW link

The True Story of How GPT-2 Became Maximally Lewd

18 Jan 2024 21:03 UTC
70 points
7 comments · 6 min read · LW link
(youtu.be)

Gaia Network: An Illustrated Primer

18 Jan 2024 18:23 UTC
1 point
2 comments · 15 min read · LW link

On the abolition of man

Joe Carlsmith · 18 Jan 2024 18:17 UTC
88 points
18 comments · 41 min read · LW link