Ronny and Nate discuss what sorts of minds humanity is likely to find by Machine Learning

19 Dec 2023 23:39 UTC
35 points
30 comments · 25 min read · LW link

[Question] What are the best Siderea posts?

mike_hawke · 19 Dec 2023 23:07 UTC
16 points
2 comments · 1 min read · LW link

Meaning & Agency

abramdemski · 19 Dec 2023 22:27 UTC
90 points
17 comments · 14 min read · LW link

s/acc: Safe Accelerationism Manifesto

lorepieri · 19 Dec 2023 22:19 UTC
−4 points
5 comments · 2 min read · LW link
(lorenzopieri.com)

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis · 19 Dec 2023 20:09 UTC
67 points
11 comments · 1 min read · LW link

Paper: Tell, Don’t Show: Declarative facts influence how LLMs generalize

19 Dec 2023 19:14 UTC
45 points
4 comments · 6 min read · LW link
(arxiv.org)

Interview: Applications w/ Alice Rigg

jacobhaimes · 19 Dec 2023 19:03 UTC
12 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

How does a toy 2-digit subtraction transformer predict the sign of the output?

Evan Anders · 19 Dec 2023 18:56 UTC
14 points
0 comments · 8 min read · LW link
(evanhanders.blog)

Incremental AI Risks from Proxy-Simulations

kmenou · 19 Dec 2023 18:56 UTC
2 points
0 comments · 1 min read · LW link
(individual.utoronto.ca)

A proposition for the modification of our epistemology

JacobBowden · 19 Dec 2023 18:55 UTC
−4 points
2 comments · 4 min read · LW link

Goal-Completeness is like Turing-Completeness for AGI

Liron · 19 Dec 2023 18:12 UTC
50 points
26 comments · 3 min read · LW link

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov · 19 Dec 2023 16:49 UTC
17 points
5 comments · 3 min read · LW link

Chording “The Next Right Thing”

jefftk · 19 Dec 2023 15:40 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Monthly Roundup #13: December 2023

Zvi · 19 Dec 2023 15:10 UTC
32 points
5 comments · 26 min read · LW link
(thezvi.wordpress.com)

Effective Aspersions: How the Nonlinear Investigation Went Wrong

TracingWoodgrains · 19 Dec 2023 12:00 UTC
168 points
170 comments · 1 min read · LW link

A Universal Emergent Decomposition of Retrieval Tasks in Language Models

19 Dec 2023 11:52 UTC
81 points
3 comments · 10 min read · LW link
(arxiv.org)

Assessment of AI safety agendas: think about the downside risk

Roman Leventov · 19 Dec 2023 9:00 UTC
13 points
1 comment · 1 min read · LW link

A Socratic Dialogue about Socratic Dialogues

19 Dec 2023 7:50 UTC
30 points
0 comments · 5 min read · LW link

Constellations are Younger than Continents

Jeffrey Heninger · 19 Dec 2023 6:12 UTC
259 points
22 comments · 2 min read · LW link

The Dark Arts

19 Dec 2023 4:41 UTC
128 points
49 comments · 9 min read · LW link

When scientists consider whether their research will end the world

Harlan · 19 Dec 2023 3:47 UTC
29 points
4 comments · 11 min read · LW link
(blog.aiimpacts.org)

Is the far future inevitably zero sum?

Srdjan Miletic · 19 Dec 2023 1:45 UTC
8 points
2 comments · 2 min read · LW link
(dissent.blog)

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

18 Dec 2023 20:35 UTC
163 points
20 comments · 12 min read · LW link

The Shortest Path Between Scylla and Charybdis

Thane Ruthenis · 18 Dec 2023 20:08 UTC
50 points
8 comments · 5 min read · LW link

OpenAI: Preparedness framework

Zach Stein-Perlman · 18 Dec 2023 18:30 UTC
68 points
23 comments · 4 min read · LW link
(openai.com)

[Valence series] 5. “Valence Disorders” in Mental Health & Personality

Steven Byrnes · 18 Dec 2023 15:26 UTC
35 points
7 comments · 13 min read · LW link

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

18 Dec 2023 11:58 UTC
147 points
21 comments · 10 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · 18 Dec 2023 8:12 UTC
30 points
9 comments · 9 min read · LW link

Talk: “AI Would Be A Lot Less Alarming If We Understood Agents”

johnswentworth · 17 Dec 2023 23:46 UTC
58 points
3 comments · 1 min read · LW link
(www.youtube.com)

∀: a story

Richard_Ngo · 17 Dec 2023 22:42 UTC
36 points
1 comment · 8 min read · LW link
(www.narrativeark.xyz)

Reviving a 2015 MacBook

jefftk · 17 Dec 2023 21:00 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans

Thane Ruthenis · 17 Dec 2023 20:28 UTC
29 points
7 comments · 11 min read · LW link

OpenAI, DeepMind, Anthropic, etc. should shut down.

Tamsin Leake · 17 Dec 2023 20:01 UTC
36 points
48 comments · 3 min read · LW link
(carado.moe)

The Limits of Artificial Consciousness: A Biology-Based Critique of Chalmers’ Fading Qualia Argument

Štěpán Los · 17 Dec 2023 19:11 UTC
−6 points
9 comments · 17 min read · LW link

What makes teaching math special

Viliam · 17 Dec 2023 14:15 UTC
38 points
27 comments · 11 min read · LW link

The predictive power of dissipative adaptation

dr_s · 17 Dec 2023 14:01 UTC
44 points
12 comments · 19 min read · LW link

Linkpost: Francesca v Harvard

Linch · 17 Dec 2023 6:18 UTC
5 points
5 comments · 2 min read · LW link
(www.francesca-v-harvard.org)

Lessons from massaging myself, others, dogs, and cats

Chipmonk · 17 Dec 2023 4:28 UTC
0 points
27 comments · 5 min read · LW link
(chipmonk.blog)

The Serendipity of Density

jefftk · 17 Dec 2023 3:50 UTC
39 points
4 comments · 1 min read · LW link
(www.jefftk.com)

Bounty: Diverse hard tasks for LLM agents

17 Dec 2023 1:04 UTC
49 points
31 comments · 16 min read · LW link

2022 (and All Time) Posts by Pingback Count

Raemon · 16 Dec 2023 21:17 UTC
51 points
14 comments · 6 min read · LW link

“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity

Thane Ruthenis · 16 Dec 2023 20:08 UTC
170 points
23 comments · 5 min read · LW link

Alignment work in anomalous worlds

Tamsin Leake · 16 Dec 2023 19:34 UTC
24 points
4 comments · 3 min read · LW link
(carado.moe)

A visual analogy for text generation by LLMs?

Bill Benzon · 16 Dec 2023 17:58 UTC
3 points
0 comments · 1 min read · LW link

Upgrading the AI Safety Community

16 Dec 2023 15:34 UTC
41 points
9 comments · 42 min read · LW link

cold aluminum for medicine

bhauth · 16 Dec 2023 14:38 UTC
42 points
4 comments · 4 min read · LW link
(www.bhauth.com)

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

16 Dec 2023 5:49 UTC
73 points
3 comments · 6 min read · LW link

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

leogao · 16 Dec 2023 5:39 UTC
53 points
5 comments · 1 min read · LW link

Pope Francis shares thoughts on responsible AI development

corruptedCatapillar · 16 Dec 2023 3:49 UTC
15 points
4 comments · 1 min read · LW link
(www.vatican.va)

Current AIs Provide Nearly No Data Relevant to AGI Alignment

Thane Ruthenis · 15 Dec 2023 20:16 UTC
110 points
152 comments · 8 min read · LW link