The absence of self-rejection is self-acceptance

Chipmonk · 21 Dec 2023 21:54 UTC
20 points
1 comment · 1 min read · LW link
(chipmonk.substack.com)

A Decision Theory Can Be Rational or Computable, but Not Both

StrivingForLegibility · 21 Dec 2023 21:02 UTC
9 points
4 comments · 1 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis · 21 Dec 2023 20:02 UTC
151 points
42 comments · 1 min read · LW link

Pseudonymity and Accusations

jefftk · 21 Dec 2023 19:20 UTC
52 points
20 comments · 3 min read · LW link
(www.jefftk.com)

Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI

Erich_Grunewald · 21 Dec 2023 17:24 UTC
26 points
2 comments · 17 min read · LW link
(www.erichgrunewald.com)

“Alignment” is one of six words of the year in the Harvard Gazette

nikola · 21 Dec 2023 15:54 UTC
14 points
1 comment · 1 min read · LW link
(news.harvard.edu)

AI #43: Functional Discoveries

Zvi · 21 Dec 2023 15:50 UTC
52 points
26 comments · 49 min read · LW link
(thezvi.wordpress.com)

Rating my AI Predictions

Robert_AIZI · 21 Dec 2023 14:07 UTC
22 points
5 comments · 2 min read · LW link
(aizi.substack.com)

AI Safety Chatbot

21 Dec 2023 14:06 UTC
58 points
11 comments · 4 min read · LW link

On OpenAI’s Preparedness Framework

Zvi · 21 Dec 2023 14:00 UTC
51 points
4 comments · 21 min read · LW link
(thezvi.wordpress.com)

Prediction Markets aren’t Magic

SimonM · 21 Dec 2023 12:54 UTC
90 points
29 comments · 3 min read · LW link

[Question] Why is capnometry biofeedback not more widely known?

riceissa · 21 Dec 2023 2:42 UTC
20 points
22 comments · 4 min read · LW link

My best guess at the important tricks for training 1L SAEs

Arthur Conmy · 21 Dec 2023 1:59 UTC
35 points
4 comments · 3 min read · LW link

Seattle Winter Solstice

a7x · 20 Dec 2023 20:30 UTC
6 points
1 comment · 1 min read · LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis · 20 Dec 2023 20:01 UTC
31 points
23 comments · 10 min read · LW link

Succession

Richard_Ngo · 20 Dec 2023 19:25 UTC
157 points
48 comments · 11 min read · LW link
(www.narrativeark.xyz)

Metaculus Introduces Multiple Choice Questions

ChristianWilliams · 20 Dec 2023 19:00 UTC
4 points
0 comments · 1 min read · LW link
(www.metaculus.com)

Brighter Than Today Versions

jefftk · 20 Dec 2023 18:20 UTC
16 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Gaia Network: a practical, incremental pathway to Open Agency Architecture

20 Dec 2023 17:11 UTC
19 points
8 comments · 16 min read · LW link

On the future of language models

owencb · 20 Dec 2023 16:58 UTC
105 points
17 comments · 1 min read · LW link

[Valence series] Appendix A: Hedonic tone / (dis)pleasure / (dis)liking

Steven Byrnes · 20 Dec 2023 15:54 UTC
15 points
0 comments · 12 min read · LW link

Matrix completion prize results

paulfchristiano · 20 Dec 2023 15:40 UTC
41 points
0 comments · 2 min read · LW link
(www.alignment.org)

[Question] What’s the minimal additive constant for Kolmogorov Complexity that a programming language can achieve?

Noosphere89 · 20 Dec 2023 15:36 UTC
11 points
15 comments · 1 min read · LW link

Legalize butanol?

bhauth · 20 Dec 2023 14:24 UTC
39 points
20 comments · 5 min read · LW link
(www.bhauth.com)

A short dialogue on comparability of values

cousin_it · 20 Dec 2023 14:08 UTC
27 points
7 comments · 1 min read · LW link

Inside View, Outside View… And Opposing View

chaosmage · 20 Dec 2023 12:35 UTC
21 points
1 comment · 5 min read · LW link

Heuristics for preventing major life mistakes

SK2 · 20 Dec 2023 8:01 UTC
28 points
2 comments · 3 min read · LW link

What should be reified?

herschel · 20 Dec 2023 4:52 UTC
4 points
2 comments · 2 min read · LW link
(brothernin.substack.com)

(In)appropriate (De)reification

herschel · 20 Dec 2023 4:51 UTC
10 points
1 comment · 4 min read · LW link
(brothernin.substack.com)

Escaping Skeuomorphism

Stuart Johnson · 20 Dec 2023 3:51 UTC
28 points
0 comments · 8 min read · LW link

Ronny and Nate discuss what sorts of minds humanity is likely to find by Machine Learning

19 Dec 2023 23:39 UTC
35 points
30 comments · 25 min read · LW link

[Question] What are the best Siderea posts?

mike_hawke · 19 Dec 2023 23:07 UTC
16 points
2 comments · 1 min read · LW link

Meaning & Agency

abramdemski · 19 Dec 2023 22:27 UTC
90 points
17 comments · 14 min read · LW link

s/acc: Safe Accelerationism Manifesto

lorepieri · 19 Dec 2023 22:19 UTC
−4 points
5 comments · 2 min read · LW link
(lorenzopieri.com)

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis · 19 Dec 2023 20:09 UTC
67 points
11 comments · 1 min read · LW link

Paper: Tell, Don’t Show- Declarative facts influence how LLMs generalize

19 Dec 2023 19:14 UTC
45 points
4 comments · 6 min read · LW link
(arxiv.org)

Interview: Applications w/ Alice Rigg

jacobhaimes · 19 Dec 2023 19:03 UTC
12 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

How does a toy 2 digit subtraction transformer predict the sign of the output?

Evan Anders · 19 Dec 2023 18:56 UTC
14 points
0 comments · 8 min read · LW link
(evanhanders.blog)

Incremental AI Risks from Proxy-Simulations

kmenou · 19 Dec 2023 18:56 UTC
2 points
0 comments · 1 min read · LW link
(individual.utoronto.ca)

A proposition for the modification of our epistemology

JacobBowden · 19 Dec 2023 18:55 UTC
−4 points
2 comments · 4 min read · LW link

Goal-Completeness is like Turing-Completeness for AGI

Liron · 19 Dec 2023 18:12 UTC
50 points
26 comments · 3 min read · LW link

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov · 19 Dec 2023 16:49 UTC
17 points
5 comments · 3 min read · LW link

Chording “The Next Right Thing”

jefftk · 19 Dec 2023 15:40 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Monthly Roundup #13: December 2023

Zvi · 19 Dec 2023 15:10 UTC
32 points
5 comments · 26 min read · LW link
(thezvi.wordpress.com)

Effective Aspersions: How the Nonlinear Investigation Went Wrong

TracingWoodgrains · 19 Dec 2023 12:00 UTC
168 points
170 comments · 1 min read · LW link

A Universal Emergent Decomposition of Retrieval Tasks in Language Models

19 Dec 2023 11:52 UTC
81 points
3 comments · 10 min read · LW link
(arxiv.org)

Assessment of AI safety agendas: think about the downside risk

Roman Leventov · 19 Dec 2023 9:00 UTC
13 points
1 comment · 1 min read · LW link

A Socratic Dialogue about Socratic Dialogues

19 Dec 2023 7:50 UTC
30 points
0 comments · 5 min read · LW link

Constellations are Younger than Continents

Jeffrey Heninger · 19 Dec 2023 6:12 UTC
259 points
22 comments · 2 min read · LW link

The Dark Arts

19 Dec 2023 4:41 UTC
128 points
49 comments · 9 min read · LW link