METR is hiring!

Beth Barnes · Dec 26, 2023, 9:00 PM
65 points
1 comment · 1 min read · LW link

Environmental allergies are curable? (Sublingual immunotherapy)

Chipmonk · Dec 26, 2023, 7:05 PM
47 points
10 comments · 1 min read · LW link

Picasso in the Gallery of Babel

samhealy · Dec 26, 2023, 4:25 PM
12 points
12 comments · 4 min read · LW link

Flagging Potentially Unfair Parenting

jefftk · Dec 26, 2023, 12:40 PM
69 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Link Collection: Impact Markets

Saul Munn · Dec 26, 2023, 9:01 AM
27 points
0 comments · 2 min read · LW link
(www.brasstacks.blog)

How Emergency Medicine Solves the Alignment Problem

StrivingForLegibility · Dec 26, 2023, 5:24 AM
41 points
4 comments · 6 min read · LW link

Rationality outreach vs. rationality teaching

Lenmar · Dec 26, 2023, 12:37 AM
7 points
2 comments · 1 min read · LW link

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained

Zeping Yu · Dec 26, 2023, 12:36 AM
7 points
1 comment · 11 min read · LW link

[Question] Anki setup best practices?

Sinclair Chen · Dec 25, 2023, 10:34 PM
11 points
4 comments · 1 min read · LW link

[Question] Why does expected utility matter?

Marco Discendenti · Dec 25, 2023, 2:47 PM
18 points
21 comments · 4 min read · LW link

Freeze Dried Raspberry Truffles

jefftk · Dec 25, 2023, 2:10 PM
14 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Pornographic and semi-pornographic ads on mainstream websites as an instance of the AI alignment problem?

greenrd · Dec 25, 2023, 1:19 PM
−1 points
5 comments · 12 min read · LW link

Defense Against The Dark Arts: An Introduction

Lyrongolem · Dec 25, 2023, 6:36 AM
24 points
36 comments · 20 min read · LW link

Occlusions of Moral Knowledge

herschel · Dec 25, 2023, 5:55 AM
−1 points
0 comments · 2 min read · LW link
(brothernin.substack.com)

[Question] Would you have a baby in 2024?

martinkunev · Dec 25, 2023, 1:52 AM
24 points
76 comments · 1 min read · LW link

align your latent spaces

bhauth · Dec 24, 2023, 4:30 PM
27 points
8 comments · 2 min read · LW link
(www.bhauth.com)

Viral Guessing Game

jefftk · Dec 24, 2023, 1:10 PM
19 points
0 comments · 1 min read · LW link
(www.jefftk.com)

The Sugar Alignment Problem

Adam Zerner · Dec 24, 2023, 1:35 AM
5 points
3 comments · 7 min read · LW link

A Crisper Explanation of Simulacrum Levels

Thane Ruthenis · Dec 23, 2023, 10:13 PM
92 points
13 comments · 13 min read · LW link

Hyperbolic Discounting and Pascal’s Mugging

Andrew Keenan Richardson · Dec 23, 2023, 9:55 PM
9 points
0 comments · 7 min read · LW link

AISN #28: Center for AI Safety 2023 Year in Review

Dan H · Dec 23, 2023, 9:31 PM
30 points
1 comment · 5 min read · LW link
(newsletter.safe.ai)

“Inftoxicity” and other new words to describe malicious information and communication thereof

Jáchym Fibír · Dec 23, 2023, 6:15 PM
−1 points
6 comments · 3 min read · LW link

AI’s impact on biology research: Part I, today

octopocta · Dec 23, 2023, 4:29 PM
31 points
6 comments · 2 min read · LW link

AI Girlfriends Won’t Matter Much

Maxwell Tabarrok · Dec 23, 2023, 3:58 PM
42 points
22 comments · 2 min read · LW link
(maximumprogress.substack.com)

The Next Right Token

jefftk · Dec 23, 2023, 3:20 AM
14 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

Dec 23, 2023, 2:46 AM
18 points
0 comments · 4 min read · LW link

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

Dec 23, 2023, 2:46 AM
22 points
0 comments · 9 min read · LW link

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)

Dec 23, 2023, 2:46 AM
10 points
1 comment · 16 min read · LW link

Fact Finding: Simplifying the Circuit (Post 2)

Dec 23, 2023, 2:45 AM
25 points
3 comments · 14 min read · LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

Dec 23, 2023, 2:44 AM
106 points
10 comments · 22 min read · LW link · 2 reviews

Measurement tampering detection as a special case of weak-to-strong generalization

Dec 23, 2023, 12:05 AM
57 points
10 comments · 4 min read · LW link

How does a toy 2 digit subtraction transformer predict the difference?

Evan Anders · Dec 22, 2023, 9:17 PM
12 points
0 comments · 10 min read · LW link
(evanhanders.blog)

Thoughts on Max Tegmark’s AI verification

Johannes C. Mayer · Dec 22, 2023, 8:38 PM
10 points
0 comments · 3 min read · LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis · Dec 22, 2023, 8:19 PM
75 points
14 comments · 6 min read · LW link

AI safety advocates should consider providing gentle pushback following the events at OpenAI

civilsociety · Dec 22, 2023, 6:55 PM
16 points
5 comments · 3 min read · LW link

“Destroy humanity” as an immediate subgoal

Seth Ahrenbach · Dec 22, 2023, 6:52 PM
3 points
13 comments · 3 min read · LW link

Synthetic Restrictions

nano_brasca · Dec 22, 2023, 6:50 PM
10 points
0 comments · 4 min read · LW link

Review Report of Davidson on Takeoff Speeds (2023)

Trent Kannegieter · Dec 22, 2023, 6:48 PM
37 points
11 comments · 38 min read · LW link

The problems with the concept of an infohazard as used by the LW community [Linkpost]

Noosphere89 · Dec 22, 2023, 4:13 PM
75 points
43 comments · 3 min read · LW link
(www.beren.io)

Employee Incentives Make AGI Lab Pauses More Costly

Nikola Jurkovic · Dec 22, 2023, 5:04 AM
28 points
12 comments · 3 min read · LW link

The LessWrong 2022 Review: Review Phase

RobertM · Dec 22, 2023, 3:23 AM
58 points
7 comments · 2 min read · LW link

The absence of self-rejection is self-acceptance

Chipmonk · Dec 21, 2023, 9:54 PM
24 points
1 comment · 1 min read · LW link
(chipmonk.substack.com)

A Decision Theory Can Be Rational or Computable, but Not Both

StrivingForLegibility · Dec 21, 2023, 9:02 PM
9 points
4 comments · 1 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis · Dec 21, 2023, 8:02 PM
159 points
42 comments · 1 min read · LW link

Pseudonymity and Accusations

jefftk · Dec 21, 2023, 7:20 PM
52 points
20 comments · 3 min read · LW link
(www.jefftk.com)

Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI

Erich_Grunewald · Dec 21, 2023, 5:24 PM
26 points
2 comments · 17 min read · LW link
(www.erichgrunewald.com)

“Alignment” is one of six words of the year in the Harvard Gazette

Nikola Jurkovic · Dec 21, 2023, 3:54 PM
14 points
1 comment · 1 min read · LW link
(news.harvard.edu)

AI #43: Functional Discoveries

Zvi · Dec 21, 2023, 3:50 PM
52 points
26 comments · 49 min read · LW link
(thezvi.wordpress.com)

Rating my AI Predictions

Robert_AIZI · Dec 21, 2023, 2:07 PM
22 points
5 comments · 2 min read · LW link
(aizi.substack.com)

AI Safety Chatbot

Dec 21, 2023, 2:06 PM
61 points
11 comments · 4 min read · LW link