Searching for the Root of the Tree of Evil

Ivan Vendrov
Jun 8, 2024, 5:05 PM
36 points
14 comments
5 min read
(nothinghuman.substack.com)

2. Corrigibility Intuition

Max Harms
Jun 8, 2024, 3:52 PM
67 points
10 comments
33 min read

Two easy things that maybe Just Work to improve AI discourse

Bird Concept
Jun 8, 2024, 3:51 PM
191 points
35 comments
2 min read

I made an AI safety fellowship. What I wish I knew.

Ruben Castaing
Jun 8, 2024, 3:23 PM
12 points
0 comments
2 min read

Alignment Gaps

kcyras
Jun 8, 2024, 3:23 PM
11 points
4 comments
8 min read

The Slack Double Crux, or how to negotiate with yourself

Thac0
Jun 8, 2024, 3:22 PM
6 points
2 comments
4 min read

The Perils of Popularity: A Critical Examination of LessWrong’s Rational Discourse

BubbaJoeLouis
Jun 8, 2024, 3:22 PM
−24 points
3 comments
2 min read

Status quo bias is usually justified

Amadeus Pagel
Jun 8, 2024, 2:54 PM
10 points
3 comments
1 min read
(amadeuspagel.substack.com)

Closed-Source Evaluations

Jono
Jun 8, 2024, 2:18 PM
15 points
4 comments
1 min read

Access to powerful AI might make computer security radically easier

Buck
Jun 8, 2024, 6:00 AM
105 points
14 comments
6 min read

[Question] Why don’t we just get rid of all the bioethicists?

Sable
Jun 8, 2024, 3:48 AM
13 points
0 comments
1 min read

Sev, Sevteen, Sevty, Sevth

jefftk
Jun 8, 2024, 2:30 AM
17 points
9 comments
1 min read
(www.jefftk.com)

1. The CAST Strategy

Max Harms
Jun 7, 2024, 10:29 PM
48 points
22 comments
38 min read

0. CAST: Corrigibility as Singular Target

Max Harms
Jun 7, 2024, 10:29 PM
147 points
17 comments
8 min read

What is space? What is time?

Tahp
Jun 7, 2024, 10:15 PM
8 points
3 comments
7 min read

[Question] Question about Lewis’ counterfactual theory of causation

jbkjr
Jun 7, 2024, 8:15 PM
12 points
7 comments
1 min read

Relationships among words, metalingual definition, and interpretability

Bill Benzon
Jun 7, 2024, 7:18 PM
2 points
0 comments
5 min read

Let’s Talk About Emergence

jacobhaimes
Jun 7, 2024, 7:18 PM
4 points
0 comments
7 min read
(www.odysseaninstitute.org)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues

aphyer
Jun 7, 2024, 7:02 PM
42 points
16 comments
3 min read

Natural Latents Are Not Robust To Tiny Mixtures

Jun 7, 2024, 6:53 PM
61 points
8 comments
5 min read

Situational Awareness Summarized - Part 2

Joe Rogero
Jun 7, 2024, 5:20 PM
12 points
2 comments
4 min read

Frida van Lisa, a short story about adversarial AI attacks on humans

arisAlexis
Jun 7, 2024, 1:22 PM
2 points
0 comments
18 min read

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper

Zvi
Jun 7, 2024, 11:40 AM
97 points
10 comments
37 min read
(thezvi.wordpress.com)

LessWrong/ACX meetup Transilvanya tour - Cluj Napoca

Marius Adrian Nicoară
Jun 7, 2024, 5:45 AM
1 point
1 comment
1 min read

Is Claude a mystic?

jessicata
Jun 7, 2024, 4:27 AM
60 points
23 comments
13 min read
(unstablerontology.substack.com)

Offering Completion

jefftk
Jun 7, 2024, 1:40 AM
29 points
6 comments
1 min read
(www.jefftk.com)

A Case for Superhuman Governance, using AI

ozziegooen
Jun 7, 2024, 12:10 AM
30 points
0 comments

Memorizing weak examples can elicit strong behavior out of password-locked models

Jun 6, 2024, 11:54 PM
58 points
5 comments
7 min read

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger
Jun 6, 2024, 10:57 PM
194 points
27 comments
3 min read

Scaling and evaluating sparse autoencoders

leogao
Jun 6, 2024, 10:50 PM
106 points
6 comments
1 min read

Humming is not a free $100 bill

Elizabeth
Jun 6, 2024, 8:10 PM
185 points
6 comments
3 min read
(acesounderglass.com)

There Are No Primordial Definitions of Man/Woman

ymeskhout
Jun 6, 2024, 7:30 PM
11 points
0 comments
4 min read
(ymeskhout.substack.com)

Situational Awareness Summarized - Part 1

Joe Rogero
Jun 6, 2024, 6:59 PM
21 points
0 comments
5 min read

[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”

David Scott Krueger (formerly: capybaralet)
Jun 6, 2024, 6:55 PM
70 points
2 comments
6 min read
(llm-safety-challenges.github.io)

AI #67: Brief Strange Trip

Zvi
Jun 6, 2024, 6:50 PM
49 points
6 comments
40 min read
(thezvi.wordpress.com)

The Human Biological Advantage Over AI

Wstewart
Jun 6, 2024, 6:18 PM
−13 points
2 comments
1 min read

An evaluation of Helen Toner’s interview on the TED AI Show

PeterH
Jun 6, 2024, 5:39 PM
24 points
2 comments
30 min read

The Impossibility of a Rational Intelligence Optimizer

Nicolas Villarreal
Jun 6, 2024, 4:14 PM
−9 points
5 comments
14 min read

Immunization against harmful fine-tuning attacks

Jun 6, 2024, 3:17 PM
4 points
0 comments
12 min read

SB 1047 Is Weakened

Zvi
Jun 6, 2024, 1:40 PM
67 points
4 comments
9 min read
(thezvi.wordpress.com)

Weeping Agents

pleiotroth
Jun 6, 2024, 12:18 PM
24 points
2 comments
3 min read

Podcast: Center for AI Policy, on AI risk and listening to AI researchers

KatjaGrace
Jun 6, 2024, 3:30 AM
9 points
0 comments
1 min read
(worldspiritsockpuppet.com)

Calculating Natural Latents via Resampling

Jun 6, 2024, 12:37 AM
55 points
4 comments
10 min read

SAEs Discover Meaningful Features in the IOI Task

Jun 5, 2024, 11:48 PM
15 points
2 comments
10 min read

Let’s Design A School, Part 2.4 School as Education - The Curriculum (Phase 3, Specific)

Sable
Jun 5, 2024, 9:40 PM
19 points
2 comments
12 min read
(affablyevil.substack.com)

METR is hiring ML Research Engineers and Scientists

Xodarap
Jun 5, 2024, 9:27 PM
5 points
0 comments
1 min read
(metr.org)

Book review: The Quincunx

cousin_it
Jun 5, 2024, 9:13 PM
41 points
12 comments
2 min read

[Question] How should I think about my career?

Chico
Jun 5, 2024, 6:11 PM
3 points
2 comments
1 min read

AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks

Jun 5, 2024, 5:45 PM
9 points
0 comments
5 min read
(newsletter.safe.ai)

GPT2, Five Years On

Joel Burget
Jun 5, 2024, 5:44 PM
34 points
0 comments
3 min read
(importai.substack.com)