AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo · Jan 11, 2025, 10:54 PM
7 points
2 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

Fluoridation: The RCT We Still Haven’t Run (But Should)

ChristianKl · Jan 11, 2025, 9:02 PM
22 points
5 comments · 2 min read · LW link

In Defense of a Butlerian Jihad

sloonz · Jan 11, 2025, 7:30 PM
10 points
25 comments · 9 min read · LW link

Near term discussions need something smaller and more concrete than AGI

ryan_b · Jan 11, 2025, 6:24 PM
13 points
0 comments · 6 min read · LW link

A proposal for iterated interpretability with known-interpretable narrow AIs

Peter Berggren · Jan 11, 2025, 2:43 PM
6 points
0 comments · 2 min read · LW link

Have frontier AI systems surpassed the self-replicating red line?

nsage · Jan 11, 2025, 5:31 AM
4 points
0 comments · 4 min read · LW link

We need a universal definition of ‘agency’ and related words

CstineSublime · Jan 11, 2025, 3:22 AM
18 points
1 comment · 5 min read · LW link

[Question] AI for medical care for hard-to-treat diseases?

CronoDAS · Jan 10, 2025, 11:55 PM
12 points
1 comment · 1 min read · LW link

Beliefs and state of mind into 2025

RussellThor · Jan 10, 2025, 10:07 PM
18 points
9 comments · 7 min read · LW link

Recommendations for Technical AI Safety Research Directions

Sam Marks · Jan 10, 2025, 7:34 PM
64 points
1 comment · 17 min read · LW link
(alignment.anthropic.com)

Is AI Alignment Enough?

Aram Panasenco · Jan 10, 2025, 6:57 PM
28 points
6 comments · 6 min read · LW link

[Question] What are some scenarios where an aligned AGI actually helps humanity, but many/most people don’t like it?

RomanS · Jan 10, 2025, 6:13 PM
13 points
6 comments · 3 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · Jan 10, 2025, 4:53 PM
143 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective

Jan 10, 2025, 4:22 PM
28 points
0 comments · 4 min read · LW link

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Zvi · Jan 10, 2025, 1:50 PM
44 points
7 comments · 27 min read · LW link
(thezvi.wordpress.com)

Scaling Sparse Feature Circuit Finding to Gemma 9B

Jan 10, 2025, 11:08 AM
86 points
11 comments · 17 min read · LW link

[Question] Is Musk still net-positive for humanity?

mikbp · Jan 10, 2025, 9:34 AM
−5 points
18 comments · 1 min read · LW link

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis

Matt Levinson · Jan 10, 2025, 6:53 AM
4 points
0 comments · 4 min read · LW link

Dmitry’s Koan

Dmitry Vaintrob · Jan 10, 2025, 4:27 AM
44 points
8 comments · 22 min read · LW link

NAO Updates, January 2025

jefftk · Jan 10, 2025, 3:37 AM
23 points
0 comments · LW link
(naobservatory.org)

MATS mentor selection

Jan 10, 2025, 3:12 AM
44 points
12 comments · 6 min read · LW link

AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open

ChristianWilliams · Jan 10, 2025, 3:02 AM
7 points
0 comments · LW link
(www.metaculus.com)

[Question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)

CstineSublime · Jan 10, 2025, 2:44 AM
7 points
1 comment · 2 min read · LW link

Deleted

Yanling Guo · Jan 10, 2025, 1:36 AM
−10 points
0 comments · 1 min read · LW link

You are too dumb to understand insurance

Lorec · Jan 9, 2025, 11:33 PM
1 point
12 comments · 7 min read · LW link

Is AI Hitting a Wall or Moving Faster Than Ever?

garrison · Jan 9, 2025, 10:18 PM
12 points
5 comments · LW link
(garrisonlovely.substack.com)

Expevolu, Part II: Buying land to create countries

Fernando · Jan 9, 2025, 9:11 PM
4 points
0 comments · 20 min read · LW link
(expevolu.substack.com)

Last week of the Discussion Phase

Raemon · Jan 9, 2025, 7:26 PM
35 points
0 comments · 3 min read · LW link

Discursive Warfare and Faction Formation

Benquo · Jan 9, 2025, 4:47 PM
52 points
3 comments · 3 min read · LW link
(benjaminrosshoffman.com)

Can we rescue Effective Altruism?

Elizabeth · Jan 9, 2025, 4:40 PM
21 points
0 comments · 1 min read · LW link
(acesounderglass.com)

AI #98: World Ends With Six Word Story

Zvi · Jan 9, 2025, 4:30 PM
36 points
2 comments · 38 min read · LW link
(thezvi.wordpress.com)

Many Worlds and the Problems of Evil

Jonah Wilberg · Jan 9, 2025, 4:10 PM
−3 points
2 comments · 9 min read · LW link

PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement

Jan 9, 2025, 2:23 PM
20 points
0 comments · 1 min read · LW link

The “Everyone Can’t Be Wrong” Prior causes AI risk denial but helped prehistoric people

Knight Lee · Jan 9, 2025, 5:54 AM
1 point
0 comments · 2 min read · LW link

Governance Course—Week 1 Reflections

Alice Blair · Jan 9, 2025, 4:48 AM
4 points
1 comment · 5 min read · LW link

Thoughts on the In-Context Scheming AI Experiment

ExCeph · Jan 9, 2025, 2:19 AM
2 points
0 comments · 4 min read · LW link

A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities

Tom DAVID · Jan 9, 2025, 12:18 AM
2 points
0 comments · 3 min read · LW link

Gothenburg LW / ACX meetup

Stefan · Jan 8, 2025, 9:39 PM
2 points
0 comments · 1 min read · LW link

Aristocracy and Hostage Capital

Arjun Panickssery · Jan 8, 2025, 7:38 PM
108 points
7 comments · 3 min read · LW link
(arjunpanickssery.substack.com)

[Question] What is the most impressive game LLMs can play well?

Cole Wyeth · Jan 8, 2025, 7:38 PM
19 points
20 comments · 1 min read · LW link

The Type of Writing that Pushes Women Away

Dahlia · Jan 8, 2025, 6:54 PM
22 points
4 comments · 2 min read · LW link

Ann Altman has filed a lawsuit in US federal court alleging that she was sexually abused by Sam Altman

quanticle · Jan 8, 2025, 2:59 PM
7 points
3 comments · 1 min read · LW link

AI Safety Outreach Seminar & Social (online)

Linda Linsefors · Jan 8, 2025, 1:25 PM UTC
9 points
0 comments · 1 min read · LW link

XX by Rian Hughes: Pretentious Bullshit

Yair Halberstadt · Jan 8, 2025, 1:02 PM UTC
33 points
5 comments · 5 min read · LW link

Activation space interpretability may be doomed

Jan 8, 2025, 12:49 PM UTC
148 points
34 comments · 8 min read · LW link

AI Safety as a YC Startup

Lukas Petersson · Jan 8, 2025, 10:46 AM UTC
56 points
9 comments · 5 min read · LW link

The absolute basics of representation theory of finite groups

Dmitry Vaintrob · Jan 8, 2025, 9:47 AM UTC
21 points
1 comment · 10 min read · LW link

Implications of the AI Security Gap

Dan Braun · Jan 8, 2025, 8:31 AM UTC
45 points
0 comments · 9 min read · LW link

What are polysemantic neurons?

Jan 8, 2025, 7:35 AM UTC
8 points
0 comments · 4 min read · LW link
(aisafety.info)

Tips On Empirical Research Slides

Jan 8, 2025, 5:06 AM UTC
90 points
4 comments · 6 min read · LW link