The slingshot helps with learning

Wilson Wu · Oct 31, 2024, 11:18 PM
33 points
0 comments · 8 min read · LW link

Toward Safety Case Inspired Basic Research

Oct 31, 2024, 11:06 PM
55 points
3 comments · 13 min read · LW link

Spooky Recommendation System Scaling

phdead · Oct 31, 2024, 10:00 PM
11 points
0 comments · 4 min read · LW link

‘Meta’, ‘mesa’, and mountains

Lorec · Oct 31, 2024, 5:25 PM
1 point
0 comments · 3 min read · LW link

Toward Safety Cases For AI Scheming

Oct 31, 2024, 5:20 PM
60 points
1 comment · 2 min read · LW link

AI #88: Thanks for the Memos

Zvi · Oct 31, 2024, 3:00 PM
46 points
5 comments · 77 min read · LW link
(thezvi.wordpress.com)

The Compendium, A full argument about extinction risk from AGI

Oct 31, 2024, 12:01 PM
195 points
52 comments · 2 min read · LW link
(www.thecompendium.ai)

Some Preliminary Notes on the Promise of a Wisdom Explosion

Chris_Leong · Oct 31, 2024, 9:21 AM
2 points
0 comments · 1 min read · LW link
(aiimpacts.org)

What TMS is like

Sable · Oct 31, 2024, 12:44 AM
208 points
23 comments · 6 min read · LW link
(affablyevil.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo · Oct 31, 2024, 12:09 AM
3 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde · Oct 30, 2024, 10:50 PM
27 points
0 comments · 12 min read · LW link

Generic advice caveats

Saul Munn · Oct 30, 2024, 9:03 PM
27 points
1 comment · 3 min read · LW link
(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt · Oct 30, 2024, 8:13 PM
104 points
23 comments · 1 min read · LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner · Oct 30, 2024, 6:31 PM
46 points
8 comments · 4 min read · LW link

[Question] How might language influence how an AI “thinks”?

bodry · Oct 30, 2024, 5:41 PM
3 points
0 comments · 1 min read · LW link

Motivation control

Joe Carlsmith · Oct 30, 2024, 5:15 PM
45 points
7 comments · 52 min read · LW link

Updating the NAO Simulator

jefftk · Oct 30, 2024, 1:50 PM
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi · Oct 30, 2024, 11:00 AM
65 points
11 comments · 11 min read · LW link
(thezvi.wordpress.com)

Three Notions of “Power”

johnswentworth · Oct 30, 2024, 6:10 AM
92 points
44 comments · 4 min read · LW link

Introduction to Choice set Misspecification in Reward Inference

Rahul Chand · Oct 29, 2024, 10:57 PM
1 point
0 comments · 8 min read · LW link

Gothenburg LW/ACX meetup

Stefan · Oct 29, 2024, 8:40 PM
2 points
0 comments · 1 min read · LW link

The Alignment Trap: AI Safety as Path to Power

crispweed · Oct 29, 2024, 3:21 PM
57 points
17 comments · 5 min read · LW link
(upcoder.com)

Housing Roundup #10

Zvi · Oct 29, 2024, 1:50 PM
32 points
2 comments · 32 min read · LW link
(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes · Oct 29, 2024, 1:36 PM
51 points
2 comments · 16 min read · LW link

Review: “The Case Against Reality”

David Gross · Oct 29, 2024, 1:13 PM
20 points
9 comments · 5 min read · LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob · Oct 29, 2024, 12:41 PM
12 points
0 comments · 9 min read · LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean · Oct 29, 2024, 12:16 PM
45 points
9 comments · 26 min read · LW link

AI #87: Staying in Character

Zvi · Oct 29, 2024, 7:10 AM
57 points
3 comments · 33 min read · LW link
(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger · Oct 29, 2024, 3:02 AM
53 points
16 comments · 20 min read · LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer · Oct 29, 2024, 1:21 AM
47 points
13 comments · 6 min read · LW link

Gwern: Why So Few Matt Levines?

kave · Oct 29, 2024, 1:07 AM
78 points
10 comments · 1 min read · LW link
(gwern.net)

October 2024 Progress in Guaranteed Safe AI

Quinn · Oct 28, 2024, 11:34 PM
7 points
0 comments · 1 min read · LW link
(gsai.substack.com)

5 homegrown EA projects, seeking small donors

Austin Chen · Oct 28, 2024, 11:24 PM
85 points
4 comments · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · Oct 28, 2024, 9:57 PM
54 points
5 comments · 32 min read · LW link

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

ozziegooen · Oct 28, 2024, 9:44 PM
7 points
0 comments · LW link

AI & wisdom 3: AI effects on amortised optimisation

L Rudolf L · Oct 28, 2024, 9:08 PM
18 points
0 comments · 14 min read · LW link
(rudolf.website)

AI & wisdom 2: growth and amortised optimisation

L Rudolf L · Oct 28, 2024, 9:07 PM
18 points
0 comments · 8 min read · LW link
(rudolf.website)

AI & wisdom 1: wisdom, amortised optimisation, and AI

L Rudolf L · Oct 28, 2024, 9:02 PM
29 points
0 comments · 15 min read · LW link
(rudolf.website)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi · Oct 28, 2024, 8:17 PM
94 points
7 comments · 4 min read · LW link
(manifund.org)

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis · Oct 28, 2024, 7:45 PM
20 points
2 comments · 33 min read · LW link
(aiimpacts.org)

Quantitative Trading Bootcamp [Nov 6-10]

Ricki Heicklen · Oct 28, 2024, 6:39 PM
7 points
0 comments · 1 min read · LW link

Winners of the Essay competition on the Automation of Wisdom and Philosophy

Oct 28, 2024, 5:10 PM
40 points
3 comments · 30 min read · LW link
(blog.aiimpacts.org)

Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority

Zach Stein-Perlman · Oct 28, 2024, 5:00 PM
22 points
4 comments · 3 min read · LW link
(milesbrundage.substack.com)

[Question] somebody explain the word “epistemic” to me

KvmanThinking · Oct 28, 2024, 4:40 PM
7 points
8 comments · 1 min read · LW link

~80 Interesting Questions about Foundation Model Agent Safety

Oct 28, 2024, 4:37 PM
46 points
4 comments · 15 min read · LW link

AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels

Oct 28, 2024, 4:03 PM
6 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Death notes − 7 thoughts on death

Nathan Young · Oct 28, 2024, 3:01 PM
26 points
1 comment · 5 min read · LW link
(nathanpmyoung.substack.com)

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI · Oct 28, 2024, 2:48 PM
27 points
0 comments · 10 min read · LW link

Bridging the VLM and mech interp communities for multimodal interpretability

Sonia Joseph · Oct 28, 2024, 2:41 PM UTC
19 points
5 comments · 15 min read · LW link

How Likely Are Various Precursors of Existential Risk?

NunoSempere · Oct 28, 2024, 1:27 PM UTC
55 points
4 comments · 15 min read · LW link
(blog.sentinel-team.org)