Introduction to Choice set Misspecification in Reward Inference

Rahul Chand · Oct 29, 2024, 10:57 PM
1 point
0 comments · 8 min read · LW link

Gothenburg LW/ACX meetup

Stefan · Oct 29, 2024, 8:40 PM
2 points
0 comments · 1 min read · LW link

The Alignment Trap: AI Safety as Path to Power

crispweed · Oct 29, 2024, 3:21 PM
57 points
17 comments · 5 min read · LW link
(upcoder.com)

Housing Roundup #10

Zvi · Oct 29, 2024, 1:50 PM
32 points
2 comments · 32 min read · LW link
(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes · Oct 29, 2024, 1:36 PM
51 points
2 comments · 16 min read · LW link

Review: “The Case Against Reality”

David Gross · Oct 29, 2024, 1:13 PM
20 points
9 comments · 5 min read · LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob · Oct 29, 2024, 12:41 PM
12 points
0 comments · 9 min read · LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean · Oct 29, 2024, 12:16 PM
45 points
9 comments · 26 min read · LW link

AI #87: Staying in Character

Zvi · Oct 29, 2024, 7:10 AM
57 points
3 comments · 33 min read · LW link
(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger · Oct 29, 2024, 3:02 AM
53 points
16 comments · 20 min read · LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer · Oct 29, 2024, 1:21 AM
47 points
13 comments · 6 min read · LW link

Gwern: Why So Few Matt Levines?

kave · Oct 29, 2024, 1:07 AM
78 points
10 comments · 1 min read · LW link
(gwern.net)

October 2024 Progress in Guaranteed Safe AI

Quinn · Oct 28, 2024, 11:34 PM
7 points
0 comments · 1 min read · LW link
(gsai.substack.com)

5 homegrown EA projects, seeking small donors

Austin Chen · Oct 28, 2024, 11:24 PM
85 points
4 comments · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · Oct 28, 2024, 9:57 PM
54 points
5 comments · 32 min read · LW link

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

ozziegooen · Oct 28, 2024, 9:44 PM
7 points
0 comments · LW link

AI & wisdom 3: AI effects on amortised optimisation

L Rudolf L · Oct 28, 2024, 9:08 PM
18 points
0 comments · 14 min read · LW link
(rudolf.website)

AI & wisdom 2: growth and amortised optimisation

L Rudolf L · Oct 28, 2024, 9:07 PM
18 points
0 comments · 8 min read · LW link
(rudolf.website)

AI & wisdom 1: wisdom, amortised optimisation, and AI

L Rudolf L · Oct 28, 2024, 9:02 PM
29 points
0 comments · 15 min read · LW link
(rudolf.website)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi · Oct 28, 2024, 8:17 PM
94 points
7 comments · 4 min read · LW link
(manifund.org)

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis · Oct 28, 2024, 7:45 PM
20 points
2 comments · 33 min read · LW link
(aiimpacts.org)

Quantitative Trading Bootcamp [Nov 6-10]

Ricki Heicklen · Oct 28, 2024, 6:39 PM
7 points
0 comments · 1 min read · LW link

Winners of the Essay competition on the Automation of Wisdom and Philosophy

Oct 28, 2024, 5:10 PM
40 points
3 comments · 30 min read · LW link
(blog.aiimpacts.org)

Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority

Zach Stein-Perlman · Oct 28, 2024, 5:00 PM
22 points
4 comments · 3 min read · LW link
(milesbrundage.substack.com)

[Question] somebody explain the word “epistemic” to me

KvmanThinking · Oct 28, 2024, 4:40 PM
7 points
8 comments · 1 min read · LW link

~80 Interesting Questions about Foundation Model Agent Safety

Oct 28, 2024, 4:37 PM
46 points
4 comments · 15 min read · LW link

AI Safety Newsletter #43: White House Issues First National Security Memo on AI. Plus, AI and Job Displacement, and AI Takes Over the Nobels

Oct 28, 2024, 4:03 PM
6 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Death notes - 7 thoughts on death

Nathan Young · Oct 28, 2024, 3:01 PM
26 points
1 comment · 5 min read · LW link
(nathanpmyoung.substack.com)

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI · Oct 28, 2024, 2:48 PM
27 points
0 comments · 10 min read · LW link

Bridging the VLM and mech interp communities for multimodal interpretability

Sonia Joseph · Oct 28, 2024, 2:41 PM
19 points
5 comments · 15 min read · LW link

How Likely Are Various Precursors of Existential Risk?

NunoSempere · Oct 28, 2024, 1:27 PM
55 points
4 comments · 15 min read · LW link
(blog.sentinel-team.org)

Care Doesn’t Scale

stavros · Oct 28, 2024, 11:57 AM
27 points
1 comment · 1 min read · LW link
(stevenscrawls.com)

Your memory eventually drives confidence in each hypothesis to 1 or 0

Crazy philosopher · Oct 28, 2024, 9:00 AM
3 points
6 comments · 1 min read · LW link

Nerdtrition: simple diets via spreadsheet abuse

dkl9 · Oct 27, 2024, 9:45 PM
8 points
0 comments · 3 min read · LW link
(dkl9.net)

AGI Fermi Paradox

jrincayc · Oct 27, 2024, 8:14 PM
0 points
2 comments · 2 min read · LW link

Substituting Talkbox for Breath Controller

jefftk · Oct 27, 2024, 7:10 PM
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing

Oct 27, 2024, 6:46 PM
48 points
4 comments · 5 min read · LW link

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)

spencerg · Oct 27, 2024, 5:34 PM
16 points
0 comments · LW link

Interview with Bill O’Rourke - Russian Corruption, Putin, Applied Ethics, and More

JohnGreer · Oct 27, 2024, 5:11 PM
3 points
0 comments · 6 min read · LW link

On Shifgrethor

JustisMills · Oct 27, 2024, 3:30 PM
67 points
18 comments · 2 min read · LW link
(justismills.substack.com)

The hostile telepaths problem

Valentine · Oct 27, 2024, 3:26 PM
383 points
89 comments · 15 min read · LW link

[Question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?

Terence Coelho · Oct 27, 2024, 2:33 PM
9 points
21 comments · 1 min read · LW link

Video lectures on the learning-theoretic agenda

Vanessa Kosoy · Oct 27, 2024, 12:01 PM
75 points
0 comments · 1 min read · LW link
(www.youtube.com)

Dario Amodei’s “Machines of Loving Grace” sound incredibly dangerous, for Humans

Super AGI · Oct 27, 2024, 5:05 AM
8 points
1 comment · 1 min read · LW link

Electrostatic Airships?

DaemonicSigil · Oct 27, 2024, 4:32 AM
64 points
13 comments · 3 min read · LW link
(pbement.com)

A suite of Vision Sparse Autoencoders

Oct 27, 2024, 4:05 AM
25 points
0 comments · 1 min read · LW link

Ways to think about alignment

Abhimanyu Pallavi Sudhir · Oct 27, 2024, 1:40 AM
6 points
0 comments · 4 min read · LW link

[Question] Is there a CFAR handbook audio option?

FinalFormal2 · Oct 26, 2024, 5:08 PM
16 points
0 comments · 1 min read · LW link

Retrieval Augmented Genesis II - Holy Texts Semantics Analysis

João Ribeiro Medeiros · Oct 26, 2024, 5:00 PM
−1 points
0 comments · 11 min read · LW link

A superficially plausible promising alternate Earth without lockstep

Lorec · Oct 26, 2024, 4:04 PM
−2 points
3 comments · 4 min read · LW link