AI Rights for Human Safety

Simon Goldstein · Aug 1, 2024, 11:01 PM
53 points
6 comments · 1 min read · LW link
(papers.ssrn.com)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders

Gytis Daujotas · Aug 1, 2024, 9:08 PM
45 points
7 comments · 7 min read · LW link

Optimizing Repeated Correlations

SatvikBeri · Aug 1, 2024, 5:33 PM
26 points
1 comment · 1 min read · LW link

The need for multi-agent experiments

Martín Soto · Aug 1, 2024, 5:14 PM
43 points
3 comments · 9 min read · LW link

Dragon Agnosticism

jefftk · Aug 1, 2024, 5:00 PM
95 points
75 comments · 2 min read · LW link
(www.jefftk.com)

Morristown ACX Meetup

mbrooks · Aug 1, 2024, 4:29 PM
2 points
1 comment · 1 min read · LW link

Some comments on intelligence

Viliam · Aug 1, 2024, 3:17 PM
30 points
5 comments · 3 min read · LW link

[Question] [Thought Experiment] Given a button to terminate all humanity, would you press it?

lorepieri · Aug 1, 2024, 3:10 PM
−2 points
9 comments · 1 min read · LW link

Are unpaid UN internships a good idea?

Cipolla · Aug 1, 2024, 3:06 PM
1 point
7 comments · 4 min read · LW link

AI #75: Math is Easier

Zvi · Aug 1, 2024, 1:40 PM
46 points
25 comments · 72 min read · LW link
(thezvi.wordpress.com)

Temporary Cognitive Hyperparameter Alteration

Jonathan Moregård · Aug 1, 2024, 10:27 AM
9 points
0 comments · 3 min read · LW link
(honestliving.substack.com)

Technology and Progress

Zero Contradictions · Aug 1, 2024, 4:49 AM
1 point
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Do Prediction Markets Work?

Benjamin_Sturisky · Aug 1, 2024, 2:31 AM
7 points
0 comments · 4 min read · LW link

2/3 Aussie & NZ AI Safety folk often or sometimes feel lonely or disconnected (and 16 other barriers to impact)

yanni kyriacos · Aug 1, 2024, 1:15 AM
12 points
0 comments · 8 min read · LW link

[Question] Can UBI overcome inflation and rent seeking?

Gordon Seidoh Worley · Aug 1, 2024, 12:13 AM
5 points
34 comments · 1 min read · LW link

Recommendation: reports on the search for missing hiker Bill Ewasko

eukaryote · Jul 31, 2024, 10:15 PM
169 points
28 comments · 14 min read · LW link
(eukaryotewritesblog.com)

Economics101 predicted the failure of special card payments for refugees; 3 months later the whole of Germany wants to adopt it

Yanling Guo · Jul 31, 2024, 9:09 PM
3 points
3 comments · 2 min read · LW link

Ambiguity in Prediction Market Resolution is Still Harmful

aphyer · Jul 31, 2024, 8:32 PM
43 points
17 comments · 3 min read · LW link

AI labs can boost external safety research

Zach Stein-Perlman · Jul 31, 2024, 7:30 PM
31 points
1 comment · 1 min read · LW link

Women in AI Safety London Meetup

njg · Jul 31, 2024, 6:13 PM
1 point
0 comments · 1 min read · LW link

Constructing Neural Network Parameters with Downstream Trainability

ch271828n · Jul 31, 2024, 6:13 PM
1 point
0 comments · 1 min read · LW link
(github.com)

Want to work on US emerging tech policy? Consider the Horizon Fellowship.

Elika · Jul 31, 2024, 6:12 PM
4 points
0 comments · 1 min read · LW link

[Question] What are your cruxes for imprecise probabilities / decision rules?

Anthony DiGiovanni · Jul 31, 2024, 3:42 PM
36 points
33 comments · 1 min read · LW link

The new UK government’s stance on AI safety

Elliot Mckernon · Jul 31, 2024, 3:23 PM
17 points
0 comments · 4 min read · LW link

Cat Sustenance Fortification

jefftk · Jul 31, 2024, 2:30 AM
14 points
7 comments · 1 min read · LW link
(www.jefftk.com)

Twitter thread on open-source AI

Richard_Ngo · Jul 31, 2024, 12:26 AM
33 points
6 comments · 2 min read · LW link
(x.com)

Twitter thread on AI takeover scenarios

Richard_Ngo · Jul 31, 2024, 12:24 AM
37 points
0 comments · 2 min read · LW link
(x.com)

Twitter thread on AI safety evals

Richard_Ngo · Jul 31, 2024, 12:18 AM
63 points
3 comments · 2 min read · LW link
(x.com)

Twitter thread on politics of AI safety

Richard_Ngo · Jul 31, 2024, 12:00 AM
35 points
2 comments · 1 min read · LW link
(x.com)

An ML paper on data stealing provides a construction for “gradient hacking”

David Scott Krueger (formerly: capybaralet) · Jul 30, 2024, 9:44 PM
21 points
1 comment · 1 min read · LW link
(arxiv.org)

Open Source Automated Interpretability for Sparse Autoencoder Features

Jul 30, 2024, 9:11 PM
67 points
1 comment · 13 min read · LW link
(blog.eleuther.ai)

Caterpillars and Philosophy

Zero Contradictions · Jul 30, 2024, 8:54 PM
2 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

François Chollet on the limitations of LLMs in reasoning

2PuNCheeZ · Jul 30, 2024, 8:04 PM
1 point
1 comment · 2 min read · LW link
(x.com)

Against AI As An Existential Risk

Noah Birnbaum · Jul 30, 2024, 7:10 PM
6 points
13 comments · 1 min read · LW link
(irrationalitycommunity.substack.com)

[Question] Is objective morality self-defeating?

dialectica · Jul 30, 2024, 6:23 PM
−4 points
3 comments · 2 min read · LW link

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten · Jul 30, 2024, 4:36 PM
6 points
0 comments · 9 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
222 points
51 comments · 12 min read · LW link

Investigating the Ability of LLMs to Recognize Their Own Writing

Jul 30, 2024, 3:41 PM
32 points
0 comments · 15 min read · LW link

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?

scasper · Jul 30, 2024, 2:57 PM
25 points
0 comments · 4 min read · LW link

RTFB: California’s AB 3211

Zvi · Jul 30, 2024, 1:10 PM
62 points
2 comments · 11 min read · LW link
(thezvi.wordpress.com)

If You Can Climb Up, You Can Climb Down

jefftk · Jul 30, 2024, 12:00 AM
34 points
9 comments · 1 min read · LW link
(www.jefftk.com)

What is Morality?

Zero Contradictions · Jul 29, 2024, 7:19 PM
−1 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Arch-anarchism and immortality

Peter lawless · Jul 29, 2024, 6:10 PM
−5 points
1 comment · 2 min read · LW link

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy. Plus, Safety Engineering

Jul 29, 2024, 5:50 PM
17 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

New Blog Post Against AI Doom

Noah Birnbaum · Jul 29, 2024, 5:21 PM
1 point
5 comments · 1 min read · LW link
(substack.com)

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan · Jul 29, 2024, 2:50 PM
9 points
3 comments · 1 min read · LW link

[Question] How tokenization influences prompting?

Boris Kashirin · Jul 29, 2024, 10:28 AM
9 points
4 comments · 1 min read · LW link

Understanding Positional Features in Layer 0 SAEs

Jul 29, 2024, 9:36 AM
43 points
0 comments · 5 min read · LW link

Prediction Markets Explained

Benjamin_Sturisky · Jul 29, 2024, 8:02 AM
8 points
0 comments · 9 min read · LW link

Relativity Theory for What the Future ‘You’ Is and Isn’t

FlorianH · Jul 29, 2024, 2:01 AM
7 points
49 comments · 4 min read · LW link