Anthropic announces interpretability advances. How much does this advance alignment?

Seth Herd · May 21, 2024, 10:30 PM
49 points
4 comments · 3 min read · LW link
(www.anthropic.com)

[Question] What would stop you from paying for an LLM?

yanni kyriacos · May 21, 2024, 10:25 PM
17 points
15 comments · 1 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · May 21, 2024, 8:15 PM
157 points
16 comments · 3 min read · LW link

Mitigating extreme AI risks amid rapid progress [Linkpost]

Orpheus16 · May 21, 2024, 7:59 PM
21 points
7 comments · 4 min read · LW link

The problem with rationality

David Loomis · May 21, 2024, 6:49 PM
−17 points
1 comment · 6 min read · LW link

rough draft on what happens in the brain when you have an insight

Emrik · May 21, 2024, 6:02 PM
11 points
2 comments · 1 min read · LW link

On Dwarkesh’s Podcast with OpenAI’s John Schulman

Zvi · May 21, 2024, 5:30 PM
73 points
4 comments · 20 min read · LW link
(thezvi.wordpress.com)

[Question] Is deleting capabilities still a relevant research question?

tailcalled · May 21, 2024, 1:24 PM
15 points
1 comment · 1 min read · LW link

New voluntary commitments (AI Seoul Summit)

Zach Stein-Perlman · May 21, 2024, 11:00 AM
81 points
17 comments · 7 min read · LW link
(www.gov.uk)

ACX/LW/EA/* Meetup Bremen

RasmusHB · May 21, 2024, 5:42 AM
2 points
0 comments · 1 min read · LW link

My Dating Heuristic

Declan Molony · May 21, 2024, 5:28 AM
26 points
4 comments · 2 min read · LW link

Scorable Functions: A Format for Algorithmic Forecasting

ozziegooen · May 21, 2024, 4:14 AM
29 points
0 comments · LW link

The Problem With the Word ‘Alignment’

May 21, 2024, 3:48 AM
63 points
8 comments · 6 min read · LW link

What’s Going on With OpenAI’s Messaging?

ozziegooen · May 21, 2024, 2:22 AM
191 points
13 comments · LW link

Harmony Intelligence is Hiring!

May 21, 2024, 2:11 AM
10 points
0 comments · 1 min read · LW link
(www.harmonyintelligence.com)

[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.

Linch · May 20, 2024, 11:50 PM
31 points
8 comments · 1 min read · LW link
(variety.com)

Some perspectives on the discipline of Physics

Tahp · May 20, 2024, 6:19 PM
17 points
3 comments · 13 min read · LW link
(quark.rodeo)

[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments?

Joe Kwon · May 20, 2024, 6:03 PM
3 points
1 comment · 1 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method

May 20, 2024, 5:55 PM
23 points
7 comments · 6 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 20, 2024, 5:53 PM
107 points
4 comments · 3 min read · LW link

NAO Updates, Spring 2024

jefftk · May 20, 2024, 4:51 PM
13 points
0 comments · 6 min read · LW link
(naobservatory.org)

OpenAI: Exodus

Zvi · May 20, 2024, 1:10 PM
153 points
26 comments · 44 min read · LW link
(thezvi.wordpress.com)

Infra-Bayesian haggling

hannagabor · May 20, 2024, 12:23 PM
28 points
0 comments · 20 min read · LW link

Jaan Tallinn’s 2023 Philanthropy Overview

jaan · May 20, 2024, 12:11 PM
203 points
5 comments · 1 min read · LW link
(jaan.info)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]

abstractapplic · May 20, 2024, 9:38 AM
31 points
2 comments · 1 min read · LW link

Why I find Davidad’s plan interesting

Paul W · May 20, 2024, 8:13 AM
18 points
0 comments · 6 min read · LW link

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · May 20, 2024, 4:14 AM
30 points
21 comments · 10 min read · LW link
(www.anthropic.com)

The consistent guessing problem is easier than the halting problem

jessicata · May 20, 2024, 4:02 AM
38 points
5 comments · 4 min read · LW link
(unstableontology.com)

A poem titled ‘Tick Tock’.

Krantz · May 20, 2024, 3:52 AM
−1 points
0 comments · 1 min read · LW link

Against Computers (infinite play)

rogersbacon · May 20, 2024, 12:43 AM
−11 points
1 comment · 14 min read · LW link
(www.secretorum.life)

Testing for parallel reasoning in LLMs

May 19, 2024, 3:28 PM
9 points
7 comments · 9 min read · LW link

Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom)

O O · May 19, 2024, 2:18 AM
14 points
15 comments · 2 min read · LW link

Some “meta-cruxes” for AI x-risk debates

Aryeh Englander · May 19, 2024, 12:21 AM
20 points
2 comments · 3 min read · LW link

On Privilege

Shmi · May 18, 2024, 10:36 PM
15 points
10 comments · 2 min read · LW link

Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University

Johannes C. Mayer · May 18, 2024, 7:53 PM
22 points
37 comments · 6 min read · LW link

To Limit Impact, Limit KL-Divergence

J Bostock · May 18, 2024, 6:52 PM
10 points
1 comment · 5 min read · LW link

[Question] Are There Other Ideas as Generally Applicable as Natural Selection

Amin Sennour · May 18, 2024, 4:37 PM
1 point
1 comment · 1 min read · LW link

Scientific Notation Options

jefftk · May 18, 2024, 3:10 PM
27 points
13 comments · 1 min read · LW link
(www.jefftk.com)

“If we go extinct due to misaligned AI, at least nature will continue, right? … right?”

plex · May 18, 2024, 2:09 PM
54 points
23 comments · 2 min read · LW link
(aisafety.info)

What Are Non-Zero-Sum Games?—A Primer

James Stephen Brown · May 18, 2024, 9:19 AM
4 points
7 comments · 3 min read · LW link

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman · May 18, 2024, 3:00 AM
159 points
14 comments · 4 min read · LW link

International Scientific Report on the Safety of Advanced AI: Key Information

Aryeh Englander · May 18, 2024, 1:45 AM
39 points
0 comments · 13 min read · LW link

Goodhart in RL with KL: Appendix

Thomas Kwa · May 18, 2024, 12:40 AM
12 points
0 comments · 6 min read · LW link

AI 2030 – AI Policy Roadmap

LTM · May 17, 2024, 11:29 PM
8 points
0 comments · 1 min read · LW link

MIT FutureTech are hiring for an Operations and Project Management role.

peterslattery · May 17, 2024, 11:21 PM
2 points
0 comments · 3 min read · LW link

Language Models Model Us

eggsyntax · May 17, 2024, 9:00 PM
158 points
55 comments · 7 min read · LW link

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse · May 17, 2024, 7:13 PM
67 points
10 comments · 2 min read · LW link

Agency

A* · May 17, 2024, 7:11 PM
8 points
0 comments · 1 min read · LW link

DeepMind: Frontier Safety Framework

Zach Stein-Perlman · May 17, 2024, 5:30 PM
64 points
0 comments · 3 min read · LW link
(deepmind.google)

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

May 17, 2024, 4:25 PM
57 points
20 comments · 4 min read · LW link
(arxiv.org)