Exercise: Planmaking, Surprise Anticipation, and “Baba is You”

Raemon · Feb 24, 2024, 8:33 PM
67 points
31 comments · 6 min read

Most experts believe COVID-19 was probably not a lab leak

DanielFilan · Feb 2, 2024, 7:28 PM
66 points
89 comments · 2 min read
(gcrinstitute.org)

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo · Feb 17, 2024, 1:47 AM
65 points
2 comments · 11 min read

On the Debate Between Jezos and Leahy

Zvi · Feb 6, 2024, 2:40 PM
64 points
6 comments · 63 min read
(thezvi.wordpress.com)

Managing risks while trying to do good

Wei Dai · Feb 1, 2024, 6:08 PM
63 points
26 comments

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19

viking_math · Feb 19, 2024, 1:14 AM
62 points
28 comments · 14 min read

Balancing Games

jefftk · Feb 24, 2024, 2:40 PM
62 points
18 comments · 1 min read
(www.jefftk.com)

Offering AI safety support calls for ML professionals

Vael Gates · Feb 15, 2024, 11:48 PM
61 points
1 comment

[Question] What’s the theory of impact for activation vectors?

Chris_Leong · Feb 11, 2024, 7:34 AM
61 points
12 comments · 1 min read

Noticing Panic

Cole Wyeth · Feb 5, 2024, 3:45 AM
59 points
8 comments · 3 min read

Acting Wholesomely

owencb · Feb 26, 2024, 9:49 PM
59 points
64 comments

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)

LoganStrohl · Feb 24, 2024, 2:56 AM
59 points
1 comment · 6 min read

Voting Results for the 2022 Review

Ben Pace · Feb 2, 2024, 8:34 PM
57 points
3 comments · 73 min read

Dual Wielding Kindle Scribes

mesaoptimizer · Feb 21, 2024, 5:17 PM
57 points
18 comments · 6 min read

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · Feb 12, 2024, 12:56 AM
57 points
13 comments · 3 min read

Evaluating Stability of Unreflective Alignment

james.lucassen · Feb 1, 2024, 10:15 PM
57 points
12 comments · 18 min read
(jlucassen.com)

Phallocentricity in GPT-J’s bizarre stratified ontology

mwatkins · Feb 17, 2024, 12:16 AM
56 points
37 comments · 9 min read

Conditional prediction markets are evidential, not causal

philh · Feb 7, 2024, 9:52 PM
55 points
10 comments · 2 min read

How do you actually obtain and report a likelihood function for scientific research?

Peter Berggren · Feb 11, 2024, 5:42 PM
55 points
4 comments · 1 min read

Cooperating with aliens and AGIs: An ECL explainer

Feb 24, 2024, 10:58 PM
55 points
8 comments

Why I no longer identify as transhumanist

Kaj_Sotala · Feb 3, 2024, 12:00 PM
55 points
33 comments · 3 min read
(kajsotala.fi)

Safe Stasis Fallacy

Davidmanheim · Feb 5, 2024, 10:54 AM
54 points
2 comments

The Shutdown Problem: Incomplete Preferences as a Solution

EJT · Feb 23, 2024, 4:01 PM
53 points
33 comments · 42 min read

AI #50: The Most Dangerous Thing

Zvi · Feb 8, 2024, 2:30 PM
53 points
4 comments · 24 min read
(thezvi.wordpress.com)

[Question] Can we get an AI to “do our alignment homework for us”?

Chris_Leong · Feb 26, 2024, 7:56 AM
53 points
33 comments · 1 min read

Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.

Chi Nguyen · Feb 23, 2024, 6:10 AM
52 points
18 comments

Toy models of AI control for concentrated catastrophe prevention

Feb 6, 2024, 1:38 AM
51 points
2 comments · 7 min read

AI #52: Oops

Zvi · Feb 22, 2024, 9:50 PM
50 points
9 comments · 29 min read
(thezvi.wordpress.com)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg · Feb 9, 2024, 7:00 AM
50 points
6 comments · 3 min read

Notes on control evaluations for safety cases

Feb 28, 2024, 4:15 PM
49 points
0 comments · 32 min read

Critiques of the AI control agenda

Jozdien · Feb 14, 2024, 7:25 PM
48 points
14 comments · 9 min read

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

porby · Feb 2, 2024, 5:49 AM
47 points
1 comment · 4 min read
(arxiv.org)

What does davidad want from «boundaries»?

Feb 6, 2024, 5:45 PM
47 points
1 comment · 5 min read

I’d also take $7 trillion

bhauth · Feb 19, 2024, 3:31 AM
47 points
12 comments · 10 min read
(www.bhauth.com)

Value learning in the absence of ground truth

Joel_Saarinen · Feb 5, 2024, 6:56 PM
47 points
8 comments · 45 min read

Sora What

Zvi · Feb 22, 2024, 6:10 PM
47 points
3 comments · 9 min read
(thezvi.wordpress.com)

Fluent dreaming for language models (AI interpretability method)

Feb 6, 2024, 6:02 AM
46 points
5 comments · 1 min read
(arxiv.org)

On the Proposed California SB 1047

Zvi · Feb 12, 2024, 4:40 PM
46 points
18 comments · 12 min read
(thezvi.wordpress.com)

Thoughts on “The Offense-Defense Balance Rarely Changes”

Cullen · Feb 12, 2024, 3:26 AM
46 points
4 comments

[Question] Where is the Town Square?

Gretta Duleba · Feb 13, 2024, 3:53 AM
46 points
8 comments · 1 min read

The Gemini Incident Continues

Zvi · Feb 27, 2024, 4:00 PM
45 points
6 comments · 48 min read
(thezvi.wordpress.com)

A starting point for making sense of task structure (in machine learning)

Feb 24, 2024, 1:51 AM
45 points
2 comments · 12 min read

Why does generalization work?

Martín Soto · Feb 20, 2024, 5:51 PM
43 points
16 comments · 4 min read

Job Listing: Managing Editor / Writer

Gretta Duleba · Feb 21, 2024, 11:41 PM
43 points
2 comments · 1 min read

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders

Feb 27, 2024, 2:43 AM
43 points
16 comments · 15 min read

Protocol evaluations: good analogies vs control

Fabien Roger · Feb 19, 2024, 6:00 PM
42 points
10 comments · 11 min read

Evidential Cooperation in Large Worlds: Potential Objections & FAQ

Feb 28, 2024, 6:58 PM
42 points
5 comments

Deep and obvious points in the gap between your thoughts and your pictures of thought

KatjaGrace · Feb 23, 2024, 7:30 AM
42 points
6 comments · 1 min read
(worldspiritsockpuppet.com)

How I internalized my achievements to better deal with negative feelings

Raymond Koopmanschap · Feb 27, 2024, 3:10 PM
42 points
7 comments · 6 min read

Wholesomeness and Effective Altruism

owencb · Feb 28, 2024, 8:28 PM
42 points
3 comments