Notes on the Presidential Election of 1836

Arjun Panickssery · 13 Feb 2025 23:40 UTC
23 points
0 comments · 7 min read · LW link
(arjunpanickssery.substack.com)

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank · 13 Feb 2025 22:35 UTC
1 point
2 comments · 11 min read · LW link

I’m making a ttrpg about life in an intentional community during the last year before the Singularity

bgaesop · 13 Feb 2025 21:54 UTC
11 points
2 comments · 2 min read · LW link

SWE Automation Is Coming: Consider Selling Your Crypto

A_donor · 13 Feb 2025 20:17 UTC
12 points
8 comments · 1 min read · LW link

≤10-year Timelines Remain Unlikely Despite DeepSeek and o3

Rafael Harth · 13 Feb 2025 19:21 UTC
52 points
67 comments · 15 min read · LW link

System 2 Alignment

Seth Herd · 13 Feb 2025 19:17 UTC
35 points
0 comments · 22 min read · LW link

Murder plots are infohazards

Chris Monteiro · 13 Feb 2025 19:15 UTC
311 points
44 comments · 2 min read · LW link

Sparse Autoencoder Feature Ablation for Unlearning

aludert · 13 Feb 2025 19:13 UTC
3 points
0 comments · 11 min read · LW link

What is it to solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:42 UTC
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

Self-dialogue: Do behaviorist rewards make scheming AGIs?

Steven Byrnes · 13 Feb 2025 18:39 UTC
43 points
1 comment · 46 min read · LW link

How do we solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:27 UTC
63 points
9 comments · 7 min read · LW link
(joecarlsmith.substack.com)

Ambiguous out-of-distribution generalization on an algorithmic task

13 Feb 2025 18:24 UTC
83 points
6 comments · 11 min read · LW link

Teaching AI to reason: this year’s most important story

Benjamin_Todd · 13 Feb 2025 17:40 UTC
10 points
0 comments · 10 min read · LW link
(benjamintodd.substack.com)

AI #103: Show Me the Money

Zvi · 13 Feb 2025 15:20 UTC
30 points
9 comments · 58 min read · LW link
(thezvi.wordpress.com)

OpenAI’s NSFW policy: user safety, harm reduction, and AI consent

8e9 · 13 Feb 2025 13:59 UTC
4 points
3 comments · 2 min read · LW link

Studies of Human Error Rate

tin482 · 13 Feb 2025 13:43 UTC
15 points
3 comments · 1 min read · LW link

the dumbest theory of everything

lostinwilliamsburg · 13 Feb 2025 7:57 UTC
−1 points
0 comments · 7 min read · LW link

Skepticism towards claims about the views of powerful institutions

tlevin · 13 Feb 2025 7:40 UTC
46 points
2 comments · 4 min read · LW link

Virtue signaling, and the “humans-are-wonderful” bias, as a trust exercise

lc · 13 Feb 2025 6:59 UTC
44 points
16 comments · 4 min read · LW link

My model of what is going on with LLMs

Cole Wyeth · 13 Feb 2025 3:43 UTC
110 points
49 comments · 7 min read · LW link

Not all capabilities will be created equal: focus on strategically superhuman agents

benwr · 13 Feb 2025 1:24 UTC
62 points
9 comments · 3 min read · LW link

LLMs can teach themselves to better predict the future

Ben Turtel · 13 Feb 2025 1:01 UTC
0 points
1 comment · 1 min read · LW link
(arxiv.org)

Dovetail’s agent foundations fellowship talks & discussion

Alex_Altair · 13 Feb 2025 0:49 UTC
10 points
0 comments · 1 min read · LW link

Extended analogy between humans, corporations, and AIs.

Daniel Kokotajlo · 13 Feb 2025 0:03 UTC
36 points
2 comments · 6 min read · LW link

Moral Hazard in Democratic Voting

lsusr · 12 Feb 2025 23:17 UTC
20 points
8 comments · 1 min read · LW link

MATS Spring 2024 Extension Retrospective

12 Feb 2025 22:43 UTC
26 points
1 comment · 15 min read · LW link

Hunting for AI Hackers: LLM Agent Honeypot

12 Feb 2025 20:29 UTC
35 points
0 comments · 5 min read · LW link
(www.apartresearch.com)

Probability of AI-Caused Disaster

Alvin Ånestrand · 12 Feb 2025 19:40 UTC
2 points
2 comments · 10 min read · LW link
(forecastingaifutures.substack.com)

Two flaws in the Machiavelli Benchmark

TheManxLoiner · 12 Feb 2025 19:34 UTC
24 points
0 comments · 3 min read · LW link

Gradient Anatomy’s—Hallucination Robustness in Medical Q&A

DieSab · 12 Feb 2025 19:16 UTC
2 points
0 comments · 10 min read · LW link

Are current LLMs safe for psychotherapy?

CanYouFeelTheBenefits · 12 Feb 2025 19:16 UTC
5 points
4 comments · 1 min read · LW link

Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts

Ana Kapros · 12 Feb 2025 19:12 UTC
7 points
0 comments · 5 min read · LW link

The Paris AI Anti-Safety Summit

Zvi · 12 Feb 2025 14:00 UTC
129 points
21 comments · 21 min read · LW link
(thezvi.wordpress.com)

Inside the dark forests of the internet

Itay Dreyfus · 12 Feb 2025 10:20 UTC
10 points
0 comments · 6 min read · LW link
(productidentity.co)

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine · 12 Feb 2025 9:15 UTC
53 points
49 comments · 1 min read · LW link
(www.emergent-values.ai)

Why you maybe should lift weights, and How to.

samusasuke · 12 Feb 2025 5:15 UTC
33 points
30 comments · 9 min read · LW link

[Question] how do the CEOs respond to our concerns?

KvmanThinking · 11 Feb 2025 23:39 UTC
−10 points
7 comments · 1 min read · LW link

Where Would Good Forecasts Most Help AI Governance Efforts?

Violet Hour · 11 Feb 2025 18:15 UTC
11 points
1 comment · 6 min read · LW link

AI Safety at the Frontier: Paper Highlights, January ’25

gasteigerjo · 11 Feb 2025 16:14 UTC
7 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

If Neuroscientists Succeed

Mordechai Rorvig · 11 Feb 2025 15:33 UTC
9 points
6 comments · 18 min read · LW link

The News is Never Neglected

lsusr · 11 Feb 2025 14:59 UTC
113 points
18 comments · 1 min read · LW link

Rethinking AI Safety Approach in the Era of Open-Source AI

Weibing Wang · 11 Feb 2025 14:01 UTC
4 points
0 comments · 6 min read · LW link

What About The Horses?

Maxwell Tabarrok · 11 Feb 2025 13:59 UTC
15 points
17 comments · 7 min read · LW link
(www.maximum-progress.com)

On Deliberative Alignment

Zvi · 11 Feb 2025 13:00 UTC
53 points
1 comment · 6 min read · LW link
(thezvi.wordpress.com)

Detecting AI Agent Failure Modes in Simulations

Michael Soareverix · 11 Feb 2025 11:10 UTC
17 points
0 comments · 8 min read · LW link

World Citizen Assembly about AI—Announcement

Camille Berger · 11 Feb 2025 10:51 UTC
26 points
1 comment · 5 min read · LW link

Visual Reference for Frontier Large Language Models

kenakofer · 11 Feb 2025 5:14 UTC
14 points
0 comments · 1 min read · LW link
(kenan.schaefkofer.com)

Rational Effective Utopia & Narrow Way There: Math-Proven Safe Static Multiversal mAX-Intelligence (AXI), Multiversal Alignment, New Ethicophysics… (Aug 11)

ank · 11 Feb 2025 3:21 UTC
13 points
8 comments · 38 min read · LW link

Arguing for the Truth? An Inference-Only Study into AI Debate

denisemester · 11 Feb 2025 3:04 UTC
7 points
0 comments · 16 min read · LW link

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?

garrison · 11 Feb 2025 0:20 UTC
208 points
8 comments · 6 min read · LW link
(garrisonlovely.substack.com)