Moral Hazard in Democratic Voting

lsusr Feb 12, 2025, 11:17 PM
20 points
8 comments
1 min read
LW link

MATS Spring 2024 Extension Retrospective

Feb 12, 2025, 10:43 PM
26 points
1 comment
15 min read
LW link

Hunting for AI Hackers: LLM Agent Honeypot

Feb 12, 2025, 8:29 PM
34 points
0 comments
5 min read
LW link
(www.apartresearch.com)

Probability of AI-Caused Disaster

Alvin Ånestrand Feb 12, 2025, 7:40 PM
2 points
2 comments
10 min read
LW link
(forecastingaifutures.substack.com)

Two flaws in the Machiavelli Benchmark

TheManxLoiner Feb 12, 2025, 7:34 PM
23 points
0 comments
3 min read
LW link

Gradient Anatomy’s - Hallucination Robustness in Medical Q&A

DieSab Feb 12, 2025, 7:16 PM
2 points
0 comments
10 min read
LW link

Are current LLMs safe for psychotherapy?

PaperBike Feb 12, 2025, 7:16 PM
5 points
4 comments
1 min read
LW link

Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts

Ana Kapros Feb 12, 2025, 7:12 PM
7 points
0 comments
5 min read
LW link

The Paris AI Anti-Safety Summit

Zvi Feb 12, 2025, 2:00 PM
129 points
21 comments
21 min read
LW link
(thezvi.wordpress.com)

Inside the dark forests of the internet

Itay Dreyfus Feb 12, 2025, 10:20 AM
10 points
0 comments
6 min read
LW link
(productidentity.co)

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine Feb 12, 2025, 9:15 AM
53 points
49 comments
LW link
(www.emergent-values.ai)

Why you maybe should lift weights, and How to.

samusasuke Feb 12, 2025, 5:15 AM
32 points
29 comments
9 min read
LW link

[Question] how do the CEOs respond to our concerns?

KvmanThinking Feb 11, 2025, 11:39 PM
−10 points
7 comments
1 min read
LW link

Where Would Good Forecasts Most Help AI Governance Efforts?

Violet Hour Feb 11, 2025, 6:15 PM
11 points
1 comment
6 min read
LW link

AI Safety at the Frontier: Paper Highlights, January ’25

gasteigerjo Feb 11, 2025, 4:14 PM
7 points
0 comments
8 min read
LW link
(aisafetyfrontier.substack.com)

If Neuroscientists Succeed

Mordechai Rorvig Feb 11, 2025, 3:33 PM
9 points
6 comments
18 min read
LW link

The News is Never Neglected

lsusr Feb 11, 2025, 2:59 PM
112 points
18 comments
1 min read
LW link

Rethinking AI Safety Approach in the Era of Open-Source AI

Weibing Wang Feb 11, 2025, 2:01 PM
4 points
0 comments
6 min read
LW link

What About The Horses?

Maxwell Tabarrok Feb 11, 2025, 1:59 PM
15 points
17 comments
7 min read
LW link
(www.maximum-progress.com)

On Deliberative Alignment

Zvi Feb 11, 2025, 1:00 PM
53 points
1 comment
6 min read
LW link
(thezvi.wordpress.com)

Detecting AI Agent Failure Modes in Simulations

Michael Soareverix Feb 11, 2025, 11:10 AM
17 points
0 comments
8 min read
LW link

World Citizen Assembly about AI - Announcement

Camille Berger Feb 11, 2025, 10:51 AM
26 points
1 comment
LW link

Visual Reference for Frontier Large Language Models

kenakofer Feb 11, 2025, 5:14 AM
14 points
0 comments
1 min read
LW link
(kenan.schaefkofer.com)

Rational Effective Utopia & Narrow Way There: Multiversal AI Alignment, Place AI, New Ethicophysics… (Updated)

ank Feb 11, 2025, 3:21 AM
13 points
8 comments
35 min read
LW link

Arguing for the Truth? An Inference-Only Study into AI Debate

denisemester Feb 11, 2025, 3:04 AM
7 points
0 comments
16 min read
LW link

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?

garrison Feb 11, 2025, 12:20 AM
208 points
8 comments
LW link
(garrisonlovely.substack.com)

Positive Directions

G Wood Feb 11, 2025, 12:00 AM
0 points
0 comments
4 min read
LW link

Logical Correlation

niplav Feb 10, 2025, 11:29 PM
24 points
7 comments
10 min read
LW link

Proof idea: SLT to AIT

Lucius Bushnaq Feb 10, 2025, 11:14 PM
40 points
15 comments
6 min read
LW link

LW/ACX social meetup

Stefan Feb 10, 2025, 9:12 PM
2 points
0 comments
1 min read
LW link

A Bearish Take on AI, as a Treat

rats Feb 10, 2025, 7:22 PM
11 points
0 comments
4 min read
LW link
(open.substack.com)

Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable

Oliver Oswald Feb 10, 2025, 7:19 PM
6 points
7 comments
2 min read
LW link

Claude is More Anxious than GPT; Personality is an axis of interpretability in language models

future_detective Feb 10, 2025, 7:19 PM
2 points
2 comments
8 min read
LW link
(dhealy.substack.com)

Notes on Occam via Solomonoff vs. hierarchical Bayes

JesseClifton Feb 10, 2025, 5:55 PM
29 points
7 comments
4 min read
LW link

Sleeping Beauty: an Accuracy-based Approach

glauberdebona Feb 10, 2025, 3:40 PM
7 points
2 comments
7 min read
LW link

Political Idolatry

Arturo Macias Feb 10, 2025, 3:26 PM
−8 points
7 comments
2 min read
LW link

ML4Good Colombia - Applications Open to LatAm Participants

Feb 10, 2025, 3:03 PM
4 points
0 comments
1 min read
LW link

Nonpartisan AI safety

Yair Halberstadt Feb 10, 2025, 2:55 PM
30 points
4 comments
2 min read
LW link

Opinion Article Scoring System

ciaran Feb 10, 2025, 2:32 PM
1 point
0 comments
5 min read
LW link

Levels of Friction

Zvi Feb 10, 2025, 1:10 PM
149 points
8 comments
12 min read
LW link
(thezvi.wordpress.com)

Baumol effect vs Jevons paradox

Hzn Feb 10, 2025, 8:28 AM
0 points
0 comments
1 min read
LW link
(hzn33.neocities.org)

[Question] A Simulation of Automation economics?

qbolec Feb 10, 2025, 8:11 AM
10 points
1 comment
1 min read
LW link

[Question] Should I Divest from AI?

OKlogic Feb 10, 2025, 3:29 AM
6 points
4 comments
1 min read
LW link

OpenAI lied about SFT vs. RLHF

sanxiyn Feb 10, 2025, 3:24 AM
10 points
2 comments
1 min read
LW link
(x.com)

“Self-Blackmail” and Alternatives

jessicata 9 Feb 2025 23:20 UTC
19 points
12 comments
7 min read
LW link
(unstableontology.com)

Altman blog on post-AGI world

Julian Bradshaw 9 Feb 2025 21:52 UTC
29 points
10 comments
1 min read
LW link
(blog.samaltman.com)

Forecasting newsletter #2/2025: Forecasting meetup network

NunoSempere 9 Feb 2025 18:07 UTC
13 points
0 comments
4 min read
LW link
(forecasting.substack.com)

How identical twin sisters feel about nieces vs their own daughters

Dave Lindbergh 9 Feb 2025 17:36 UTC
4 points
19 comments
1 min read
LW link

Two hemispheres - I do not think it means what you think it means

Viliam 9 Feb 2025 15:33 UTC
109 points
21 comments
14 min read
LW link

The Structure of Professional Revolutions

SebastianG 9 Feb 2025 13:23 UTC
8 points
0 comments
4 min read
LW link