Meta-Technicalities: Safeguarding Values in Formal Systems

LTM, Apr 30, 2025, 11:43 PM
2 points
0 comments, 3 min read, LW link
(routecause.substack.com)

Obstacles in ARC’s agenda: Finding explanations

David Matolcsi, Apr 30, 2025, 11:03 PM
122 points
10 comments, 17 min read, LW link

GPT-4o Responds to Negative Feedback

Zvi, Apr 30, 2025, 8:20 PM
45 points
2 comments, 18 min read, LW link
(thezvi.wordpress.com)

State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]

Noosphere89, Apr 30, 2025, 7:58 PM
7 points
0 comments, 5 min read, LW link
(www.interconnects.ai)

Don’t accuse your interlocutor of being insufficiently truth-seeking

TFD, Apr 30, 2025, 7:38 PM
30 points
15 comments, 2 min read, LW link
(www.thefloatingdroid.com)

How can we solve diffuse threats like research sabotage with AI control?

Vivek Hebbar, Apr 30, 2025, 7:23 PM
52 points
1 comment, 8 min read, LW link

[Question] Can Narrowing One’s Reference Class Undermine the Doomsday Argument?

Iannoose n., Apr 30, 2025, 6:24 PM
2 points
1 comment, 1 min read, LW link

[Question] Does there exist an interactive reasoning map tool that lets users visually lay out claims, assign probabilities and confidence levels, and dynamically adjust their beliefs based on weighted influences between connected assertions?

Zack Friedman, Apr 30, 2025, 6:22 PM
5 points
4 comments, 1 min read, LW link

Distilling the Internal Model Principle part II

JoseFaustino, Apr 30, 2025, 5:56 PM
15 points
0 comments, 19 min read, LW link

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

aog, Apr 30, 2025, 5:43 PM
17 points
2 comments, 15 min read, LW link
(www.longview.org)

Video and transcript of talk on automating alignment research

Joe Carlsmith, Apr 30, 2025, 5:43 PM
21 points
0 comments, 24 min read, LW link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe Carlsmith, Apr 30, 2025, 5:37 PM
54 points
29 comments, 48 min read, LW link
(joecarlsmith.com)

Investigating task-specific prompts and sparse autoencoders for activation monitoring

Henk Tillman, Apr 30, 2025, 5:09 PM
23 points
0 comments, 1 min read, LW link
(arxiv.org)

European Links (30.04.25)

Martin Sustrik, Apr 30, 2025, 3:40 PM
15 points
1 comment, 8 min read, LW link
(250bpm.substack.com)

Scaling Laws for Scalable Oversight

Apr 30, 2025, 12:13 PM
30 points
0 comments, 9 min read, LW link

Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis

Apr 30, 2025, 11:06 AM
211 points
11 comments, 11 min read, LW link

[Paper] Automated Feature Labeling with Token-Space Gradient Descent

Wuschel Schulz, Apr 30, 2025, 10:22 AM
4 points
0 comments, 4 min read, LW link

A single principle related to many Alignment subproblems?

Q Home, Apr 30, 2025, 9:49 AM
37 points
33 comments, 17 min read, LW link

What if Brain Computer Interfaces went exponential?

Stephen Martin, Apr 30, 2025, 5:07 AM
−1 points
0 comments, 12 min read, LW link

Interpreting the METR Time Horizons Post

snewman, Apr 30, 2025, 3:03 AM
66 points
12 comments, 10 min read, LW link
(amistrongeryet.substack.com)

Should we expect the future to be good?

Neil Crawford, Apr 30, 2025, 12:36 AM
15 points
0 comments, 14 min read, LW link

Judging types of consequentialism by influence and normativity

Cole Wyeth, Apr 29, 2025, 11:25 PM
20 points
1 comment, 2 min read, LW link

Bandwidth Rules Everything Around Me: Oliver Habryka on OpenPhil and GoodVentures

Elizabeth, Apr 29, 2025, 8:40 PM
79 points
15 comments, 1 min read, LW link
(acesounderglass.com)

The Grand Encyclopedia of Eponymous Laws

rogersbacon, Apr 29, 2025, 7:30 PM
27 points
5 comments, 16 min read, LW link
(www.secretorum.life)

Misrepresentation as a Barrier for Interp (Part I)

Apr 29, 2025, 5:07 PM
113 points
11 comments, 7 min read, LW link

AISN #53: An Open Letter Attempts to Block OpenAI Restructuring

Apr 29, 2025, 4:13 PM
5 points
0 comments, 4 min read, LW link

What could Alphafold 4 look like?

Abhishaike Mahajan, Apr 29, 2025, 3:45 PM
8 points
0 comments, 1 min read, LW link

Sealed Computation: Towards Low-Friction Proof of Locality

Paul Bricman, Apr 29, 2025, 3:26 PM
4 points
0 comments, 10 min read, LW link
(noemaresearch.com)

Dating Roundup #4: An App for That

Zvi, Apr 29, 2025, 1:10 PM
17 points
5 comments, 16 min read, LW link
(thezvi.wordpress.com)

Talk on letters to AI (London)

ukc10014, Apr 29, 2025, 9:50 AM
3 points
0 comments, 1 min read, LW link

Memory Decoding Journal Club: “Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons”

Devin Ward, Apr 29, 2025, 2:26 AM
1 point
0 comments, 1 min read, LW link

D&D.Sci Tax Day: Adventurers and Assessments Evaluation & Ruleset

aphyer, Apr 29, 2025, 2:00 AM
28 points
10 comments, 5 min read, LW link

How to Build a Third Place on Focusmate

Parker Conley, Apr 28, 2025, 11:46 PM
96 points
10 comments, 5 min read, LW link
(parconley.com)

Methods of defense against AGI manipulation

MarkelKori, Apr 28, 2025, 9:03 PM
1 point
0 comments, 2 min read, LW link

China’s Petition System: It Looks Like Democracy — But It Isn’t

Hu Yichao, Apr 28, 2025, 8:56 PM
0 points
4 comments, 2 min read, LW link

Fundamentals of Safe AI (Phase 1) – Applications Open for the Global Cohort

rajsecrets, Apr 28, 2025, 8:52 PM
9 points
0 comments, 2 min read, LW link

Proceedings of ILIAD: Lessons and Progress

Apr 28, 2025, 7:04 PM
77 points
5 comments, 8 min read, LW link

GPT-4o Is An Absurd Sycophant

Zvi, Apr 28, 2025, 7:00 PM
80 points
7 comments, 19 min read, LW link
(thezvi.wordpress.com)

[Question] What are the best standardised, repeatable bets?

kave, Apr 28, 2025, 6:45 PM
31 points
10 comments, 1 min read, LW link

7+ tractable directions in AI control

Apr 28, 2025, 5:12 PM
86 points
1 comment, 13 min read, LW link

“A victory for the natural order”

Mati_Roy, Apr 28, 2025, 3:33 PM
11 points
3 comments, 1 min read, LW link
(preservinghope.substack.com)

Why giving workers stocks isn’t enough — and what co-ops get right

B Jacobs, Apr 28, 2025, 2:19 PM
6 points
9 comments, 2 min read, LW link
(bobjacobs.substack.com)

Keltham on Becoming more Truth-Oriented

Towards_Keeperhood, Apr 28, 2025, 12:58 PM
15 points
2 comments, 19 min read, LW link

Therapist in the Weights: Risks of Hyper-Introspection in Future AI Systems

Davidmanheim, Apr 28, 2025, 6:42 AM
15 points
1 comment, 5 min read, LW link

In Darkness They Assembled

Charlie Sanders, Apr 28, 2025, 3:44 AM
2 points
0 comments, 3 min read, LW link

Seeking advice on careers in AI Safety

nem, Apr 27, 2025, 11:59 PM
8 points
2 comments, 1 min read, LW link

Thin Alignment Can’t Solve Thick Problems

Daan Henselmans, Apr 27, 2025, 10:42 PM
11 points
2 comments, 9 min read, LW link

The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as *evidence*

Christopher King, Apr 27, 2025, 9:55 PM
9 points
5 comments, 5 min read, LW link

How people use LLMs

Elizabeth, Apr 27, 2025, 9:48 PM
78 points
6 comments, 1 min read, LW link
(www.gleech.org)

Luna Lovegood and the Chamber of Secrets, Part 6

Apr 27, 2025, 8:26 PM
3 points
0 comments, 2 min read, LW link