What I am working on right now and why: representation engineering edition

Lukasz G Bartoszcze · 18 Mar 2025 22:37 UTC
3 points
0 comments · 3 min read · LW link

Boots theory and Sybil Ramkin

philh · 18 Mar 2025 22:10 UTC
37 points
18 comments · 11 min read · LW link
(reasonableapproximation.net)

Schmidt Sciences Technical AI Safety RFP on Inference-Time Compute – Deadline: April 30

Ryan Gajarawala · 18 Mar 2025 18:05 UTC
18 points
0 comments · 2 min read · LW link
(www.schmidtsciences.org)

PRISM: Perspective Reasoning for Integrated Synthesis and Mediation (Interactive Demo)

Anthony Diamond · 18 Mar 2025 18:03 UTC
10 points
2 comments · 1 min read · LW link

Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models

Le magicien quantique · 18 Mar 2025 17:55 UTC
6 points
1 comment · 10 min read · LW link

Progress links and short notes, 2025-03-18

jasoncrawford · 18 Mar 2025 17:14 UTC
8 points
0 comments · 3 min read · LW link
(newsletter.rootsofprogress.org)

The Convergent Path to the Stars

Maxime Riché · 18 Mar 2025 17:09 UTC
6 points
0 comments · 20 min read · LW link

Sapir-Whorf Ego Death

Jonathan Moregård · 18 Mar 2025 16:57 UTC
8 points
7 comments · 2 min read · LW link
(honestliving.substack.com)

Smelling Nice is Good, Actually

Gordon Seidoh Worley · 18 Mar 2025 16:54 UTC
28 points
8 comments · 3 min read · LW link
(uncertainupdates.substack.com)

A Taxonomy of Jobs Deeply Resistant to TAI Automation

Deric Cheng · 18 Mar 2025 16:25 UTC
9 points
0 comments · 12 min read · LW link
(www.convergenceanalysis.org)

Why Are The Human Sciences Hard? Two New Hypotheses

18 Mar 2025 15:45 UTC
39 points
14 comments · 9 min read · LW link

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

18 Mar 2025 14:48 UTC
80 points
12 comments · 5 min read · LW link

[Question] What is the theory of change behind writing papers about AI safety?

Kajus · 18 Mar 2025 12:51 UTC
7 points
1 comment · 1 min read · LW link

OpenAI #11: America Action Plan

Zvi · 18 Mar 2025 12:50 UTC
83 points
3 comments · 6 min read · LW link
(thezvi.wordpress.com)

I changed my mind about orca intelligence

Towards_Keeperhood · 18 Mar 2025 10:15 UTC
54 points
24 comments · 5 min read · LW link

[Question] Is Peano arithmetic trying to kill us? Do we care?

Q Home · 18 Mar 2025 8:22 UTC
17 points
2 comments · 2 min read · LW link

Do What the Mammals Do

CrimsonChin · 18 Mar 2025 3:57 UTC
2 points
6 comments · 4 min read · LW link

What Actually Matters Until We Reach the Singularity

Lexius · 18 Mar 2025 2:17 UTC
−1 points
0 comments · 9 min read · LW link

Meaning as a cognitive substitute for survival instincts: A thought experiment

Ovidijus Šimkus · 18 Mar 2025 1:53 UTC
0 points
0 comments · 2 min read · LW link

Against Yudkowsky’s evolution analogy for AI x-risk [unfinished]

Fiora Sunshine · 18 Mar 2025 1:41 UTC
52 points
18 comments · 11 min read · LW link

An “AI researcher” has written a paper on optimizing AI architecture and optimized a language model to several orders of magnitude more efficiency.

Y B · 18 Mar 2025 1:15 UTC
3 points
1 comment · 1 min read · LW link

LessOnline 2025: Early Bird Tickets On Sale

Ben Pace · 18 Mar 2025 0:22 UTC
37 points
5 comments · 5 min read · LW link

Feedback loops for exercise (VO2Max)

Elizabeth · 18 Mar 2025 0:10 UTC
65 points
13 comments · 8 min read · LW link
(acesounderglass.com)

FrontierMath Score of o3-mini Much Lower Than Claimed

YafahEdelman · 17 Mar 2025 22:41 UTC
61 points
7 comments · 1 min read · LW link

Proof-of-Concept Debugger for a Small LLM

17 Mar 2025 22:27 UTC
27 points
0 comments · 11 min read · LW link

Effectively Communicating with DC Policymakers

PolicyTakes · 17 Mar 2025 22:11 UTC
14 points
0 comments · 2 min read · LW link

EIS XV: A New Proof of Concept for Useful Interpretability

scasper · 17 Mar 2025 20:05 UTC
30 points
2 comments · 3 min read · LW link

Sentinel’s Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.

NunoSempere · 17 Mar 2025 19:34 UTC
59 points
3 comments · 6 min read · LW link
(blog.sentinel-team.org)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

17 Mar 2025 19:11 UTC
188 points
9 comments · 6 min read · LW link

Three Types of Intelligence Explosion

17 Mar 2025 14:47 UTC
40 points
8 comments · 3 min read · LW link
(www.forethought.org)

An Advent of Thought

Kaarel · 17 Mar 2025 14:21 UTC
57 points
13 comments · 48 min read · LW link

Interested in working from a new Boston AI Safety Hub?

17 Mar 2025 13:42 UTC
17 points
0 comments · 2 min read · LW link

Other Civilizations Would Recover 84+% of Our Cosmic Resources—A Challenge to Extinction Risk Prioritization

Maxime Riché · 17 Mar 2025 13:12 UTC
5 points
0 comments · 12 min read · LW link

Monthly Roundup #28: March 2025

Zvi · 17 Mar 2025 12:50 UTC
31 points
8 comments · 14 min read · LW link
(thezvi.wordpress.com)

Are corporations superintelligent?

17 Mar 2025 10:36 UTC
3 points
3 comments · 1 min read · LW link
(aisafety.info)

One pager

samuelshadrach · 17 Mar 2025 8:12 UTC
6 points
2 comments · 8 min read · LW link
(samuelshadrach.com)

The Case for AI Optimism

Annapurna · 17 Mar 2025 1:29 UTC
−6 points
1 comment · 1 min read · LW link
(nationalaffairs.com)

Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

16 Mar 2025 23:23 UTC
45 points
8 comments · 13 min read · LW link

Read More News

utilistrutil · 16 Mar 2025 21:31 UTC
25 points
2 comments · 5 min read · LW link

What would a post labor economy *actually* look like?

Ansh Juneja · 16 Mar 2025 20:38 UTC
3 points
2 comments · 17 min read · LW link

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas · 16 Mar 2025 18:54 UTC
206 points
36 comments · 3 min read · LW link

How I’ve run major projects

benkuhn · 16 Mar 2025 18:40 UTC
127 points
10 comments · 8 min read · LW link
(www.benkuhn.net)

Counting Objections to Housing

jefftk · 16 Mar 2025 18:20 UTC
13 points
7 comments · 3 min read · LW link
(www.jefftk.com)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy · 16 Mar 2025 16:52 UTC
161 points
26 comments · 1 min read · LW link

Siberian Arctic origins of East Asian psychology

David Sun · 16 Mar 2025 16:52 UTC
6 points
0 comments · 1 min read · LW link

AI Model History is Being Lost

Vale · 16 Mar 2025 12:38 UTC
19 points
1 comment · 1 min read · LW link
(vale.rocks)

Metacognition Broke My Nail-Biting Habit

Rafka · 16 Mar 2025 12:36 UTC
45 points
20 comments · 2 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt · 16 Mar 2025 8:06 UTC
22 points
8 comments · 1 min read · LW link

Can time preferences make AI safe?

TerriLeaf · 15 Mar 2025 21:41 UTC
2 points
1 comment · 2 min read · LW link

Help make the orca language experiment happen

Towards_Keeperhood · 15 Mar 2025 21:39 UTC
9 points
12 comments · 5 min read · LW link