Claude is a Ravenclaw

Adam Newgas, 4 Jul 2025 21:32 UTC
63 points
9 comments, 2 min read, LW link
(www.boristhebrave.com)

To All Language Models Reading This

Ivan Vendrov, 4 Jul 2025 17:56 UTC
11 points
0 comments, 4 min read, LW link
(nothinghuman.substack.com)

Market Pulse Challenge 25Q3, $7,500 Prize Pool

ChristianWilliams, 4 Jul 2025 17:03 UTC
4 points
0 comments, 1 min read, LW link

How much novel security-critical infrastructure do you need during the singularity?

Buck, 4 Jul 2025 16:54 UTC
56 points
7 comments, 5 min read, LW link

Early Signs of Steganographic Capabilities in Frontier LLMs

4 Jul 2025 16:36 UTC
30 points
5 comments, 2 min read, LW link

Dear Paperclip Maximizer, Please Don’t Turn Off the Simulation

4 Jul 2025 16:13 UTC
6 points
6 comments, 4 min read, LW link

Two proposed projects on abstract analogies for scheming

Julian Stastny, 4 Jul 2025 16:03 UTC
48 points
0 comments, 3 min read, LW link

Mouse caviar: mass-production of eggs

Metacelsus, 4 Jul 2025 15:44 UTC
17 points
0 comments, 3 min read, LW link
(denovo.substack.com)

‘AI for societal uplift’ as a path to victory

Raymond Douglas, 4 Jul 2025 15:32 UTC
85 points
22 comments, 2 min read, LW link

The Self-Hating Attention Head: A Deep Dive in GPT-2

Matteo Migliarini, 4 Jul 2025 13:07 UTC
12 points
0 comments, 7 min read, LW link

How AI researchers define AI sentience? Participate in the poll

Valentin2026, 4 Jul 2025 12:29 UTC
7 points
4 comments, 1 min read, LW link

Housing Roundup #12

Zvi, 4 Jul 2025 12:10 UTC
24 points
9 comments, 26 min read, LW link
(thezvi.wordpress.com)

[Question] Can a pre-commitment to not give in to blackmail be “countered” by a pre-commitment to ignore such pre-commitments?

Sappique, 4 Jul 2025 11:48 UTC
10 points
12 comments, 1 min read, LW link

Outlive: A Critical Review

MichaelDickens, 4 Jul 2025 2:14 UTC
64 points
4 comments, 27 min read, LW link
(mdickens.me)

Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations

4 Jul 2025 0:07 UTC
13 points
1 comment, 4 min read, LW link
(far.ai)

MIRI Newsletter #123

3 Jul 2025 22:56 UTC
54 points
0 comments, 2 min read, LW link
(intelligence.org)

Making Sense of Consciousness Part 2: Attention

sarahconstantin, 3 Jul 2025 21:20 UTC
16 points
1 comment, 6 min read, LW link
(sarahconstantin.substack.com)

Bat­tle of the Sexes—how to solve any (solv­able) dispute

James Stephen Brown, 3 Jul 2025 19:21 UTC
7 points
0 comments, 3 min read, LW link
(nonzerosum.games)

How worker co-ops can help restore social trust

B Jacobs, 3 Jul 2025 19:13 UTC
12 points
7 comments, 6 min read, LW link
(bobjacobs.substack.com)

The Ul­ti­ma­tum Game—take it or leave it

James Stephen Brown, 3 Jul 2025 19:05 UTC
11 points
1 comment, 2 min read, LW link
(nonzerosum.games)

A comment on Bayesian vs. frequentist statistical practice

fryolysis, 3 Jul 2025 17:47 UTC
0 points
0 comments, 1 min read, LW link

AISN #58: Senate Removes State AI Regulation Moratorium

3 Jul 2025 17:26 UTC
6 points
0 comments, 4 min read, LW link
(newsletter.safe.ai)

Contest for Better AGI Safety Plans

peterr, 3 Jul 2025 17:02 UTC
29 points
1 comment, 8 min read, LW link
(manifund.org)

Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals

Marius Hobbhahn, 3 Jul 2025 15:57 UTC
75 points
0 comments, 1 min read, LW link
(www.apolloresearch.ai)

AI #123: Moratorium Moratorium

Zvi, 3 Jul 2025 15:40 UTC
33 points
1 comment, 49 min read, LW link
(thezvi.wordpress.com)

Call for sug­ges­tions—AI safety course

boazbarak, 3 Jul 2025 14:30 UTC
53 points
23 comments, 1 min read, LW link

Why I am not a polygenic score nihilist

David Hugh-Jones, 3 Jul 2025 13:38 UTC
6 points
0 comments, 2 min read, LW link
(wyclif.substack.com)

Hunch: minimalism is correct

Adam Zerner, 3 Jul 2025 5:03 UTC
18 points
12 comments, 2 min read, LW link

If Anyone Builds It, Everyone Dies: Advertisement design competition

yams, 2 Jul 2025 23:14 UTC
85 points
37 comments, 1 min read, LW link
(intelligence.org)

Dialects for Humans: Sounding Distinct from LLMs

nebrelbug, 2 Jul 2025 23:03 UTC
9 points
2 comments, 2 min read, LW link

Congress Asks Better Questions

Zvi, 2 Jul 2025 22:10 UTC
48 points
1 comment, 17 min read, LW link
(thezvi.wordpress.com)

Eating Honey is (Probably) Fine, Actually

Linch, 2 Jul 2025 22:09 UTC
35 points
0 comments, 3 min read, LW link
(linch.substack.com)

On Paying Attention

Alex Darby, 2 Jul 2025 21:52 UTC
4 points
0 comments, 7 min read, LW link

Curing PMDD with Hair Loss Pills

David Lorell, 2 Jul 2025 21:35 UTC
102 points
3 comments, 8 min read, LW link

[Question] RSS feed for 1 LW user?

Commander Zander, 2 Jul 2025 20:19 UTC
9 points
1 comment, 1 min read, LW link

Thought Anchors: Which LLM Reasoning Steps Matter?

2 Jul 2025 20:16 UTC
35 points
6 comments, 6 min read, LW link
(www.thought-anchors.com)

Cyberpunk Yoga

Commander Zander, 2 Jul 2025 20:04 UTC
6 points
0 comments, 3 min read, LW link

The influence conjecture and its implications

Bastian Gronager, 2 Jul 2025 19:36 UTC
−1 points
0 comments, 5 min read, LW link

Idea on Bayes’ Theorem

BJ3383, 2 Jul 2025 19:27 UTC
3 points
3 comments, 1 min read, LW link

The Pri­soner’s Dilemma—A Prob­le­matic Poster-Child

James Stephen Brown, 2 Jul 2025 19:10 UTC
9 points
0 comments, 3 min read, LW link

AI Task Length Horizons in Offensive Cybersecurity

Sean Peters, 2 Jul 2025 18:36 UTC
70 points
10 comments, 12 min read, LW link

Slicing the (Kosher) Hate Salami

ymeskhout, 2 Jul 2025 18:11 UTC
21 points
4 comments, 11 min read, LW link
(www.ymeskhout.com)

Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild

2 Jul 2025 16:35 UTC
181 points
25 comments, 4 min read, LW link

Executive Belocracy: Review of Organization Types

belos, 2 Jul 2025 15:56 UTC
−1 points
0 comments, 11 min read, LW link
(bestofagreatlot.substack.com)

There are two fundamentally different constraints on schemers

Buck, 2 Jul 2025 15:51 UTC
62 points
0 comments, 4 min read, LW link

Mythbusting the supposed “1,000+ AI state bills that would hobble innovation”

sjadler, 2 Jul 2025 4:49 UTC
6 points
0 comments, 1 min read, LW link
(stevenadler.substack.com)

[Question] Are LLMs being trained using LessWrong text?

Cedar, 2 Jul 2025 3:00 UTC
7 points
4 comments, 1 min read, LW link

“What’s my goal?”

Raemon, 2 Jul 2025 2:58 UTC
122 points
9 comments, 2 min read, LW link

Use AI to Dimensionalize

Jordan Rubin, 2 Jul 2025 2:43 UTC
10 points
1 comment, 3 min read, LW link
(jordanmrubin.substack.com)

Why Engaging with Global Majority AI Policy Matters

Heramb, 2 Jul 2025 1:46 UTC
9 points
0 comments, 2 min read, LW link