C’mon guys, Deliberate Practice is Real

Raemon · Feb 5, 2025, 10:33 PM
99 points
25 comments · 9 min read · LW link

The Risk of Gradual Disempowerment from AI

Zvi · Feb 5, 2025, 10:10 PM
87 points
20 comments · 20 min read · LW link
(thezvi.wordpress.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon · Feb 5, 2025, 9:32 PM
88 points
45 comments · 2 min read · LW link
(web.archive.org)

On AI Scaling

harsimony · Feb 5, 2025, 8:24 PM
6 points
3 comments · 8 min read · LW link
(splittinginfinity.substack.com)

The State of Metaculus

ChristianWilliams · Feb 5, 2025, 7:17 PM
21 points
0 comments · LW link
(www.metaculus.com)

Post-hoc reasoning in chain of thought

Kyle Cox · Feb 5, 2025, 6:58 PM
17 points
0 comments · 11 min read · LW link

DeepSeek-R1 for Beginners

Anton Razzhigaev · Feb 5, 2025, 6:58 PM
12 points
0 comments · 8 min read · LW link

Making the case for average-case AI Control

Nathaniel Mitrani · Feb 5, 2025, 6:56 PM
4 points
0 comments · 5 min read · LW link

[Question] Alignment Paradox and a Request for Harsh Criticism

Bridgett Kay · Feb 5, 2025, 6:17 PM
6 points
7 comments · 1 min read · LW link

Introducing International AI Governance Alliance (IAIGA)

jamesnorris · Feb 5, 2025, 4:02 PM
1 point
0 comments · 1 min read · LW link

Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety

jamesnorris · Feb 5, 2025, 4:02 PM
−9 points
2 comments · 1 min read · LW link

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni · Feb 5, 2025, 1:50 PM
76 points
1 comment · 10 min read · LW link

Deploying the Observer will save humanity from existential threats

Aram Panasenco · Feb 5, 2025, 10:39 AM
−11 points
8 comments · 1 min read · LW link

The Domain of Orthogonality

mgfcatherall · Feb 5, 2025, 8:14 AM
1 point
0 comments · 7 min read · LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape · Feb 5, 2025, 4:30 AM
96 points
18 comments · 6 min read · LW link

[Question] Why isn’t AI containment the primary AI safety strategy?

OKlogic · Feb 5, 2025, 3:54 AM
1 point
3 comments · 3 min read · LW link

Nick Land: Orthogonality

lumpenspace · Feb 4, 2025, 9:07 PM
12 points
37 comments · 8 min read · LW link

What working on AI safety taught me about B2B SaaS sales

purple fire · Feb 4, 2025, 8:50 PM
7 points
12 comments · 5 min read · LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Feb 4, 2025, 8:34 PM
45 points
22 comments · 5 min read · LW link

Anti-Slop Interventions?

abramdemski · Feb 4, 2025, 7:50 PM
76 points
33 comments · 6 min read · LW link

Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails

Devina Jain · Feb 4, 2025, 7:10 PM
3 points
0 comments · 10 min read · LW link

[Question] Journalism student looking for sources

pinkerton · Feb 4, 2025, 6:58 PM
11 points
3 comments · 1 min read · LW link

We’re in Deep Research

Zvi · Feb 4, 2025, 5:20 PM
45 points
2 comments · 20 min read · LW link
(thezvi.wordpress.com)

The Capitalist Agent

henophilia · Feb 4, 2025, 3:32 PM
1 point
10 comments · 3 min read · LW link
(blog.hermesloom.org)

Forecasting AGI: Insights from Prediction Markets and Metaculus

Alvin Ånestrand · Feb 4, 2025, 1:03 PM
13 points
0 comments · 4 min read · LW link
(forecastingaifutures.substack.com)

Ruling Out Lookup Tables

Alfred Harwood · Feb 4, 2025, 10:39 AM
22 points
11 comments · 7 min read · LW link

Half-baked idea: a straightforward method for learning environmental goals?

Q Home · Feb 4, 2025, 6:56 AM
16 points
7 comments · 5 min read · LW link

Information Versus Action

Screwtape · Feb 4, 2025, 5:13 AM
27 points
0 comments · 6 min read · LW link

Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method

Clément L · Feb 4, 2025, 4:15 AM
6 points
1 comment · 13 min read · LW link

Tear Down the Burren

jefftk · Feb 4, 2025, 3:40 AM
45 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)

Archimedes · Feb 4, 2025, 2:55 AM
16 points
1 comment · 1 min read · LW link
(www.anthropic.com)

Can someone, anyone, make superintelligence a more concrete concept?

Ori Nagel · Feb 4, 2025, 2:18 AM
2 points
8 comments · 5 min read · LW link

What are the “no free lunch” theorems?

Feb 4, 2025, 2:02 AM
19 points
4 comments · 1 min read · LW link
(aisafety.info)

eliminating bias through language?

KvmanThinking · Feb 4, 2025, 1:52 AM
1 point
12 comments · 1 min read · LW link

New Foresight Longevity Bio & Molecular Nano Grants Program

Allison Duettmann · Feb 4, 2025, 12:28 AM
11 points
0 comments · 1 min read · LW link

Meta: Frontier AI Framework

Zach Stein-Perlman · Feb 3, 2025, 10:00 PM
33 points
2 comments · 1 min read · LW link
(ai.meta.com)

$300 Fermi Model Competition

ozziegooen · Feb 3, 2025, 7:47 PM
16 points
18 comments · LW link

Visualizing Interpretability

Darold Davis · Feb 3, 2025, 7:36 PM
2 points
0 comments · 4 min read · LW link

Alignment Can Reduce Performance on Simple Ethical Questions

Daan Henselmans · Feb 3, 2025, 7:35 PM
16 points
7 comments · 6 min read · LW link

The Overlap Paradigm: Rethinking Data’s Role in Weak-to-Strong Generalization (W2SG)

Serhii Zamrii · Feb 3, 2025, 7:31 PM
2 points
0 comments · 11 min read · LW link

Sleeper agents appear resilient to activation steering

Lucy Wingard · Feb 3, 2025, 7:31 PM
6 points
0 comments · 7 min read · LW link

Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP

Gilber A. Corrales · Feb 3, 2025, 7:30 PM
1 point
0 comments · 13 min read · LW link

Superintelligence Alignment Proposal

Davey Morse · Feb 3, 2025, 6:47 PM
5 points
3 comments · 9 min read · LW link

Gettier Cases [repost]

Antigone · Feb 3, 2025, 6:12 PM
−4 points
5 comments · 2 min read · LW link

The Self-Reference Trap in Mathematics

Alister Munday · Feb 3, 2025, 4:12 PM
−41 points
23 comments · 2 min read · LW link

Stopping unaligned LLMs is easy!

Yair Halberstadt · Feb 3, 2025, 3:38 PM
−3 points
11 comments · 2 min read · LW link

The Outer Levels

Jerdle · Feb 3, 2025, 2:30 PM
2 points
3 comments · 6 min read · LW link

o3-mini Early Days

Zvi · Feb 3, 2025, 2:20 PM
45 points
0 comments · 15 min read · LW link
(thezvi.wordpress.com)

OpenAI releases deep research agent

Seth Herd · Feb 3, 2025, 12:48 PM
78 points
21 comments · 3 min read · LW link
(openai.com)

Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space

Roman Malov · Feb 3, 2025, 10:30 AM
4 points
0 comments · 2 min read · LW link