D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

aphyer 17 Jun 2024 21:29 UTC

61 points

11 comments · 6 min read · LW link

Questionable Narratives of “Situational Awareness”

fergusq 17 Jun 2024 21:01 UTC

0 points

1 comment · 1 min read · LW link

(forum.effectivealtruism.org)

ZuVillage Georgia – Mission Statement

Burns 17 Jun 2024 19:53 UTC

4 points

3 comments · 9 min read · LW link

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt 17 Jun 2024 18:44 UTC

267 points

50 comments · 13 min read · LW link

Sycophancy to subterfuge: Investigating reward tampering in large language models

Carson Denison and evhub

17 Jun 2024 18:41 UTC

163 points

22 comments · 8 min read · LW link

(arxiv.org)

Labor Participation is a High-Priority AI Alignment Risk

alex 17 Jun 2024 18:09 UTC

7 points

0 comments · 17 min read · LW link

Towards a Less Bullshit Model of Semantics

johnswentworth and David Lorell

17 Jun 2024 15:51 UTC

104 points

54 comments · 21 min read · LW link · 1 review

Analysing Adversarial Attacks with Linear Probing

Yoann Poupart, Imene Kerboua, Clement Neo and Jason Hoelscher-Obermaier

17 Jun 2024 14:16 UTC

15 points

0 comments · 8 min read · LW link

What’s the future of AI hardware?

Itay Dreyfus 17 Jun 2024 13:05 UTC

2 points

0 comments · 8 min read · LW link

(productidentity.co)

OpenAI #8: The Right to Warn

Zvi 17 Jun 2024 12:00 UTC

97 points

8 comments · 34 min read · LW link

(thezvi.wordpress.com)

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability

ntt123 17 Jun 2024 11:46 UTC

5 points

4 comments · 6 min read · LW link

(neuralblog.github.io)

Weak AGIs Kill Us First

yrimon 17 Jun 2024 11:13 UTC

15 points

4 comments · 9 min read · LW link

[Linkpost] Guardian article covering Lightcone Infrastructure, Manifest and CFAR ties to FTX

ROM 17 Jun 2024 10:05 UTC

8 points

9 comments · 1 min read · LW link

(www.theguardian.com)

Fat Tails Discourage Compromise

niplav 17 Jun 2024 9:39 UTC

53 points

5 comments · 1 min read · LW link

Our Intuitions About The Criminal Justice System Are Screwed Up

Bentham's Bulldog 17 Jun 2024 6:22 UTC

12 points

15 comments · 4 min read · LW link

A Case for Cooperation: Dependence in the Prisoner’s Dilemma

grantstenger 17 Jun 2024 1:10 UTC

9 points

3 comments · 23 min read · LW link

Degeneracies are sticky for SGD

Guillaume Corlouer and Nicolas Macé

16 Jun 2024 21:19 UTC

56 points

1 comment · 16 min read · LW link

YM’s Shortform

YM 16 Jun 2024 20:57 UTC

3 points

1 comment · 1 min read · LW link

“Is-Ought” is Fraught

MiSteR Kittty 16 Jun 2024 17:27 UTC

−5 points

2 comments · 1 min read · LW link

The type of AI humanity has chosen to create so far is unsafe, for soft social reasons and not technical ones.

l8c 16 Jun 2024 13:31 UTC

−11 points

2 comments · 1 min read · LW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai 16 Jun 2024 13:01 UTC

7 points

0 comments · 7 min read · LW link

(arxiv.org)

CIV: a story

Richard_Ngo 15 Jun 2024 22:36 UTC

99 points

6 comments · 9 min read · LW link

(www.narrativeark.xyz)

Yann LeCun: We only design machines that minimize costs [therefore they are safe]

tailcalled 15 Jun 2024 17:25 UTC

19 points

8 comments · 1 min read · LW link

(twitter.com)

(Appetitive, Consummatory) ≈ (RL, reflex)

Steven Byrnes 15 Jun 2024 15:57 UTC

38 points

1 comment · 3 min read · LW link

Two LessWrong speed friending experiments

mikko and sanyer

15 Jun 2024 10:52 UTC

53 points

3 comments · 4 min read · LW link

Claude’s dark spiritual AI futurism

jessicata 15 Jun 2024 0:57 UTC

23 points

7 comments · 43 min read · LW link

(unstableontology.com)

[Question] When is “unfalsifiable implies false” incorrect?

VojtaKovarik 15 Jun 2024 0:28 UTC

3 points

11 comments · 1 min read · LW link

MIRI’s June 2024 Newsletter

Harlan 14 Jun 2024 23:02 UTC

74 points

20 comments · 2 min read · LW link

(intelligence.org)

Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis

Giulio 14 Jun 2024 19:35 UTC

10 points

0 comments · 8 min read · LW link

(www.giuliostarace.com)

Shard Theory—is it true for humans?

r_b 14 Jun 2024 19:21 UTC

71 points

7 comments · 15 min read · LW link

When fine-tuning fails to elicit GPT-3.5’s chess abilities

Theodore Chapman 14 Jun 2024 18:50 UTC

42 points

3 comments · 9 min read · LW link

Results from the AI x Democracy Research Sprint

Esben Kran, jordinne and Jason Hoelscher-Obermaier

14 Jun 2024 16:40 UTC

13 points

0 comments · 6 min read · LW link

Rational Animations’ intro to mechanistic interpretability

Writer 14 Jun 2024 16:10 UTC

45 points

1 comment · 11 min read · LW link

(youtu.be)

Why keep a diary, and why wish for large language models

DanielFilan 14 Jun 2024 16:10 UTC

9 points

1 comment · 2 min read · LW link

(danielfilan.com)

The Leopold Model: Analysis and Reactions

Zvi 14 Jun 2024 15:10 UTC

109 points

19 comments · 57 min read · LW link

(thezvi.wordpress.com)

[Question] Thoughts on Francois Chollet’s belief that LLMs are far away from AGI?

O O 14 Jun 2024 6:32 UTC

26 points

17 comments · 1 min read · LW link

Research Report: Alternative sparsity methods for sparse autoencoders with OthelloGPT.

Andrew Quaisley 14 Jun 2024 0:57 UTC

17 points

5 comments · 12 min read · LW link

Slowed ASI—a possible technical strategy for alignment

Lester Leong 14 Jun 2024 0:57 UTC

7 points

2 comments · 3 min read · LW link

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Andrew_Critch 14 Jun 2024 0:16 UTC

369 points

41 comments · 4 min read · LW link · 3 reviews

OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors

Joel Burget 13 Jun 2024 21:28 UTC

35 points

10 comments · 1 min read · LW link

(openai.com)

AI #68: Remarkably Reasonable Reactions

Zvi 13 Jun 2024 16:30 UTC

46 points

11 comments · 50 min read · LW link

(thezvi.wordpress.com)

Four Futures For Cognitive Labor

Maxwell Tabarrok 13 Jun 2024 12:56 UTC

14 points

11 comments · 4 min read · LW link

(www.maximum-progress.com)

Underrated Proverbs

Arjun Panickssery 13 Jun 2024 12:30 UTC

13 points

9 comments · 1 min read · LW link

(arjunpanickssery.substack.com)

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Ollie J, Sam F. Brown and Francis Rhys Ward

13 Jun 2024 10:04 UTC

84 points

10 comments · 2 min read · LW link

(arxiv.org)

AiPhone

Zvi 12 Jun 2024 22:20 UTC

63 points

4 comments · 14 min read · LW link

(thezvi.wordpress.com)

microwave drilling is impractical

bhauth 12 Jun 2024 22:16 UTC

60 points

19 comments · 4 min read · LW link

(www.bhauth.com)

Phonosemantic Duplication

bitcoinssg 12 Jun 2024 20:19 UTC

5 points

0 comments · 1 min read · LW link

My AI Model Delta Compared To Christiano

johnswentworth 12 Jun 2024 18:19 UTC

196 points

75 comments · 4 min read · LW link · 1 review

AI: 4 levels of impact [micropost]

Mati_Roy 12 Jun 2024 16:58 UTC

8 points

0 comments · 1 min read · LW link

Aggregative principles approximate utilitarian principles

Cleo Nardo 12 Jun 2024 16:27 UTC

28 points

3 comments · 23 min read · LW link