An Introduction to AI Sandbagging

Teun van der Weij, Felix Hofstätter and Francis Rhys Ward

26 Apr 2024 13:40 UTC

41 points

1 comment, 8 min read

My hour of memoryless lucidity

Eric Neyman

4 May 2024 1:40 UTC

39 points

1 comment, 5 min read

(ericneyman.wordpress.com)

KAN: Kolmogorov-Arnold Networks

Gunnar_Zarncke

1 May 2024 16:50 UTC

10 points

10 comments, 1 min read

(arxiv.org)

Why I’m doing PauseAI

Joseph Miller

30 Apr 2024 16:21 UTC

92 points

10 comments, 4 min read

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka

3 May 2024 18:10 UTC

45 points

7 comments, 4 min read

(aisafety.dance)

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai

16 Apr 2024 21:16 UTC

349 points

79 comments, 12 min read

If you weren’t such an idiot...

kave and Mark Xu

2 Mar 2024 0:01 UTC

119 points

60 comments, 2 min read

(markxu.com)

[Question] Which skincare products are evidence-based?

Vanessa Kosoy

2 May 2024 15:22 UTC

81 points

24 comments, 1 min read

LLM+Planners hybridisation for friendly AGI

installgentoo

3 May 2024 8:40 UTC

6 points

2 comments, 1 min read

[Question] Were there any ancient rationalists?

OliverHayman

3 May 2024 18:26 UTC

11 points

3 comments, 1 min read

Why is AGI/ASI Inevitable?

DeathlessAmaranth

2 May 2024 18:27 UTC

14 points

6 comments, 1 min read

A list of core AI safety problems and how I hope to solve them

davidad

26 Aug 2023 15:12 UTC

161 points

26 comments, 5 min read

Please stop publishing ideas/insights/research about AI

Tamsin Leake

2 May 2024 14:54 UTC

22 points

48 comments, 4 min read

Disentangling Competence and Intelligence

Robert Kralisch

29 Apr 2024 0:12 UTC

23 points

7 comments, 6 min read

Key takeaways from our EA and alignment research surveys

Cameron Berg, Judd Rosenblatt, florin_pop and AE Studio

3 May 2024 18:10 UTC

69 points

2 comments, 21 min read

An Unintentional Compliment

abstractapplic and lsusr

28 Apr 2024 20:04 UTC

23 points

2 comments, 4 min read

An explanation of evil in an organized world

KatjaGrace

2 May 2024 5:20 UTC

26 points

9 comments, 2 min read

(worldspiritsockpuppet.com)

Coherence of Caches and Agents

johnswentworth

1 Apr 2024 23:04 UTC

73 points

7 comments, 11 min read

[Question] Can stealth aircraft be detected optically?

Yair Halberstadt

2 May 2024 7:47 UTC

18 points

24 comments, 1 min read

Ironing Out the Squiggles

Zack_M_Davis

29 Apr 2024 16:13 UTC

140 points

33 comments, 11 min read