Ironing Out the Squiggles

Zack_M_Davis

29 Apr 2024 16:13 UTC

87 points

5 comments

11 min read

LW link

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers

hugofry

29 Apr 2024 20:57 UTC

20 points

3 comments

11 min read

LW link

Towards a formalization of the agent structure problem

Alex_Altair

29 Apr 2024 20:28 UTC

18 points

0 comments

14 min read

LW link

Refusal in LLMs is mediated by a single direction

Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda

27 Apr 2024 11:13 UTC

142 points

52 comments

10 min read

LW link

On Not Pulling The Ladder Up Behind You

Screwtape

26 Apr 2024 21:58 UTC

118 points

9 comments

9 min read

LW link

Big-endian is better than little-endian

Menotim

29 Apr 2024 2:30 UTC

26 points

13 comments

3 min read

LW link

Open-Source AI: A Regulatory Review

Elliot_Mckernon and Deric Cheng

29 Apr 2024 10:10 UTC

12 points

0 comments

8 min read

LW link

List your AI X-Risk cruxes!

Aryeh Englander

28 Apr 2024 18:26 UTC

31 points

4 comments

2 min read

LW link

Constructability: Plainly-coded AGIs may be feasible in the near future

Épiphanie Gédéon and Charbel-Raphaël

27 Apr 2024 16:04 UTC

63 points

12 comments

13 min read

LW link

[Aspiration-based designs] 1. Informal introduction

B Jacobs, Jobst Heitzig, Simon Fischer and Simon Dima

28 Apr 2024 13:00 UTC

34 points

4 comments

8 min read

LW link

Unintentionally Creating Value

abstractapplic and lsusr

28 Apr 2024 20:05 UTC

22 points

0 comments

2 min read

LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth

23 Apr 2024 22:19 UTC

172 points

88 comments

1 min read

LW link

An Unintentional Compliment

abstractapplic and lsusr

28 Apr 2024 20:04 UTC

21 points

1 comment

4 min read

LW link

Disentangling Competence and Intelligence

Robert Kralisch

29 Apr 2024 0:12 UTC

16 points

4 comments

6 min read

LW link

So What’s Up With PUFAs Chemically?

J Bostock

27 Apr 2024 13:32 UTC

55 points

22 comments

6 min read

LW link

Thoughts on seed oil

dynomight

20 Apr 2024 12:29 UTC

266 points

94 comments

17 min read

LW link

(dynomight.net)

The first future and the best future

KatjaGrace

25 Apr 2024 6:40 UTC

104 points

10 comments

1 min read

LW link

(worldspiritsockpuppet.com)

Duct Tape security

Isaac King

26 Apr 2024 18:57 UTC

62 points

8 comments

5 min read

LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai

16 Apr 2024 21:16 UTC

313 points

64 comments

12 min read

LW link

Superposition is not “just” neuron polysemanticity

LawrenceC

26 Apr 2024 23:22 UTC

50 points

3 comments

13 min read

LW link