Robert_AIZI

Karma: 1,399

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI

28 Oct 2024 14:48 UTC

27 points

0 comments

10 min read

Comments on Anthropic’s Scaling Monosemanticity

Robert_AIZI

3 Jun 2024 12:15 UTC

98 points

8 comments

7 min read

Explaining a Math Magic Trick

Robert_AIZI

5 May 2024 19:41 UTC

103 points

10 comments

5 min read

Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT

Robert_AIZI

5 Mar 2024 13:55 UTC

62 points

24 comments

10 min read

(aizi.substack.com)

Rating my AI Predictions

Robert_AIZI

21 Dec 2023 14:07 UTC

22 points

5 comments

2 min read

(aizi.substack.com)

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI

7 Oct 2023 23:30 UTC

137 points

8 comments

4 min read

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

Logan Riggs, Hoagy, Aidan Ewart and Robert_AIZI

21 Sep 2023 15:30 UTC

161 points

8 comments

5 min read

Unsafe AI as Dynamical Systems

Robert_AIZI

14 Jul 2023 15:31 UTC

11 points

0 comments

3 min read

(aizi.substack.com)

AIs teams will probably be more superintelligent than individual AIs

Robert_AIZI

4 Jul 2023 14:06 UTC

3 points

1 comment

2 min read

(aizi.substack.com)

[Research Update] Sparse Autoencoder features are bimodal

Robert_AIZI

22 Jun 2023 13:15 UTC

24 points

1 comment

5 min read

(aizi.substack.com)

Explaining “Taking features out of superposition with sparse autoencoders”

Robert_AIZI

16 Jun 2023 13:59 UTC

10 points

0 comments

8 min read

(aizi.substack.com)

[Question] Question for Prediction Market people: where is the money supposed to come from?

Robert_AIZI

8 Jun 2023 13:58 UTC

25 points

26 comments

1 min read

Is behavioral safety “solved” in non-adversarial conditions?

Robert_AIZI

25 May 2023 17:56 UTC

26 points

8 comments

2 min read

(aizi.substack.com)

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI

9 May 2023 21:54 UTC

9 points

0 comments

9 min read

(aizi.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI

26 Apr 2023 17:45 UTC

75 points

7 comments

3 min read

(aizi.substack.com)

The Toxoplasma of AGI Doom and Capabilities?

Robert_AIZI

24 Apr 2023 18:11 UTC

72 points

13 comments

1 min read

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI

20 Apr 2023 18:10 UTC

5 points

0 comments

6 min read

(aizi.substack.com)

Research Report: Incorrectness Cascades

Robert_AIZI

14 Apr 2023 12:49 UTC

19 points

0 comments

10 min read

(aizi.substack.com)

Pre-registering a study

Robert_AIZI

7 Apr 2023 15:46 UTC

10 points

0 comments

6 min read

(aizi.substack.com)

Invocations: The Other Capabilities Overhang?

Robert_AIZI

4 Apr 2023 13:38 UTC

29 points

4 comments

4 min read

(aizi.substack.com)