Kaarel

Karma: 1,750

kaarelh AT gmail DOT com

personal website

Honorable AI

Kaarel

24 Dec 2025 21:20 UTC

42 points

23 comments · 41 min read · LW link

An Advent of Thought

Kaarel

17 Mar 2025 14:21 UTC

65 points

13 comments · 48 min read · LW link

Deep Learning is cheap Solomonoff induction?

Lucius Bushnaq, Kaarel and Dmitry Vaintrob

7 Dec 2024 11:00 UTC

46 points

1 comment · 17 min read · LW link

Finding the estimate of the value of a state in RL agents

Clément Dumas, Walter Laurito, KlaRo and Kaarel

3 Jun 2024 20:26 UTC

8 points

4 comments · 4 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq, jake_mendel, StefanHex and Kaarel

20 May 2024 17:55 UTC

24 points

7 comments · 6 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

20 May 2024 17:53 UTC

108 points

4 comments · 3 min read · LW link

A starting point for making sense of task structure (in machine learning)

Kaarel, RP and jake_mendel

24 Feb 2024 1:51 UTC

51 points

2 comments · 12 min read · LW link

Toward A Mathematical Framework for Computation in Superposition

Dmitry Vaintrob, jake_mendel and Kaarel

18 Jan 2024 21:06 UTC

214 points

19 comments · 63 min read · LW link

Grokking, memorization, and generalization — a discussion

Kaarel and Dmitry Vaintrob

29 Oct 2023 23:17 UTC

75 points

11 comments · 23 min read · LW link

Crystal Healing — or the Origins of Expected Utility Maximizers

Alexander Gietelink Oldenziel, RP and Kaarel

25 Jun 2023 3:18 UTC

52 points

12 comments · 5 min read · LW link

Searching for a model’s concepts by their shape – a theoretical framework

Kaarel, Georgios Kaklamanos, Walter Laurito, Kay Kozaronek, AlexMennen and June Ku

23 Feb 2023 20:14 UTC

51 points

0 comments · 19 min read · LW link

[RFC] Possible ways to expand on “Discovering Latent Knowledge in Language Models Without Supervision”.

Georgios Kaklamanos, Walter Laurito, Kaarel and Kay Kozaronek

25 Jan 2023 19:03 UTC

48 points

6 comments · 12 min read · LW link

A gentle primer on caring, including in strange senses, with applications

Kaarel

30 Aug 2022 8:05 UTC

10 points

4 comments · 18 min read · LW link

kh’s Shortform

Kaarel

6 Jul 2022 21:48 UTC

2 points

34 comments · 1 min read · LW link

[Question] Transferring credence without transferring evidence?

Kaarel

4 Feb 2022 8:11 UTC

11 points

6 comments · 3 min read · LW link