Has Diagram

Last edit: 29 Apr 2023 22:52 UTC by Gunnar_Zarncke

This tag indicates that a post contains diagrams. It may be useful for quickly finding such posts, or for excluding them if you are visually impaired.

Being the (Pareto) Best in the World

johnswentworth · 24 Jun 2019 18:36 UTC
359 points
56 comments · 3 min read · LW link · 3 reviews

[Intro to brain-like-AGI safety] 10. The alignment problem

Steven Byrnes · 30 Mar 2022 13:24 UTC
48 points
4 comments · 19 min read · LW link

[Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI”

Steven Byrnes · 20 Apr 2022 12:58 UTC
42 points
10 comments · 16 min read · LW link

Drawing Less Wrong: Technical Skill

Raemon · 5 Dec 2011 5:12 UTC
37 points
36 comments · 9 min read · LW link

All images from the WaitButWhy sequence on AI

trevor · 8 Apr 2023 7:36 UTC
72 points
5 comments · 2 min read · LW link

The Natural Abstraction Hypothesis: Implications and Evidence

TheMcDouglas · 14 Dec 2021 23:14 UTC
34 points
8 comments · 19 min read · LW link

Testing The Natural Abstraction Hypothesis: Project Update

johnswentworth · 20 Sep 2021 3:44 UTC
86 points
17 comments · 8 min read · LW link · 1 review

Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide

Andrew_Critch · 24 Nov 2022 21:16 UTC
52 points
34 comments · 3 min read · LW link

[Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning

Steven Byrnes · 23 Feb 2022 14:44 UTC
50 points
25 comments · 21 min read · LW link

[Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL

Steven Byrnes · 2 Mar 2022 15:26 UTC
56 points
13 comments · 15 min read · LW link

[Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example

Steven Byrnes · 9 Mar 2022 14:28 UTC
71 points
0 comments · 9 min read · LW link

[Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development

Steven Byrnes · 16 Mar 2022 13:59 UTC
50 points
2 comments · 14 min read · LW link

[Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation

Steven Byrnes · 23 Mar 2022 12:48 UTC
40 points
10 comments · 21 min read · LW link

[Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts

Steven Byrnes · 27 Apr 2022 13:30 UTC
65 points
15 comments · 14 min read · LW link

[Intro to brain-like-AGI safety] 14. Controlled AGI

Steven Byrnes · 11 May 2022 13:17 UTC
35 points
25 comments · 19 min read · LW link

[Intro to brain-like-AGI safety] 1. What’s the problem & Why work on it now?

Steven Byrnes · 26 Jan 2022 15:23 UTC
139 points
19 comments · 23 min read · LW link

[Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

Steven Byrnes · 2 Feb 2022 13:22 UTC
53 points
12 comments · 23 min read · LW link

[Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering

Steven Byrnes · 9 Feb 2022 13:09 UTC
74 points
3 comments · 24 min read · LW link

[Intro to brain-like-AGI safety] 4. The “short-term predictor”

Steven Byrnes · 16 Feb 2022 13:12 UTC
60 points
11 comments · 13 min read · LW link

Bayes’ Theorem Illustrated (My Way)

komponisto · 3 Jun 2010 4:40 UTC
166 points
195 comments · 9 min read · LW link

Induction heads—illustrated

TheMcDouglas · 2 Jan 2023 15:35 UTC
70 points
4 comments · 3 min read · LW link

An Illustrated Proof of the No Free Lunch Theorem

lifelonglearner · 8 Jun 2020 1:54 UTC
19 points
0 comments · 1 min read · LW link
(mlu.red)

How much do you believe your results?

Eric Neyman · 6 May 2023 20:31 UTC
396 points
12 comments · 15 min read · LW link
(ericneyman.wordpress.com)

Residual stream norms grow exponentially over the forward pass

7 May 2023 0:46 UTC
65 points
17 comments · 11 min read · LW link

The Cartoon Guide to Löb’s Theorem

Eliezer Yudkowsky · 17 Aug 2008 20:35 UTC
32 points
103 comments · 1 min read · LW link

Corrigibility, Much more detail than anyone wants to Read

Logan Zoellner · 7 May 2023 1:02 UTC
23 points
2 comments · 7 min read · LW link

A newcomer’s guide to the technical AI safety field

zeshen · 4 Nov 2022 14:29 UTC
35 points
3 comments · 10 min read · LW link

Embedding safety in ML development

zeshen · 31 Oct 2022 12:27 UTC
24 points
1 comment · 18 min read · LW link

Levels of goals and alignment

zeshen · 16 Sep 2022 16:44 UTC
27 points
4 comments · 6 min read · LW link