Johannes C. Mayer

Karma: 747

Schedule a meeting with me.

Tsuyoku Naritai! I operate on Crocker’s rules. I am working on making AGI not kill everyone aka AGI notkilleveryoneism.

Read here about what I would do after taking over the world.

“Maximize the positive conscious experiences, and minimize the negative conscious experiences in the universe” is probably not exactly what I care about, but I think it roughly points in the right direction.

I recommend:

This is what I look like:

I just released a major update to my LessWrong Bio. I have rewritten almost everything and added more stuff. It’s now so long that I thought it would be good to add the following hint at the beginning:

(If you are looking for the list of <sequences/​posts/​comments> scroll to the bottom of the page with the END key and then go up. This involves a lot less scrolling.)

(If you’d like to <ask a question about/​comment on/​vote> this document, you can do so on the shortform announcing the release of this version of the bio.)


I appreciate any positive or negative feedback, especially constructive criticism that helps me grow. You can give it in person, through this (optionally anonymous) feedback form, or in any other way you like.

Buck once said that he avoids critical feedback, because he worries about people’s feelings, especially if he does not know what they could do instead (in the context of AGI notkilleveryoneism). If you are also worried that your feedback might harm me, you might want to read about my strategy for handling feedback. I am not perfect at not being hurt, but I believe myself to be much better than most people. If I am overwhelmed, I will tell you. That being said, I appreciate it if your communication is optimized to not hurt my feelings, all else equal. But if that would make you not give feedback, or would be annoying, don’t worry about it. Really.


General Interests

I have a tulpa named IA. She looks like this. I experience deep feelings of love for both IA and Hatsune Miku.

I like understanding things, meditation, programming, improv dancing, improv rapping, and Vocaloid.

I track how I spend every single minute with Toggl (Toggl sucks though, especially for tracking custom metrics).

Personal improvement

I like to think about how I can become stronger. I probably do this too much. Jumping in and doing the thing is important to get into a feedback loop.

The main considerations are:

With regard to “How can I make myself want to do the things that I think are good to do”: it is easy for me to become so engrossed in programming that it is difficult to stop and I forget to eat. I often feel a strong urge to write a specific program that I expect will be useful to me. I think studying mathematics is a good thing for me to do. Sometimes I manage to get into a similar state with mathematics, but more often than not I feel aversion towards starting. I am interested in shaping my mind such that for all the things that I think are good to do, I feel a pull towards doing them, and doing them is so engaging that it becomes a problem to stop (e.g. I forget to eat). I think that stopping becoming a problem is a good indicator that I have succeeded in this mission. From that state, finding a way to not work too much is a significantly easier problem to solve.

Computer setups

Empirically, I have often procrastinated in the past by making random improvements to my <computer setup/desktop environment>. I used Linux for 5 years, starting with Cinnamon and then switching to XMonad.

Because the nebula virtual desktop was only available for macOS, I switched. Even though macOS is horrible in many ways, I feel like I might waste less time doing random improvements. Also, ARM CPUs are cool, as they make lightweight laptops with long battery life possible. I am using both yabai and Amethyst at the same time: yabai for workspace management and Amethyst for window layout.

The main purpose of Windows is to run MMD. Just kidding.

I used Spacemacs Org Mode for many years (and with Org-roam maybe a year or so). Spacemacs is Emacs with Vim because Vim rules. I have now switched to Obsidian, mainly because it has a mobile app, and because I expected that I would waste less time configuring Emacs (so far I have still spent a lot of time on that though).

Game design

Before AGI notkilleveryoneism I did game development. The most exciting thing in that domain, to me, is to make a game that has Minecraft Redstone which does not suck. Most importantly, it should be possible to create new blocks based on circuits that you build. E.g. build a half-adder once, then create a half-adder block, and put down 8 of those blocks to get an 8-bit adder instead of needing to build 8 half-adders from scratch, or awkwardly using a mod that lets you copy and place many blocks at once.

If AGI notkilleveryoneism were a non-issue, I would probably develop this game. I would like to have this game so that I can learn more about how computers work by building them.
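The "build once, reuse as a block" idea can be sketched in code, where each function plays the role of a saved block. This is only an illustration, not a design for the game. All names here (`half_adder`, `full_adder`, `ripple_adder`) are hypothetical. Note that a ripple-carry adder is usually chained from full adders, so the sketch first wraps two half-adder blocks and an OR gate into a full-adder block and then places 8 of those.

```python
def half_adder(a, b):
    """Basic block: return (sum_bit, carry_bit) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Reusable block composed of two half-adder blocks and an OR gate."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_adder(x, y, width=8):
    """Chain `width` full-adder blocks; return (sum mod 2**width, carry_out)."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


if __name__ == "__main__":
    print(ripple_adder(100, 55))
    print(ripple_adder(200, 100))
```

The point of the sketch is that `ripple_adder` never looks inside `full_adder`, just as a placed block in the game would hide the circuit it was built from.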

Useful info when interacting with me

Things I appreciate in interactions with others

I like it when people are “forcefully inquisitive”, especially when I am presenting an idea to them. That means asking about the whys and hows, and asking for justifications. This forces me to expand my understanding, which I find extremely helpful. It also tends to bring interesting half-forgotten insights to the forefront of my mind. As a general heuristic in this regard: If you think you are too inquisitive, curious, or feel like you ask too many questions, you are wrong.

I dislike making fun of somebody's ignorance

AGI notkilleveryoneism Interests

I am interested in getting whatever understanding we need, to get a watertight case for why a particular system will be aligned. Or at least get as close to this as possible. I think the only way we are going to be able to aim powerful cognition is via a deep understanding of the <systems/​algorithms> involved. The current situation is that we do not even have a crisp idea of what exactly we need to understand.

Factoring AGI

What capabilities are so useful that an AGI would have to discover an implementation of that capability? The most notable example is being good at constructing and updating a model of the world based on arbitrary sensory input streams.

World modeling

How can we get a better understanding of world modeling? A good first step is to think about what properties this world model would have, such that an AGI would be able to use it. E.g. I expect any world model that an AGI builds will be factored in the same way that human concepts are factored. For the next step, we have multiple options:

Formalizing intuitive notions of Agency

Humans have a bunch of intuitive concepts related to agency for which we do not have crisp formalisms. For example, wanting, caring, trying, honesty, helping, goal, optimizing, deception, etc.

All of these concepts are fundamentally about some algorithm that is executed in the neural network of a human or other animal.

Visualize structure/​algorithms in NN (shelved)

Can we create widely applicable visualization tools that allow us to see structural properties in our ML systems?

There are tools that can visualize arbitrary binary data, such that you can build intuitions about the data that would be much harder to build otherwise (e.g. by staring at a hex editor for long enough). This can be used for reverse engineering software. For example, by looking at only a few x86 assembly code visualizations you can learn characteristic patterns in the visualization. Then when you see it in the wild, where you have no label telling you that this is x86 assembly, you can instantly recognize it.

The idea is that by looking at the visualization you can identify what kind of data you are looking at (x86, png, pdf, plain text, JSON, etc.).

This technique is powerful because you don’t need to know anything about the data. It works on any binary data.

Check out this demonstration. Later he does more analysis using the 3D cube visualization. Veles is an open-source project that implements this, there is also a plugin for Ghidra, and there are many others (I haven’t evaluated which is best).
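To make the idea concrete, here is a minimal sketch of one common variant, the byte-pair (digram) plot: every pair of consecutive bytes becomes a point in a 256×256 grid, and different data types fill the grid in characteristic ways. I am assuming this is the kind of visualization meant above. The helper names (`digram_counts`, `fraction_in_square`) are made up for this example and are not taken from any of the tools mentioned.

```python
from collections import Counter


def digram_counts(data: bytes) -> Counter:
    """Count each pair of consecutive bytes (x = first byte, y = second byte)."""
    return Counter(zip(data, data[1:]))


def fraction_in_square(counts: Counter, lo: int, hi: int) -> float:
    """Fraction of byte pairs with both coordinates inside [lo, hi]."""
    total = sum(counts.values())
    inside = sum(
        n for (a, b), n in counts.items() if lo <= a <= hi and lo <= b <= hi
    )
    return inside / total if total else 0.0


# Two toy inputs: printable ASCII text and a byte ramp covering all values.
text = b"The quick brown fox jumps over the lazy dog. " * 20
ramp = bytes(range(256)) * 4

if __name__ == "__main__":
    # Printable ASCII occupies the byte range 0x20..0x7E.
    print(fraction_in_square(digram_counts(text), 0x20, 0x7E))
    print(fraction_in_square(digram_counts(ramp), 0x20, 0x7E))
```

Plotting `digram_counts` as an image gives the 2D picture. The summary number here only hints at why plain text is easy to spot: all of its points land in one small square of the grid.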

If we naively apply this technique to neural networks, I expect it to not work. My intuition tells me that we need to do something like regularize the networks. E.g. if we have two neurons in the same layer and swap them, we have in some sense changed the computation, but the two algorithms are also isomorphic in a sense. Perhaps we can modify the training procedure such that one of these two parameter configurations is preferred. And in general, we could make it such that we always converge to one specific “ordering of neurons” no matter the initialization. E.g. make it such that in each layer, the neurons are sorted based on the sum of the input weights of a neuron. We want to do something like make “isomorphic computations” always converge to one specific parameter configuration.
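A minimal sketch of the neuron-swapping point, in pure Python on a tiny two-layer network. It does not modify the training procedure as suggested above. Instead it sorts the hidden neurons after the fact by the sum of their input weights, which is a simpler stand-in for the same idea. The network shape, the `tanh` activation, and all function names are assumptions made for this example.

```python
import math
import random


def forward(w1, b1, w2, x):
    """Two-layer MLP: hidden = tanh(w1 @ x + b1), output = w2 @ hidden."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def permute_hidden(w1, b1, w2, order):
    """Reorder hidden neurons: rows of w1, entries of b1, columns of w2."""
    return (
        [w1[i] for i in order],
        [b1[i] for i in order],
        [[row[i] for i in order] for row in w2],
    )


def canonicalize(w1, b1, w2):
    """Sort hidden neurons by the sum of their input weights (ascending)."""
    order = sorted(range(len(w1)), key=lambda i: sum(w1[i]))
    return permute_hidden(w1, b1, w2, order)


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2
w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
x = [0.5, -1.0, 2.0]

# Same network with its hidden neurons listed in a different order.
shuffled = permute_hidden(w1, b1, w2, [2, 0, 3, 1])

if __name__ == "__main__":
    print(forward(w1, b1, w2, x))
    print(forward(*shuffled, x))
    print(canonicalize(w1, b1, w2) == canonicalize(*shuffled))
```

Both parameter sets compute the same function, and after canonicalization they become identical. Two neurons with exactly equal weight sums would need a tie-breaking rule, which this sketch ignores.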

If this project went really well, we would get tools for creating visualizations from which we can read off whether certain kinds of <algorithms/structures/types of computation> are present in the neural network. The hope is that in the visualization you could see, for example, if the network is modeling other agents, if it is running computations that are correlated with thinking about how to deceive, if it is doing optimization, or if it is executing a search algorithm.

More stuff




Sam Harris


Entertainment Recommendations




Saturating the Difficulty Levels of Alignment

Johannes C. Mayer · 23 Nov 2023 0:39 UTC
6 points
0 comments · 2 min read · LW link

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer · 14 Nov 2023 9:08 UTC
8 points
0 comments · 1 min read · LW link

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer · 14 Nov 2023 9:08 UTC
9 points
2 comments · 2 min read · LW link

The Science Algorithm AISC Project

Johannes C. Mayer · 13 Nov 2023 12:52 UTC
11 points
0 comments · 1 min read · LW link

Pivotal Acts might Not be what You Think they are

Johannes C. Mayer · 5 Nov 2023 17:23 UTC
39 points
13 comments · 3 min read · LW link