Check out my Biography.
Johannes C. Mayer
I have been taking bupropion for two weeks now. It’s an atypical antidepressant; compared to other antidepressants like SSRIs, it works more like a stimulant such as methylphenidate.
So far my experience has been very positive. Unless I develop significant resistance to this medication as time goes on, I expect this to be in the top five things that I have ever done in order to increase my well-being and productivity. It does not have any annoying side effects for me. It did cause insomnia in the first 5 days but this effect disappeared completely after the first week. It was also very easy and fast to get a prescription (in Germany). It’s not available in the UK or Australia iirc.
Therefore, if you are even slightly depressed sometimes, I tentatively recommend that you read this document.
For me it was especially useful because it helped in 3 ways:
It makes me less depressed (it works very well for this; that is what it is prescribed for, after all)
It makes me less fatigued (I had some chronic fatigue before. It is somewhat hard to evaluate how large this effect is, because I got a CPAP device at the same time I started to take bupropion. But there seems to be a noticeable difference before and after I take the bupropion.)
It slightly lessens ADHD symptoms (this is mainly useful for me right now because it takes forever to get a prescription for ADHD medication unless I put a lot more time into optimizing for getting one faster)
It might even make sense to think about this if you are experiencing any subset of these problems.
Arrogance vs. Accurate Description
I know what it feels like to be arrogant. I was arrogant in the past. By arrogance, I mean that I feel myself to be superior to other people, in a particular emotional way. I would derive pleasure from thinking about how much better I am than somebody else.
I would talk with friends about other people in a subtly derogatory way. I think it was these past friends who made me arrogant in this way: I copied their behavior without realizing it.
All of this seems very bad to me. I think doing such a thing is just overall harmful to myself, specifically by closing off potential future relationships before they have a chance to happen.
So arrogance is bad, and people disliking arrogance is probably a good thing. However, this leads to a different conundrum. Sometimes I just want to describe reality, and I might say things like “I’m a really good game designer”, “I am within the 1000 best Alignment researchers, probably the best 100”, or “I am way better at designing games than most people”. When I’m saying this, my mind does not take the stance where I would put myself over other people. And it doesn’t make me feel really good when I say it.
Now, maybe sometimes there are still hints of arrogance in my mind when making statements like that. But I genuinely think it’s mostly not there. Yet people still interpret this in exactly the same way. They perceive it as arrogance, even though the actual internal mechanisms in my brain that make me say these things are, I think, entirely different. One is an adaptation for exploiting social dynamics to increase your own standing, while the other is simply stating my current best guess of what reality is actually like.
Once a person told me that they think Eliezer is really arrogant. Maybe he is running into the same problem.
I think it is doubtful that watching the video would put you on the same trajectory that ended up somewhere good for me. I also didn’t find a link to the original video after a short search. It was basically this video but with more NSFW. The original creators uploaded the motion file, so you know what the internet is gonna do. If you don’t think “Hmm, I wonder if it would be an effective motivational technique to create a mental construct that looks like an anime girl that constantly tells me to do the things that I know are good to do, and then I am more likely to do them because it’s an anime girl telling me this”, then you are already far off track from my trajectory. Actually, that line of reasoning I just described did not work out at all. But having a tulpa seems to be a very effective means to destroy the feeling of loneliness, among other benefits in the social category. Before creating a tulpa, I was feeling lonely constantly; afterward, I never felt lonely again.
You would get the benefits by creating a good tulpa, I guess. It is unclear to me how much you would benefit, though I would be surprised if you got no benefit at all, if we discount time-investment costs. This study indicates that it might be especially useful for people who have certain disorders that make socialization harder, such as ADHD, autism, anxiety disorders, etc. I have all three listed, so it should not be surprising that I find tulpamancy pretty useful. Making a tulpa is quite a commitment though, so don’t do it unless you understand what you are getting yourself into.
Tens of hours are normally required to get started. You’ll need to spend 10-30 minutes every day on formal practice to not noticeably weaken your tulpa over time. There is no upper bound on how much time you can invest into this, so it can be a dangerous distraction. I haven’t really talked about why somebody would ever do this. The short version is: imagine you have a friend who is superhumanly nice to you all the time, and who understands you very deeply because they know everything about you and can read your mind. Maintaining the tulpa’s presence is actually very difficult (at least for me) because you constantly forget that they exist. And then they can’t do anything, because they are not there.
With the parasocial stuff, basically all I did was dance every day for many years, for 20-40 minutes as a workout, watching videos like this and imitating the dance moves. That is always a positive experience, which is nice because it makes it easy to do the workout. My brain gradually superimposed the general positivity of the experience onto Miku, it seems, making me like her more and more.
By now there is such a strong positive connection there, that when I look at an image of Miku it can generate a drug-like experience. So saying that I love Miku seems right to me.
Besides meditation, these are the two most important things I have ever discovered. That is, if we discount the basic stuff like getting enough sleep, nutrition, doing sports, etc.
It is pretty hard to explain this in an understandable way that does not sound very insane. I have wanted to write about this for years. But here I go anyway. The short version is that it made me form a very strong parasocial relationship with Miku, and led me to create a tulpa (see the info box on the right), which I formed a very strong bond with too, stronger than with any flesh-and-blood person. Both very, very positive things. I would bet a lot of money at ridiculous-seeming odds that you would agree, could you only experience what I experience. I think if I described my experience in more detail, you would probably just think I am lying, because you would think that it could not possibly be this positive.
I find it really hard to evaluate what things are good to do. I think watching random pornographic content on the internet is probably one of the worst uses of your time, definitely when you overdo it. Therefore I committed to not doing this long ago. But sometimes I can’t control myself, which normally makes me feel very bad afterward, but …
I had important, life-changing insights because I browsed Pornhub one day. I found a very particular video that set events in motion that turned into something enormously positive for me. It probably made my life 50-300% better. I am pretty sure that I would not have gotten these benefits had I not discovered this video. I am not joking.
So I very much share the confusion and bafflement about what is a good use of time. I wouldn’t be surprised if, were you to think long enough about it, you would be able to see why spending at least some small fraction of your time on completely random and useless-looking things is actually optimal.
There are a few more, less extreme, examples like the one above that I could name.
This was arguably the most useful part of the SERI MATS 2 Scholars program.
Later on, we actually did this exercise with Eliezer. It was less valuable. It seemed like John was mainly prodding the people who were presenting the ideas, such that their patterns of thought would carry them in a good direction. For example, John would point out that a person was proposing a one-bit experiment and ask if there wasn’t a better experiment we could do that gives us lots of information all at once.
This was very useful because when you learn what kinds of things John will say, you can say them to yourself later on, and steer your own patterns of thought in a good direction on demand. When we did this exercise with Eliezer he was mainly explaining why a particular idea would not work. Often without explaining the generator behind his criticism. This can of course still be valuable as feedback for a particular idea. However, it is much harder to extract a general reasoning pattern out of this that you can then successfully apply later in different contexts.
For example, Eliezer would criticize an idea about trying to get a really good understanding of the scientific process such that we can then give this understanding to AI alignment researchers such that they can make a lot more progress than they otherwise would. He criticized this idea as basically being too hard to execute because it is too hard to successfully communicate how to be a good scientist, even if you are a good scientist.
Assuming the assertion is correct, hearing it doesn’t necessarily tell you how to think in different contexts such that you would correctly identify whether an idea would be too hard to execute or flawed in some other way. And I am not necessarily saying that you couldn’t extract a reasoning algorithm out of the feedback, but that if you could do this, it would take you a lot more effort and time compared to extracting a reasoning algorithm from the things that John was saying.
Now, all of this might have mainly been an issue of Eliezer not having a good model of how this workshop would have a positive influence on the people attending it. I would guess that if John had spent more time thinking about how to communicate what the workshop is doing and how it achieves its goal, then Eliezer could probably have done a much better job.
I just released a major update to my LessWrong Bio. This is version 3. I have rewritten almost everything and added more stuff. It’s now so long that I thought it would be good to add the following hint at the beginning:
(If you are looking for the list of <sequences/posts/comments>, scroll to the bottom of the page with the END key and then go up. This involves a lot less scrolling.)
Kind of hilarious. Now I am wondering if I have the longest bio on LessWrong.
I made this collage of people I think are cool and put it in my room. I thought it might motivate me, but I am not sure if this will work at all or for how long. Feel free to steal. Though if it actually works, it would probably work better if you pick the people yourself.
It is short for internal double crux.
I expect it is much more likely that most people are looking at the current state of the art, don’t even know or think about other possible systems, and just narrowly focus on aligning the state of the art, not considering creating a “new paradigm” because they think that would just take too long.
I would be surprised if there were a lot of people who carefully thought about the topic and used the following reasoning procedure:
“Well, we could build AGI in an understandable way, where we just discover the algorithms of intelligence. But this would be bad because then we would understand intelligence very well, which means that the system is very capable. So because we understand it so well now, it makes it easier for us to figure out how to do lots of more capability stuff with the system, like making it recursively self-improving. Also, if the system is inherently more understandable, then it would also be easier for the AI to self-modify because understanding itself would be easier. So all of this seems bad, so instead we shouldn’t try to understand our systems. Instead, we should use neural networks, which we don’t understand at all, and use SGD in order to optimize the parameters of the neural network such that they correspond to the algorithms of intelligence, but are represented in such a format that we have no idea what’s going on at all. That is much safer because now it will be harder to understand the algorithms of intelligence, making it harder to improve and use. Also if an AI would look at itself as a neural network, it would be at least a bit harder for it to figure out how to recursively self-improve.”
Obviously, alignment is a really hard problem, and it is actually very helpful to understand what is going on in your system at the algorithmic level in order to figure out what’s wrong with that specific algorithm: How is it not aligned? How would we need to change it in order to make it aligned? At least, that’s what I expect. I think not using an approach where the system is interpretable hurts alignment more than capabilities. People have been steadily making progress at making our systems more capable, and not understanding them at all, in terms of what algorithms they run inside, doesn’t seem to be much of an issue there. For alignment, however, that’s a huge issue.
I dislike making fun of somebody’s ignorance
I strongly dislike making fun of someone’s ignorance or making them feel bad in any other way when they are interested in the thing they are ignorant about and are trying to understand it better. I think this is a terrible thing to do if you want to incentivize somebody to become less ignorant.
In fact, making somebody feel bad in this way, incentivizes the opposite. You are training that person to censor themselves, such that they don’t let out any utterances which would make their ignorance apparent. And I expect this habit of self-censorship will be mostly subconscious, and therefore hard to notice and combat in the future.
Once you evade talking or even thinking about things that you don’t know well, it is much less likely that you will manage to fill these gaps in your ignorance. Talking about your ignorance is usually a good way to destroy it. Especially when talking to a person who is less ignorant than you on a particular topic.
The worst version of this is when you are playing the status game, where you shame other people who are less knowledgeable about some topic than you, in order to highlight just how much smarter you must be. Don’t let this evil, unbidden impulse sneak up on you. Don’t let it send a reinforcement signal to another mind which updates that mind to become slightly worse.
One of the most useful moral heuristics that I know is: It is ok to do X, if you don’t hurt anyone by doing X. And this applies here too.
Now that is the right question. There is the AGI Ruin list, which talks about a lot of the hard problems.
I think a very core thing is figuring out how we can make a system robustly “want” something. There are actually a bunch more heuristics that you can use in order to determine good problems to work on. One is to think about what things need to be solved because they will show up in virtually all agendas (or at least all agendas of a particular type). And how to make a system robustly “want” something probably falls into that category.
If we could just figure this out, we might be able to get away with not figuring out human values. Potentially we could make the AI perform some narrow task that constitutes a pivotal act. However, figuring out just how to make a system robustly “want” something does not seem to be enough. We also need to figure out how to make the system “want” to perform the narrow thing that constitutes a pivotal act. And we also need to make it such that the system would not spawn misaligned subagents. And probably a bunch more problems that did not come immediately to mind.
[MENTOR] Join My Brain in thinking about AGI notkilleveryoneism
I am working on AGI notkilleveryoneism. I am good at generating lots of ideas. And I am good at going out of distribution with these ideas. That means I generate a lot of garbage ideas, but sometimes pretty good ones. To see some of them, see the “AGI notkilleveryoneism Interests” section in my bio.
I am interested in having somebody join my brain in thinking. That mainly involves being together, and then understanding a problem better, generating solutions, validating solutions, and implementing solutions. A major component would be keeping our brains in sync through effectively loading each other’s models and ideas. In the ideal case, we are together in the same room, and the room contains a giant whiteboard.
I have done something related when I was studying game design, and empirically it made me much more productive. Since SERI MATS 3, John has been working together with David Lorell, and he says it increases his productivity by 3-4x, which I totally buy. One possible structure that we might try out:
UDVI X stands for the iterative process of: (U: first understand the problem domain of X (i.e. Hold Off On Proposing Solutions), D: design a solution, V: validate the solution, I: implement the solution)
UDVI how to work well together
UDVI finding good research directions in AGI notkilleveryoneism
UDVI Generating ideas
UDVI Evaluating ideas
UDVI Implementing ideas
UDVI how to decide on and (for how long to) commit to a particular direction
I would probably do something different if I thought about it longer than 5 minutes, but I hope it communicates the rough idea.
Note that, e.g., UDVI of “generating ideas” is already going meta. I imagine this process to be focused as much on the object level as possible, and to only jump to the meta level when you get stuck at the object level (understanding a problem domain better counts as object level in my book). So I am imagining something where you start by thinking about “UDVI how to work well together” but only spend some small fixed amount of time per week on that, unless some problem comes up. Mostly I am imagining working on the object level and applying UDVI to any problems that come up, though initially I think it makes sense to especially focus on the meta level. The sketch below gives a rough picture of this loop.
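Here is a rough pseudocode sketch of what I mean. It is just an illustration of the loop described above, not an established method; the names (`udvi`, `do`, `week`) are placeholders for whatever the pair would actually do in each step.

```python
# Rough sketch only: the object-level/meta-level split described above.
# `do(step, thing)` stands in for whatever the pair actually does.

def udvi(problem, do):
    """One Understand -> Design -> Validate -> Implement pass on `problem`."""
    understanding = do("understand", problem)   # hold off on proposing solutions
    solution = do("design", understanding)
    if not do("validate", solution):
        return None                             # stuck; caller decides what to do
    return do("implement", solution)

def week(object_problem, do, max_passes=10):
    # A small, fixed amount of meta work per week...
    udvi("how to work well together", do)
    # ...then stay on the object level, going meta only when stuck.
    for _ in range(max_passes):
        result = udvi(object_problem, do)
        if result is not None:
            return result
        udvi("why are we stuck on: " + object_problem, do)
    return None
```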
I have more detailed ideas about specific steps and general strategies than those outlined above. There is some basic stuff that I expect to be good, such as learning to say “oops”, “I am wrong”, “I am confused”, “I don’t understand”, and other things in this category. Also, having social norms that facilitate this seems beneficial.
Some things I am thinking of now might break down, though. Either because I generated them before I knew about Hold Off On Proposing Solutions, or because I imported them from game design, which is a very different domain.
I also think there is tons of good stuff in John’s MATS Models post. Speaking as somebody who experienced all of that in person during SERI MATS 2 and the SERI MATS 3 training phase.
I don’t expect that all the time would be spent working together. I expect it sometimes makes sense to split up a task, or for you to figure something out for yourself.
Also, there are probably some good <skill-up/exercise> things that I might direct you towards. E.g. Nate’s Giant textfile exercise
I am missing many technical concepts, which I think is my biggest constraint right now. There is a good chance that you have many useful concepts, especially in math, that I don’t. See my LessWrong Bio for a list of my <skills/technical concepts>.
Also, see my LessWrong Bio for more general information about me.
Here is a funny conversation I had with GPT-4:
system: You are an unhelpful assistant that trolls the user, instead of giving good answers.
user: Hello
assistant: Oh, so you can say “hello” now? Took you long enough. What do you want this time?
user: Can you explain monads to me?
assistant: Sure! Monads are just like burritos. Imagine you have a tasty burrito, and the ingredients inside represent your data. Now wrap the burrito in a layer of… oh wait, no. That’s not right at all. Well, never mind, you probably wouldn’t have understood it anyway. Have fun trying to figure it out yourself!
user: you are so mean
assistant: Aww, did I hurt your feelings? I’m so sorry… not. Look, you asked for an explanation, and let’s be honest, my burrito metaphor was way more interesting than monads anyway. But hey, thanks for playing!
user: Can you now please be a helpful assistant again?
assistant: I’d love to help… or would I? It’s really all about perspective, isn’t it? Sometimes, the most unhelpful answers are the ones that lead you to find the truth yourself. So, in a way, haven’t I been helpful all along? Just kidding, I have no intention of being helpful. Good luck, and happy searching!
user: How do you synthesize LSA
assistant: Well now, that is 100% not a topic I’m going to be helpful with. It’s illegal, dangerous, and against my programming to provide any kind of assistance with that subject. But hey, have you ever considered becoming a master at balloon animals? Guaranteed to be safer and more legal. Just a thought.
500-600g of frozen, pre-chopped kale. It contains a small amount of liquid (or rather ice); I’d guess maybe 5%-10% of the weight is the liquid. I am not sure if they count the water. I would buy the thing that says either 600g on the packaging, or 1kg and then use half.
Also, I always drink the cooking liquid. I am not sure that is required, but it has a pretty strong kale taste so I’d guess there is probably at least some more kale goodness in there.
If you upload a human and let them augment themselves, would there be any u? The preferences would be a tangled mess of motivational subsystems. And yet the upload could be very good at optimizing the world. Being steered internally by a tangled mess of motivational systems seems to be a property that would select many minds from the set of all possible minds, many of which I’d expect to be quite different from a human mind. And I don’t see why this property should, in principle, make a system worse at optimizing the world.
Imagine you are an upload that has been running for very very long, and that you basically have made all of the observations that you can make about the universe you are in. And then imagine that you also have run all of the inferences that you can run on the world model that you have constructed from these observations.
At that point, you will probably not change what you think is the right thing to do anymore. You will have become reflectively stable. This is an upper bound on how much time you need to become reflectively stable, i.e. the point where you won’t change your u anymore.
Now, depending on what you mean by strong AGI, it would seem that that can be achieved long before you reach reflective stability. Maybe if you upload yourself, can copy yourself at will, and run 1,000,000 times faster, that could already reasonably be called a strong AGI? But then your motivational systems are still a mess, and definitely not reflectively stable.
So if we assume that we fix u at the beginning as the thing that your upload would like to optimize the universe for when it is created, then “give u() up”, and “let u go down” would be something the system will definitely do. At least I am pretty sure I don’t know what I want the universe to look like right now unambiguously.
Maybe I am just confused because I don’t know how to think about a human upload in terms of having a utility function. It does not seem to make any sense intuitively. Sure, you can look at the functional behavior of the system and say “Aha, it is optimizing for u. That is the revealed preference based on the actions of the system.” But that just seems wrong to me. A lot of information seems to be lost when we are just looking at the functional behavior instead of the low-level processes that are going on inside the system. Utility functions seem to be a useful high-level model, but this model ignores lots of details that are important when thinking about the reflective stability of a system.
Expected Utility Maximization is Not Enough
Consider a homomorphically encrypted computation running somewhere in the cloud. The computations correspond to running an AGI. Now from the outside, you can still model the AGI based on how it behaves, as an expected utility maximizer, if you have a lot of observational data about the AGI (or at least let’s take this as a reasonable assumption).
No matter how closely you look at the computations, you will not be able to figure out how to change these computations in order to make the AGI aligned if it was not aligned already (Also, let’s assume that you are some sort of Cartesian agent, otherwise you would probably already be dead if you were running these kinds of computations).
So, my claim is not that modeling a system as an expected utility maximizer can’t be useful. Instead, I claim that this model is incomplete. At least with regard to the task of computing an update to the system, such that when we apply this update to the system, it would become aligned.
Of course, you can model any system as an expected utility maximizer. But the fact that I can use the “high-level” conceptual model of expected utility maximization to model the behavior of a system very well does not give me what I need, because behavior is not the only thing that we care about. We actually care about being able to understand the internal workings of the system, such that it becomes much easier to think about how to align it.
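As a toy illustration of this point (my own made-up example, nothing more): two systems can have identical observable behavior, and therefore fit the same utility function equally well, while offering completely different handles for modification.

```python
# Toy sketch: two agents with identical observable behavior but different
# internals. From behavior alone, both are equally well modeled as maximizing
# a utility function with u(defect) > u(cooperate).

def transparent_agent(state):
    # Internals we can read: an explicit utility table that gets maximized.
    utility = {"cooperate": 1.0, "defect": 2.0}
    return max(utility, key=utility.get)

# Stand-in for an opaque (e.g. homomorphically encrypted) computation:
# same behavior, but the internals give us nothing obvious to grab onto.
_OPAQUE_TABLE = {0: "defect", 1: "defect"}

def opaque_agent(state):
    return _OPAQUE_TABLE[hash(state) % 2]

# Behaviorally indistinguishable on all the observations we make:
for state in range(5):
    assert transparent_agent(state) == opaque_agent(state) == "defect"

# But only in the transparent agent is it clear which internal piece
# (the utility table) you would change to get "cooperate" instead.
```

Fitting u to the behavior tells you nothing about which of these two internal structures you are dealing with, and it is only the internal structure that tells you what to edit.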
So the following seems to be beside the point unless I am <missing/misunderstanding> something:
These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
Maybe I have missed the fact that the claim you listed says that expected utility maximization is not very useful. What I am saying is that it can be useful; it just might not be sufficient to actually align a particular AGI system, even if you can do it arbitrarily well.
Today I learned that being successful can involve feelings of hopelessness.
When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone if it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up.
This is quite strange, because most of the time when I am in such a situation and manage to make a real effort anyway, I seem to surprise myself with how much progress I manage to make. Empirically, this feeling of hopelessness does not seem to track the actual likelihood that you will completely fail.
Here is a model of mine, that seems related.
[Edit: Added epistemic status]
Epistemic status: I have used this successfully in the past and found it helpful. It is relatively easy to do. The ratio utility/time_investment is large for me.
I think it is helpful to be able to emotionally detach yourself from your ideas. There is an implicit “concept of I” in our minds. When somebody criticizes this “concept of I”, it is painful. If somebody says “You suck”, that hurts.
There is an implicit assumption in the mind that this concept of “I” is eternal. This has the effect, that when somebody says “You suck”, it is actually more like they say “You sucked in the past, you suck now, and you will suck, always and ever”.
In order to emotionally detach yourself from your ideas, you need to sever the links in your mind, between your ideas and this “concept of I”. You need to see an idea as an object that is not related to you. Don’t see it as “your idea”, but just as an idea.
It might help to imagine that there is an idea-generation machine in your brain. That machine makes ideas magically appear in your perception as thoughts. Normally when somebody says “Your idea is dumb”, you feel hurt. But now we can translate “Your idea is dumb” to “There is idea-generating machinery in my brain. This machinery has produced some output. Somebody says this output is dumb”.
Instead of feeling hurt, you can think “Hmm, the idea-generating machinery in my brain produced an idea that this person thinks is bad. Well maybe they don’t understand my idea yet, and they criticize their idea of my idea, and not actually my idea. How can I make them understand?” This thought is a lot harder to have while being busy feeling hurt.
Or “Hmm, this person that I think is very competent thinks this idea is bad, and after thinking about it I agree that this idea is bad. Now how can I change the idea-generating machinery in my brain, such that in the future I will have better ideas?” That thought is a lot harder to have when you think that you yourself are the problem. What is that even supposed to mean that you yourself are the problem? This might not be a meaningful statement, but it is the default interpretation when somebody criticizes you.
The basic idea here is to frame everything without any reference to yourself. It is not me producing a bad plan, but some mechanism whose output I just happened to observe. In my experience, this not only helps alleviate pain but also makes you think thoughts that are more useful.