Johannes C. Mayer
Infinite Willpower
“Infinite willpower” reduces to “removing the need for willpower by collapsing internal conflict and automating control.” Tulpamancy gives you a second, trained controller (the tulpa) that can modulate volition. That controller can endorse and enact a policy.
However, because the controller runs on a different part of the brain, some modulation circuits, e.g. the ones that make you feel tired or demotivated, are bypassed. You don’t need willpower because you are “not doing anything” (not sending intentions); the tulpa is. And the neuronal circuits the tulpa runs on, which generate the steering intentions that ultimately turn into mental and/or muscle movements, are not modulated by the willpower circuits at all.
Gears-level model
First note that willpower is totally different from fatigue.
What “willpower” actually is
“Willpower” is what it feels like when you select a policy that loses the default competition and force it through anyway. That subjective burn comes from policy conflict plus low confidence in the chosen policy. If the task policy is predicted, with low confidence, to produce only a modest reward, while competitors (scrolling, snacks, daydreams) reliably produce a high reward, you pay a tax to hold the line.
Principle: Reduce conflict and increase precision/reward for the target policy, and “willpower” isn’t consumed; it’s unnecessary. (This is the non-tulpa way.)
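As a toy numerical sketch of this competition (my own illustration with made-up numbers, not a claim about how the brain literally computes anything), the “willpower tax” can be pictured as the gap by which the endorsed policy loses the expected-value comparison:

```c
/* Toy sketch of the policy-competition picture above (illustrative only).
 * Each candidate policy has a confidence of paying off and a reward value;
 * the "willpower tax" is modeled as the gap by which the endorsed policy
 * loses the default expected-value competition. */
#include <stdio.h>

typedef struct {
    const char *name;
    double p_success;   /* confidence that the policy pays off */
    double reward;      /* reward value if it pays off */
} Policy;

static double expected_value(Policy p) { return p.p_success * p.reward; }

int main(void) {
    Policy task       = { "write the report",  0.40, 5.0 };  /* made-up numbers */
    Policy distractor = { "scroll the feed",   0.95, 3.0 };

    double gap = expected_value(distractor) - expected_value(task);
    printf("willpower tax to force the task policy: %.2f\n",
           gap > 0 ? gap : 0.0);

    /* The principle above corresponds to driving this gap to zero or below:
     * raise the task policy's confidence/reward, or lower the competitor's. */
    Policy task_refined = { "write one paragraph", 0.90, 4.0 };
    gap = expected_value(distractor) - expected_value(task_refined);
    printf("after reducing conflict: %.2f\n", gap > 0 ? gap : 0.0);
    return 0;
}
```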
What a tulpa gives you that ordinary techniques don’t, in addition to infinite willpower:
Social presence reliably modulates effort, arousal, and accountability. A tulpa isn’t just “thoughts”; it is multi-modal: voice, visuals, touch, felt presence. That gives it many attachment points into your control stack:
Valuation channel: A tulpa can inject positive interpretations in the form of micro-rewards (“good job”, “you can do it, I believe in you”); i.e., it generates positive reinforcement.
Interoceptive channel: A tulpa can invoke states associated with alertness or calm. It can change your mental state from “I want to lie on the floor because I am so exhausted” to “I don’t feel tired at all” in two seconds.
Motor scaffolding: A tulpa can execute “starter” actions (get out of bed, open the editor, type the first sentence), reducing the switch/initialization cost where most akrasia lives (because it has infinite willpower).
The central guiding principle is to engineer the control stack so that endorsed action is the default, richly rewarded, and continuously stabilized. Tulpamancy gives you a second controller with social authority and multi-modal access to your levers. This controller can simply overwrite your mental state and has no willpower constraints.
The optimal policy probably includes both using the sledgehammer of overwriting your mental state and, at the same time, optimizing to adopt a target policy that you actually endorse wholeheartedly.
Graphics APIs Are Hardware Programming Languages
The Core Misconception
It’s tempting to think of modern graphics APIs as requiring a bunch of tedious setup followed by “real computation” in shaders. But pipeline configuration is programming the hardware!
Why “Fixed Function” Is Misleading
GPU hardware contains parameterizable functions implemented in silicon. When you specify a depth format or blend mode, you’re telling the GPU how to compute.
Creating an image view with `D24_UNORM_S8_UINT` configures the depth-comparison circuits. Choosing a different depth format activates different hardware circuits, which results in a different computation. So there isn’t really a fixed “depth computation” stage in the pipeline; there is no single “I compute depth” circuit.
Another example: choosing an `SRGB` format activates in-silicon gamma-conversion hardware, whereas `UNORM` bypasses that circuit.
The Architectural View
Why declare all this upfront? Because thousands of shader cores write simultaneously. The hardware must pre-configure memory controllers, depth testing units, and blending circuits before launching parallel execution. Runtime dispatch would destroy performance.
GPUs deliberately require upfront declaration. Because programmers pre-declare their computation patterns, the hardware can be configured once before the computation runs.
The API verbosity maps to silicon complexity. You’re not “just setting up context”. You’re programming dozens of specialized hardware units through their configuration parameters.
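To make this concrete, here is a minimal Vulkan-style sketch in C (my illustration, not code from the post): the depth format on an image view and the depth-test and blend state that get baked into a pipeline. The structs are only filled in, never handed to a device, so it stays short enough to read as “configuration is computation”.

```c
/* Sketch: pipeline "setup" as hardware configuration (illustrative only).
 * Assumes the Vulkan headers are available; since no device functions are
 * called, it compiles and runs without a GPU context. */
#include <vulkan/vulkan.h>
#include <stdio.h>

int main(void) {
    /* Choosing D24_UNORM_S8_UINT selects which depth-comparison datapath
     * the hardware will use for this view. */
    VkImageViewCreateInfo depth_view = {
        .sType    = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
        .viewType = VK_IMAGE_VIEW_TYPE_2D,
        .format   = VK_FORMAT_D24_UNORM_S8_UINT,
    };

    /* The depth-stencil state programs the depth-test units: enable the
     * test, enable writes, and pick the comparison the silicon performs. */
    VkPipelineDepthStencilStateCreateInfo depth_state = {
        .sType            = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
        .depthTestEnable  = VK_TRUE,
        .depthWriteEnable = VK_TRUE,
        .depthCompareOp   = VK_COMPARE_OP_LESS,
    };

    /* The blend attachment state programs the blending circuits that merge
     * shader output with what is already in the render target. */
    VkPipelineColorBlendAttachmentState blend = {
        .blendEnable         = VK_TRUE,
        .srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA,
        .dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA,
        .colorBlendOp        = VK_BLEND_OP_ADD,
        .colorWriteMask      = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
                               VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT,
    };

    printf("depth format: %d, compare op: %d, blend op: %d\n",
           depth_view.format, depth_state.depthCompareOp, blend.colorBlendOp);
    return 0;
}
```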
If you haven’t seen this video already I highly recommend it. It’s about representing the transition structure of a world in a way that allows you to visually reason about it. The video is timestamped to the most interesting section. https://www.youtube.com/watch?v=YGLNyHd2w10&t=320s
Disclaimer: my analysis is based on reading only a few of Said’s comments (<15).
To me the “sneering model” doesn’t seem quite right. Often what Said seems to be doing is:
1. Analyze a text for flaws.
2. Point out the flaws.
3. Derive from the demonstrated flaws some claim that shows Said’s superiority.
One of the main problems seems to be that in step 1, any flaw is a valid target. It does not need to be important or load-bearing for the points made in the text.
It’s like somebody building a rocket and shooting it to the moon, and Said complaining that the rocket looks pathetic. It should have been painted red! And he is right: it does look terrible and would look much better painted red. But that’s sort of… not that important.
Said correctly finds flaws and nags about them. And these flaws actually exist. But talking about these flaws is often not that useful.
I expect that what Said is doing is just nagging about all the flaws he finds immediately. These will often be the unimportant flaws. But if there are important flaws that are easy to find, and are therefore the first thing he finds, then he will point those out. That can be very useful! How useful Said’s comments are depends on how easy it is to find flaws that are useful to discuss vs. flaws that are not useful to discuss.
Also: derivations of new claims from the flaws (step 3) might be much shakier and often not correct. Though I have literally only one example of this, so it might not be a general pattern.
Said seems to be a destroyer of the falsehoods that are easiest to identify as such.
This is a useful video to me. I am somehow surprised that physics crackpots exist to the extent that this is a known concept. I actually knew this before, but failed to relate it to this article and my previous comment.
I once thought I had solved P=NP. And that seemed very exciting. There was some desire to just tell some other people I trust. I had some clever way to transform SAT problems into a form that is tractable. Of course, later I realized that transforming solutions of the tractable problem form back into SAT was NP-hard. I had figured out how to take a SAT problem and turn it into an easy problem that was totally not equivalent to the SAT problem. And then I marveled at how easy it was to solve the easy problem.
My guess at what is going on in a crackpot’s head is probably exactly this. They come up with a clever idea and can’t tell how it fails. So it seems amazing. Now they want to tell everybody, and will do so. That seems to be what makes a crackpot a crackpot: being overwhelmed by excitement and sharing their thing without trying to figure out how it fails. And intuitively it really, really feels like it should work. You can’t see any flaw.
So it feels like one of the best ways to avoid being a crackpot is to try to solve a bunch of hard problems, and fail in a clear way. Then when solving a hard problem your prior is “this is probably not gonna work at all” even when intuitively it feels like it totally should work.
It would be interesting to know how many crackpots are repeat offenders.
I am somewhat confused about how somebody could think they have made a major breakthrough in computer science without being able to run some algorithm that does something impressive.
Imagine being confused about whether your algorithm solves some pathfinding problem. You run it on pathfinding problems, and either it doesn’t work, or it is too slow, or it actually works.
Or imagine you think you found a sorting algorithm that is somehow much faster than quick sort. You just run it and see if that is actually the case.
It seems like “talking to reality” is really the most important step.
Somehow it’s missing from this article. Edit: actually it is in step 2; I am just bad at skim reading. Granted, the above does not work as well for theoretical computer science. It seems easier to be confused about whether your math is right than about whether your algorithm efficiently solves a task. But still, math is pretty good at showing you when something doesn’t make sense, if you look carefully enough. It lets you look at “logical reality”.
The way to avoid being led to believe false things really doesn’t seem different whether you use an LLM or not. Probably an LLM triggers some social circuits in your brain that make false confidence more likely. But this seems more like a quantitative than a qualitative difference.
Why can’t the daemon just continuously look at a tiny area around the gate and decide based on that alone? A tiny area seems intuitively sufficient both for recognizing that a molecule would go from left to right when the gate is opened, and for recognizing that no molecule would go from right to left. This would mean that it doesn’t need to know a distribution over molecules at all.
Basically: why can’t the daemon just solve a localised control task?
I wrote Using Negative Hallucinations to Manage Sexual Desire because I thought it might be useful to others: I had discovered an effective technique for controlling sexual desire.
I failed to predict the negative response at all, and in part this probably made me not optimize for how well the post would be received.
Ultimately, what I got out of this is that people said my models were broken and I agreed. So I got a lot of value out of people disagreeing with me (though somehow it still took two years after that to figure out an actually good policy).
The idea is that you write the textbook yourself until you have acquired the skills of original thinking. It’s not about never looking things up. Still, acquiring the skill of thinking by reinventing things seems better than learning it at the research frontier, because the frontier has much harder problems: so hard that they are not the right difficulty for efficiently learning the skill of “original problem solving”.
Systematic Bias Towards Perceiving Relationships as Beneficial
The human brain is heavily biased. Ask a parent how good it was to have a child and they often say “Having a child was the best thing ever”. There is a circuit in their brain that rewards them in the moment when they reflect.
However, if you have people rate every hour how engaging it is to handle their child, you get a score comparable to household chores.
Probably the brain is also biased to mainly retrieve positive memories when reflecting, and to make them seem more positive than they actually were.
Nice trick evolution! Somebody who thinks about whether to have another child is much more likely to want another if their perception is skewed in this way.
All the neural machinery involved in relationships, especially romantic ones, is all about reproduction. At least from the perspective of evolution.
People track their romantic relationship as a property of reality. And usually they perceive this property as something positive to be preserved. Optimizing reality to preserve the relationship is probably evolutionarily advantageous. Staying close together and talking a lot (which can provide optimization-relevant information) seems useful for the task of successfully raising offspring.
Conclusion: It is likely that there is a systematic bias that makes people perceive relationships as more positive upon reflection than they actually are.
That is not to say that relationships are bad, just that we shouldn’t be surprised if somebody says they think their relationship is high value when it’s not.
I think you are missing an important point. Hot take: the “I need you to just listen to me” response might be a mechanism that is often useful. Very often people are overeager to tell you how to solve your problems without first building a good model of the world. They try to solve the problem before you can even give them all the information necessary to generate a good suggestion.
Of course, this mechanism is very dumb. It’s implemented at the level of emotions. People don’t realize that this is its evolved purpose. You can do a lot better by taking manual control.
That is not to say your conclusion is wrong. But I think it is important to understand what is going on. I expect that if you can convey to somebody a mechanistic model of why they want you to “just listen”, they will have a better model that allows them to choose better.
I think this post is great and points at a central bottleneck in AI alignment.
Previously John stated that most people can’t do good alignment research because they simply bounce off the hard problems. His proposed fix is to become sufficiently technically proficient that they can start to see the footholds.
While not necessarily wrong, I think this is a downstream effect of having the right “I am gonna do whatever it takes and not give up easily” attitude.
I think this might be why John’s SERI MATS 2 project failed (in his own judgement). He did a good job at communicating a bunch of useful technical methodologies. But knowing these methodologies isn’t the primary thing that makes John competent. I think his competence comes more from exactly the “There is a problem? Let’s seriously try to fix it!” attitude outlined in this post.
But this he didn’t manage to convey. I expect he doesn’t even realize that this is an important piece that you need to “teach” people.
I am not quite sure how to teach this. I tried to do it in two iterations of AI Safety Camp. Instead of teaching technical skills, I tried to work with people one-on-one through problems and give them open-ended tasks (e.g. “solve alignment from scratch”). Basically this completely failed to make people significantly better independent AI alignment thinkers.
I think most humans’ “analytical reasoning module” fights a war with their “emotion module”. Most humans are at a level where they can’t even realize that they suck, because that would be too painful. Especially if another person points out their flaws.
So perhaps that is where one needs to start: how can you begin to model yourself accurately without your emotional circuitry constantly punching you in the face?
This is one of the best talks I know. I do not want to spoil it. It’s about how one can speak well, both to humans and to machines, and what makes good speaking easy.
I use language models to help me design systems, not by asking them to solve problems, but by discussing my ideas with them. I have an idea of how to do something, usually vague, half-formed. I use automatic speech recognition to just ramble about it, describing the idea in messy, imprecise language. The language model listens and replies with a clearer, more structured version. I read or listen to that and immediately see what’s missing, or what’s wrong, or what’s useful. Then I refine the idea further. This loop continues until the design feels solid.
The model doesn’t invent the solution. It refines and reflects what I’m already trying to express. That’s the key. It doesn’t act as an agent; it’s not writing the code or proposing speculative alternatives. It helps me pin down what I’m already trying to do, but better, faster, and with much less friction than if I were doing it alone.
I mostly don’t use autocomplete. I don’t ask for “write this function.” (Though I think there is a correct way to use these.) Instead, I might say something like: “Right now I have this global state that stores which frame to draw for an animation. But that feels hacky. What if I want to run multiple of these at the same time? Maybe I can just make it a function of time. Like, if I have a function that, given time, tells me what to draw, then I don’t need to store any state. That would probably work. Is there any reason this wouldn’t work?” And the LM will restate the idea precisely: “You’re proposing to push side effects to the boundary and define animation as a pure function of time, like in a React-style architecture.” That clarity helps me immediately refine or correct the idea.
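As a concrete illustration of the “pure function of time” idea from that example (a rough sketch; the names frame_at, FRAME_COUNT, and FRAME_DURATION are made up for illustration):

```c
/* Sketch of "animation as a pure function of time" (illustrative names).
 * There is no global frame counter: the frame to draw is computed from
 * the elapsed time, so several animations can run at once. */
#include <stdio.h>

#define FRAME_COUNT    8      /* frames in the looping animation */
#define FRAME_DURATION 0.1    /* seconds each frame is shown     */

/* Pure function: given the current time, return which frame to draw. */
static int frame_at(double time_seconds) {
    int frame = (int)(time_seconds / FRAME_DURATION);
    return frame % FRAME_COUNT;   /* loop the animation */
}

int main(void) {
    /* Two independent animations share the same function by using
     * different time offsets; no per-animation mutable state exists. */
    for (double t = 0.0; t < 1.0; t += 0.25) {
        printf("t=%.2f -> animation A: frame %d, animation B: frame %d\n",
               t, frame_at(t), frame_at(t + 0.35));
    }
    return 0;
}
```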
This changes the kind of work I can do. Without the model, I default to braindead hacking: solve local problems quickly, but end up with brittle, tangled code. Thinking structurally takes effort, and I often don’t do it. But in a conversational loop with the model, it’s fun. And because the feedback is immediate, it keeps momentum going.
This does offload cognition, but not by replacing my thinking. It’s integrated into it. The model isn’t doing the task. It’s helping me think more effectively about how to do the task. It names patterns I gestured at. It rephrases vague concepts sharply enough that I can critique them. It lets me externalize a confused internal state and get back something slightly clearer that I can then respond to. This creates an iterative improvement loop.
Maybe this works very well for me because I have ADHD. Maybe most people can just sit down and reflect in silence. For me, talking to the model lowers the activation energy and turns reflection into dialogue, which makes it very easy to do.
People say LMs slow you down. That’s true if you’re using them to write broken code from vague prompts and then patch the errors. But that’s not what I’m doing. I use them to think better, not to think less.
Depression as a Learned Suppression Loop
Overview
This post proposes a mechanistic model of a common kind of depression, framing it not as a transient emotional state or a chemical imbalance, but as a persistent, self-reinforcing control loop. The model assumes a brain composed of interacting subsystems, some of which issue heuristic error signals (e.g., bad feelings), and others which execute learned policies in response. The claim is that a large part of what is commonly called “depression” can be understood as a long-term learned pattern of suppressing internal error signals using high-intensity external stimuli.
Key components
- The brain includes systems that detect mismatches between actual behavior and higher-level goals or models. When these detect an issue (e.g., agency violation, unresolved conflict, lack of progress), they emit negative affect.
- Negative affect is not noise. It is a signal. In some cases, it simply arises from overstimulation and regulatory lag. In other cases, it points to an actual error: something is wrong with the current behavior, motivational alignment, or action trajectory.
- In a healthy system, negative affect would prompt reflection: “Why do I feel bad? What’s the mismatch? Should I change direction?”
- In practice, this reflection step is often bypassed. Instead, the brain learns that high-intensity stimulus (YouTube, Skinner-box games, food, porn) suppresses the signal. This works in the short term, so the suppression policy gets reinforced.
- Over time, the suppression policy becomes automatic. Every time a conflict signal arises, it gets overwritten by external input. The system learns: “feeling bad” → “inject stimulus” → “feel better.” No reflection or course correction happens. The source of the bad feeling persists, so the loop repeats.
- This creates a self-reinforcing attractor: more suppression leads to more misalignment, which leads to more negative signals, which leads to more suppression. The behavior becomes proceduralized and embedded into long-term structure.
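As a deliberately crude toy of this dynamic (my illustration of the verbal model, not something from the original post), here is a simulation in which the “suppress with stimulus” policy is reinforced every time it relieves the bad feeling, while the underlying cause never gets resolved:

```c
/* Toy model of the learned suppression loop (illustrative sketch only).
 * One scalar weight per policy; whichever weight is larger gets executed.
 * Suppression relieves the bad feeling (and is therefore reinforced) but
 * never removes its cause; reflection sometimes resolves the cause. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double w_suppress = 0.55;   /* learned value of "inject stimulus"    */
    double w_reflect  = 0.50;   /* learned value of "ask why I feel bad" */
    int unresolved_causes = 1;  /* the underlying mismatch               */

    for (int day = 1; day <= 10; day++) {
        if (!unresolved_causes) break;          /* no error signal, no loop */

        if (w_suppress >= w_reflect) {
            /* Suppression: immediate relief -> policy reinforced,
             * but the cause of the negative affect persists. */
            w_suppress += 0.05;
        } else {
            /* Reflection: slower, but may actually resolve the cause. */
            if (rand() % 2) unresolved_causes--;
            w_reflect += 0.02;
        }
        printf("day %2d  w_suppress=%.2f  w_reflect=%.2f  causes=%d\n",
               day, w_suppress, w_reflect, unresolved_causes);
    }
    return 0;
}
```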
Agency Violation as Error Signal
One key class of negative affect is what we call an “agency violation” signal. This occurs when behavior is driven by urges or automated triggers rather than deliberate, reflective choice.
Example: You feel a slight urge to open YouTube instead of working. You give in. After watching for 20 minutes, you feel bad. The bad feeling is not about the content. It’s a second-order signal: you took an action that bypassed your reflective control system. You were coerced by an urge. The system flags that as a problem.
If, instead of reflecting on that signal, you suppress it by continuing to watch YouTube, the suppression gets reinforced. Eventually, the sequence becomes automatic.
Suppression vs. Resolution
Suppression works by injecting a high-reward stimulus. This reduces the negative signal. But it does not correct the cause. The policy becomes: bad feeling → avoid it.
Resolution would involve explicitly checking: “Why do I feel bad?” and running a diagnostic process. Example: “I feel bad. Did I get coerced by an urge? Was I avoiding something? Am I stagnating?” If a concrete cause is found and addressed, the signal terminates because the underlying mismatch is resolved.
The Role of Superstimuli
Modern environments are full of fast-acting, high-reward, low-effort stimuli. These serve as perfect suppression mechanisms. They hijack the system’s reward learning by offering reliable affective relief without behavioral correction.
Examples:
- Watching videos when you feel restless
- Playing reward-loop-heavy games when you feel failure
- Eating sugar when you feel discouraged
These actions reinforce suppression patterns.
Depression as Policy Entrenchment
Over time, if this suppression loop becomes the default policy, the system reorganizes around it. The underlying problems do not go away. The system becomes more fragile. The ability to reflect degrades. The habit becomes chronic. This is what we call depression.
Note: this model does not deny neurochemical involvement. Rather, it treats neurochemistry as part of the loop dynamics. If the procedural pattern of suppression dominates for long enough, neurochemical state will reflect that pattern. But the origin was structural, not random imbalance.
Role of Interventions Like Bupropion
Certain medications (e.g., bupropion) can disrupt the suppression loop temporarily. They increase energy or decrease suppression effectiveness, allowing for enough reflective bandwidth to notice and correct maladaptive patterns.
The drug doesn’t fix depression. It interrupts the loop long enough for you to fix it yourself.
Practical Implication
If this model is correct, then the right policy is:
- When you feel bad, pause. Do not immediately act to suppress it.
- Ask: is this a genuine signal? Did something go wrong? Did I get coerced by an urge?
- Try to trace the source. If you find it, resolve it directly.
- If you can’t trace it, just wait. Sometimes the bad feeling is transient and resolves on its own.
In many cases, talking to someone (or to a reflective system, like a chatbot) can help reveal the structure behind the feeling. The key is engaging with the feeling as a signal, not a nuisance.
Conclusion
Depression is not always a surface-level mood disorder. In many cases, it is the long-term consequence of learned suppression policies that override internal signals rather than resolving them. These policies become structural, self-reinforcing, and difficult to dislodge without deliberate intervention. The first step is recognizing bad feelings as information, not errors.
-
This is a good informal introduction to Control Theory / Cybernetics.
In both cases, the conversation drains more energy than the equal-fun alternative. I have probably had at most a single-digit number of conversations in my entire life which were as fun-in-their-own-right as e.g. a median night out dancing, or a median escape room, or median sex, or a median cabaret show. Maybe zero, unsure.
I wanted to say that for me it is the opposite, but reading the second half I have to say it’s the same.
I have definitely had the problem of sometimes talking too long with somebody. E.g. multiple times I talked to a person for 8-14 hours without a break about various technical things: compiler optimizations, CPU architectures, and this kind of stuff, and it was really hard to stop.
Also, just solving problems in a conversation is very fun. The main reason I didn’t do this a lot is that there are not that many people I know, actually basically zero right now (if you exclude LLMs), with whom I can have the kinds of conversations I like to have.
It seems to be very dependent on the person.
So I am quite confused why you say “but conversation just isn’t a particularly fun medium”. If it’s anything like it is for me, then engaging with the right kind of people on the right kind of content is extremely fun. It seems like your model is confused: you say “conversations are not fun” when in fact, in the space of possible conversations, I expect there are many types that can be very fun, but you haven’t mapped this space, while implicitly assuming that your map is complete.
Probably there are also things besides technical conversations that you would find fun but simply don’t know about, such as hardcore flirting in a very particular way. E.g. I like to talk to Grok in voice mode, in romantic mode, and then do some analysis of some topic (or rather, that is what I just naturally do), and then Grok compliments my mind in ways that my mind likes, e.g. pointing out that I used a particular thinking pattern that is good, or that I thought about this difficult thing at all, and then I am like “Ah yes, that was actually good, and yes, it seems like this is a difficult topic most people would not think about.”
S-Expressions as a Design Language: A Tool for Deconfusion in Alignment
Mathematical Notation as Learnable Language
To utilize mathematical notation fully you need to interpret it. To read it fluently, you must map symbols to concrete lenses, e.g. computational, visual, algebraic, or descriptive.
Example: Bilinear Map
Let
be defined by
Interpretations:
- Computational: Substitute specific vectors and check results. If , then
Through this symbolic computation we can see how the expression depends on . Perform such computations until you get a feel for the “shape” of the function’s behavior.
- Visual: For each fixed , the function is represented by a hyperplane in . We can imagine walking on the hyperplane; such a walk always moves along a straight line, and that is the linearity.
- Symbolic manipulation: Verify algebraically:
This establishes linearity by direct algebraic manipulation. You understand what properties exist by showing them algebraically.
- Descriptive: What it means to be a bilinear map is that if you hold the second argument fixed and vary the first, you have a linear function; likewise if you hold the first fixed and vary the second. You want to capture the intuition in natural language.
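For concreteness, here is a hypothetical bilinear map of the kind described above, together with the algebraic linearity check; the specific formula is my own illustrative assumption, not the example’s original expression:

```latex
% Hypothetical bilinear map, used purely as an illustration.
\[
  f : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}, \qquad
  f(x, y) = x_1 y_1 + 2\, x_1 y_2 - x_2 y_2 .
\]
% Linearity in the first argument (second argument held fixed):
\[
  f(a x + b x',\, y)
  = (a x_1 + b x_1')\, y_1 + 2\,(a x_1 + b x_1')\, y_2 - (a x_2 + b x_2')\, y_2
  = a\, f(x, y) + b\, f(x', y).
\]
% The check with the first argument held fixed is analogous.
```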
Mathematical language is a language that you need to learn like any other. Often people get stuck by trying to use symbolic manipulation too much. Because mathematical language is so precise, it is easy to interpret it in many different ways while still being able to check whether your interpretation captures the core.
-
Large Stacks: Increasing Algorithmic Clarity
Insight: Increasing stack size enables writing algorithms in their natural recursive form without artificial limits. Many algorithms are most clearly expressed as non-tail-recursive functions; large stacks (e.g., 32GB) make this practical for experimental and prototype code where algorithmic clarity matters more than micro-optimization.
Virtual memory reservation is free. Setting a 32GB stack costs nothing until pages are actually touched.
Stack size limits are OS policy, not hardware. The CPU has no concept of stack bounds—just a pointer register and convenience instructions.
Large stacks have zero performance overhead from the reservation. Real recursion costs: function call overhead, cache misses, TLB pressure.
Conventional wisdom (“don’t increase stack size”) protects against: infinite recursion bugs, wrong tool choice (recursion where iteration is better), thread overhead at scale (thousands of threads).
Ignore the wisdom when: single-threaded, interactive debugging available, experimental code where clarity > optimization, you understand the actual tradeoffs.
Note: Stack memory commits permanently. When deep recursion touches pages, the OS commits physical memory. Most runtimes never release it (though it seems it wouldn’t be hard to do with `madvise(MADV_DONTNEED)`). One deep call likely commits that memory permanently, until process death. Large stacks are practical only when you restart regularly, or when you accept permanent memory commitment up to the maximum recursion depth ever reached.
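A minimal sketch of the above in C, assuming Linux/glibc (the 32GB figure and the recursion depth are arbitrary illustration values): spawn a worker thread with a huge reserved stack and run a naturally recursive, non-tail-recursive function on it.

```c
/* Sketch: a naturally recursive function on a thread with a huge stack.
 * Assumes Linux/glibc; the 32 GB is reserved address space, and physical
 * pages are committed only as frames are actually touched.
 * Build without optimization so the recursion is not turned into a loop:
 *   cc -O0 big_stack.c -o big_stack -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Non-tail-recursive: each frame stays live until the call below returns. */
static unsigned long long sum_to(unsigned long long n) {
    if (n == 0) return 0;
    return n + sum_to(n - 1);
}

static void *worker(void *arg) {
    (void)arg;
    /* ~10 million frames would overflow a default 8 MB stack immediately. */
    printf("sum = %llu\n", sum_to(10ULL * 1000 * 1000));
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    /* Reserve a 32 GB stack for the worker thread. */
    int err = pthread_attr_setstacksize(&attr, 32ULL * 1024 * 1024 * 1024);
    if (err != 0) {
        fprintf(stderr, "pthread_attr_setstacksize: error %d\n", err);
        return EXIT_FAILURE;
    }
    err = pthread_create(&tid, &attr, worker, NULL);
    if (err != 0) {
        fprintf(stderr, "pthread_create: error %d\n", err);
        return EXIT_FAILURE;
    }
    pthread_join(tid, NULL);
    return EXIT_SUCCESS;
}
```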