I’m somewhat sympathetic to this. You probably don’t need, prior to working on AI safety, to already be familiar with the wide variety of mathematics used in ML, by MIRI, etc. To be specific, I wouldn’t be much concerned if you didn’t know category theory, more than basic linear algebra, how to solve differential equations, how to integrate probability distributions, or even multivariate calculus prior to starting on AI safety work, but I would be concerned if you didn’t have deep experience with writing mathematical proofs beyond high school geometry (although I hear these days they teach geometry differently than I learned it, by re-deriving everything in Elements), say the kind of experience you would get from studying graduate-level algebra, topology, measure theory, combinatorics, etc.
This might also be a bit of motivated reasoning on my part, to echo Dagon’s comments, since I’ve never gone back to study category theory: I didn’t learn it in school and I haven’t had a specific need for it. But my experience has been that solid foundations in mathematical reasoning and proof writing are what’s most valuable. The rest can, as you say, be learned lazily, since your needs will become apparent and you’ll have enough mathematical fluency to find and pursue whatever fields of mathematics you discover you need to know.
Holding that beliefs are for true things means that you do not believe things because they are useful, believe things because they sound nice, or believe things because you prefer them to be true. You believe things that are true (or at least that you believe to be true, which is often the best we can get!).
This is maybe a subtle objection, but I disagree with the implicit rejection of utility in favor of truth being set up here. Truth is very attractive to us, and I think this runs deep for reasons that don’t much matter here; I’ll just say I think it’s because we’re fundamentally prediction error minimizers (with some homeostatic feedback loops thrown in for survival and reproduction purposes). But if I had to justify why truth is important, I would say it’s because it’s useful. If truth were somehow not causally upstream of making accurate predictions about the world (or maybe that’s just what truth means), I don’t think I would care about it, because making accurate predictions about the world is really useful for getting all the other things I care about done.
Yes, there is a danger that befalls some people when they prize utility too far above truth: it biases them in subtle and gross ways that lead them astray and work against them, making them less able to serve their purposes when they’re not looking. But there are similar dangers when people pursue truth at the expense of usefulness, mostly in the form of opportunity costs. I think we all at some point must learn to prize truth over motivated reasoning and preferences, for example, but I also think we must learn to prize the utility of truth over truth itself, lest we be enthralled by the Beast of Scrupulosity.
I also think it’s reasonable to think that multiple things may be going on that result in a theory of mental energy. For example, hypotheses 1 and 2 could both be true and constitute different causes of similar behavior. I bring this up because I think of those as two different things in my experience: being “full up” and needing to allow time for memory consolidation, where I can still force my attention but it just doesn’t take in new information, vs. being unable to force the direction of attention generally.
Maybe it’s because we live in a world full of these “adjectives from the future”, but when I think of, for example, a “weight-loss program”, I don’t think of a program that will result in weight loss, but rather a program whose purpose is weight loss, whether or not it achieves it. Similarly with the other examples: the adjective describes not what the thing will do, but what its intended purpose is.
Right, it does seem that we have found ways, being bounded and irrational agents, to get closer to rationality by using our boundedness to protect ourselves from our irrationality (and vice versa!).
This seems to be a case of using boundedness (in the form of not being precise, maintaining uncertainty that isn’t resolved until the last moment, and probably also exhaustion: if you try to lead me through a money pump, after a few steps I’ll give up before you can take too much advantage of me) to avoid the bad results of maximizing on irrational preferences.
The opposite would be using irrationality to deal with boundedness, such as keeping things vague so we can sometimes still do the right thing even when we’ve made a mistake in our reasoning about our preferences.
As I recall, Kahneman is somewhat careful to avoid presenting S1/S2 as part of a dual process theory, and in doing so naturally cuts off some of the chance to turn around and use S2 as causally upstream of the things he describes. I think you are correctly seeing that Kahneman is very careful in how he writes, such that S1/S2 are not gears in his model so much as post hoc patterns that act as nice referents to, in his model, isolated behaviors that share certain traits, without having to propose a unifying causal mechanism.
Nonetheless, I think we can identify S2 roughly with the neocortex and S1 roughly with the rest of the brain, and understand S1/S2 behaviors as those primarily driven by activity in those parts of the brain. Kahneman is just careful, in my recollection, to avoid saying things like that because there’s no hard proof for it, just inference.
Mostly I think of it in terms of predictions and their errors. In this example I expected/predicted the world would look one way and then it looked another, and that seems to have triggered a cascade of prediction errors, which kicked off a process of constructing new predictions that also dredged up old evidence from memory to be reconsidered.
I think ML methods are insufficient for producing AGI, and that getting to AGI will require one or more changes in paradigm before we have a set of tools that look like they can produce AGI. From what I can tell, the ML community is not working on this, and instead prefers incremental enhancements to existing algorithms.
Basically, what I view as needed to make AGI work might be summarized as the need to design dynamic feedback networks with memory that support online learning. What we mostly see out of ML these days are feedforward networks with offline learning that are static in execution and often manage to work without memory, though some do have it. My impression is that existing ML algorithms are unstable under these kinds of conditions. I expect something like neural networks will be part of making it to AGI, and so some current ML research will matter, but mostly we should think of current ML research as being about near-term, narrow applications rather than as on the road to AGI.
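To make the contrast concrete, here’s a minimal numpy sketch of the distinction I’m drawing; the shapes, update rule, and everything else are made up purely for illustration, not a proposal for how such a system should actually be built:

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline, feedforward, static: weights are frozen after training and
# each input is processed with no memory of anything that came before.
W_ff = rng.normal(size=(4, 8))

def feedforward(x):
    return np.tanh(W_ff @ x)  # no state carried between calls

# Online, recurrent, stateful: a hidden state persists across inputs
# and the output weights keep adapting from a running prediction error.
W_in = 0.1 * rng.normal(size=(8, 4))
W_rec = 0.1 * rng.normal(size=(8, 8))
W_out = 0.1 * rng.normal(size=(4, 8))
h = np.zeros(8)
LR = 0.01

def online_step(x, target):
    global h, W_out
    h = np.tanh(W_in @ x + W_rec @ h)  # memory: h depends on all past inputs
    pred = W_out @ h
    err = pred - target
    W_out -= LR * np.outer(err, h)     # learning continues during execution
    return pred, err
```

The instability I mention is easy to see in toy versions of the second loop: the persistent state and the ongoing weight updates feed back into each other, and without care the whole thing diverges.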
That’s at least my opinion based on my understanding of how consciousness works, my belief that “general” requires consciousness, and my understanding of the current state of ML and what it does and does not do that could support consciousness.
Based on my reading of the post, it seemed to me that you were concerned primarily with info-hazard risks in ML research, not AI research in general; maybe it’s because of the way you framed it that I took it to be contingent on ML mattering.
So long as shortform is salient for me, I might as well do another one, on a novel idea (novel in that I’ve not heard or seen anyone express it before) I have about perceptual control theory, minimization of prediction error/confusion, free energy, and Buddhism, which I was recently reminded of.
There is a notion within Mahayana Buddhism of the three poisons: ignorance, attachment (or, I think we could better term it here, attraction, for reasons that will become clear), and aversion. This is part of one model of where suffering arises from. Others express these notions in other ways, but I want to focus on this way of talking about these root kleshas (defilements, afflictions, mind poisons) because I think it has a clear tie-in with this other thing that excites me: the idea that the primary thing neurons seek to do is minimize prediction error.
Ignorance, even among the three poisons, is generally considered more fundamental, in that ignorance appears first and gives rise to attraction and aversion (in some models there is a fundamental ignorance that gives rise to the three poisons, marking a separation between ignorance as mental activity and ignorance as a result of the physical embodiment of information transfer). This looks to me a lot like what perceptual control theory predicts if the thing being controlled for is minimization of prediction error: there is confusion about the state of the world, information comes in, and this sends a signal within the control system of neurons to either up- or down-regulate something. Essentially, the three poisons describe what you would expect the world to look like if the mind were powered by control systems trying to minimize confusion/ignorance, nudging the system toward and away from a set point where prediction error is minimized via negative feedback. (A small bonus: this might help explain why the brain doesn’t tend to get into long-lasting positive feedback loops. It’s not constructed for them, and before long you trigger something else to down-regulate because you violate its predictions.)
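For concreteness, here’s a toy sketch of the kind of negative-feedback loop I have in mind, where what’s controlled for is low prediction error; the gain and the numbers are arbitrary, and nothing here is meant as a model of actual neurons:

```python
# A toy negative-feedback controller for prediction error: the internal
# estimate is nudged up or down toward each incoming observation.
def control_loop(observations, gain=0.2):
    estimate = 0.0
    for obs in observations:
        error = obs - estimate    # "ignorance": mismatch with the world
        estimate += gain * error  # up-/down-regulate toward the set point
        yield estimate, error

# Error shrinks as the estimate settles toward the observed value.
for est, err in control_loop([1.0, 1.0, 1.0, 1.0]):
    print(round(est, 3), round(err, 3))
```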
It also makes a lot of sense that these would be the root poisons. I think we can forgive 1st-millennium Buddhists for not discovering PCT or minimization of prediction error directly, but we should not be surprised that they identified the mental actions this theory predicts should be foundational to the mind, and recognized them as foundational to all other mental actions. Elsewhere, Buddhism explicitly calls out ignorance as the fundamental force driving dukkha (suffering), though we probably shouldn’t assign too many points to (non-Madhyamaka) Buddhism for noticing this, since other Buddhist theories don’t make these same claims about attachment and aversion, and yet they are used concurrently in explication of the dharma.
The production rule model is interesting to me in that it fits well with Michael Commons’ notion of how developmental psychology works. Specifically, Commons has a formal version of his theory on which developmental psychology looks a lot like the study of how humans learn to perform ever more “complex” production rules: the same sort of rules, operating on more complex types.
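As a toy illustration of “the same sort of rules operating on more complex types” (this is my own example, not Commons’ actual formalism):

```python
from dataclasses import dataclass
from typing import Any, Callable

# A production rule is just a condition/action pair.
@dataclass
class Rule:
    condition: Callable[[Any], bool]
    action: Callable[[Any], Any]

def apply_rules(rules, state):
    for rule in rules:
        if rule.condition(state):
            return rule.action(state)
    return state

# The same rule shape at two orders of complexity: one rule acts on
# numbers, the next acts on collections of the things below it.
negate = Rule(lambda x: isinstance(x, int) and x < 0, lambda x: -x)
map_negate = Rule(lambda xs: isinstance(xs, list),
                  lambda xs: [apply_rules([negate], x) for x in xs])

print(apply_rules([negate], -3))           # 3
print(apply_rules([map_negate], [-1, 2]))  # [1, 2]
```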
One complicating factor is how much you believe ML contributes to existential threats. For example, I think the current ML community is very unlikely to ever produce AGI (<10%) and that AGI will instead result from breakthroughs by researchers in other parts of AI, so it seems not very important to me what current ML researchers think of long-term safety concerns. Other analyses of the situation would conclude differently, though, so this seems like an upstream question that must be addressed, or at least contingently decided upon, before evaluating how much it would make sense to pursue this line of inquiry.
I have plans to write this up more fully as a longer post explaining the broader ideas with visuals, but I thought I would highlight one that is pretty interesting and try out the new shortform feature at the same time! As such, this is not optimized for readability, has no links, and I don’t try to back up my claims. You’ve been warned!
Suppose you frequently found yourself identifying with and feeling like you were a homunculus controlling your body and mind: there’s a real you buried inside, and it’s in the driver’s seat. Sometimes your mind and body do what “you” want; sometimes they don’t, and this is frustrating. Plenty of folks reify this in slightly different ways: rider and elephant, monkey and machine, prisoner in a cave (or audience member in a theater), and, to a certain extent, variations on the S1/S2 model. In fact, I would propose this is a kind of dual process theory of mind that has you identifying with one of the processes.
A few claims.
First, this is a kind of constant, low-level dissociation. It’s not the kind of high-intensity dissociation we often think of when we use that term, but it’s still a separation of sense of self from the physical embodiment of self.
Second, this is projection, and thus a psychological problem in need of resolving. There’s nothing good about thinking of yourself this way; it’s a confusion that may be temporarily helpful, but it’s also something you need to learn to move beyond by first reintegrating the separated sense of self with the mind/body.
Third, people drawn to the rationalist community are unusually likely to be the sort of folks who dissociate and identify with the homunculus, S2, the rider, far mode, or whatever you want to call it. It gives them a worldview that says “ah, yes, I know what’s right, but for some reason my stupid brain doesn’t do what I want, so let’s learn how to make it do what I want”, when this is in fact a confusion, because it’s the very brain that’s “stupid” that’s producing the feeling that you think you know what you want!
To speculate a bit, this might help explain some of the rationalist/meta-rationalist divide: rationalists are still dissociating, meta-rationalists have already reintegrated, and as a result we care about very different things and look at the world differently because of it. That’s very speculative, though, and I have nothing other than weak evidence to back it up.
This is a great point that I think sometimes gets lost on folks, which is why it’s good that you bring it up. To the extent I disagree with you on your research agenda, for example, it’s disagreement over what model we use to describe reality that will be useful to our purposes, rather than disagreement over reality itself.
Good example: the US tried to go metric and then canceled its commitment.
The problem is that the preferences are conditional on internal state; they can’t be captured only by looking at the external environment.
I think I wasn’t clear enough about what I meant. I mean to question specifically why excluding such so-called “internal” state is the right choice. Yes, it’s difficult and inconvenient to work with what we cannot externally observe, but I think much of the problem is that our models leave this part of the world out because it can’t (yet) be easily observed with sufficient fidelity. The division between internal and external is somewhat arbitrary: it exists at the limit of our powers of observation, not generally as a natural limit of the system independent of our knowledge of it. So I question whether it makes sense to allow that limit to determine the model we use, rather than stepping back and finding a way to make the model larger, such that the epistemological limits that create partial preferences fall out as a consequence instead of being ontologically basic to the model.
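A minimal sketch of what I mean by preferences conditioned on internal state (the names and numbers are purely illustrative):

```python
# The same external situation yields different choices depending on
# hidden internal state, so a model fit only to externally observed
# (situation, choice) pairs will see "inconsistent" partial preferences.
def choose(hunger):
    # hunger: internal state an outside observer cannot see
    return "eat" if hunger > 0.5 else "work"

print(choose(hunger=0.9))  # eat
print(choose(hunger=0.1))  # work, in the identical external environment
```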
With regard to path dependence and partial preferences, a certain amount of this feels like the model simply failing to fully capture the preference on the first go. That is, preferences are conditional, i.e. conditioned on the environment in which they are embedded, and the partiality and path dependence issues seem to me to arise entirely from partial specification, not from the preference itself being partial. Thus I have to wonder: why pursue models that deal with partial preferences and their issues, rather than trying to build models that better capture the full complexity of preferences?
To a certain extent it feels to me like, with partial preferences, we’re trying to hang on to some things that were convenient about older models while dealing with the complexities of reality they failed to adequately capture, rather than giving up on patching the old models and looking for something better suited to what we are trying to model (yes, I’m revealing my own preference here for new models built on what we learned from old models, rather than incremental improvements to old models).
As a general point, I consider it worth writing things that tackle an object-level issue and show how mistake theory reasoning reaches different conclusions than conflict theory reasoning, and why. I say that because I think most people are at least a little bit conflict theorists. Maybe not about everything, but for many people there will at least sometimes be moments when they think in terms of conflict, of us vs. them, of in-group against out-group. And having someone provide a well-reasoned, thoughtful, and generous-to-opponents essay nudging folks toward mistake theory, by showing how it really works on the margin, makes folks more strongly mistake theorists or gets them using mistake theory more often.
My strong claim would be that humans start out as conflict theorists (it’s our “natural” state), and it’s only through people showing us another way that we can come to another position. Yes, any writing like this piece by Scott can be used as fuel for reinforcing a conflict theory perspective in some people, but those are the people who are likely so strongly conflict theorists that all evidence reinforces their position, so there’s no marginal difference between producing something like Scott’s piece and something less charitable. Meanwhile, it does a lot to move people toward a mistake theory perspective, even if just on the object-level issue addressed, and repeated exposure to such writing can turn them into net mistake theorists.
Could some ideal person have done more to convert more conflict theorists to mistake theory on at least this issue in an essay than Scott did in his? Maybe. But I’m sure Scott did the best he could, and I think it’s on net better that he wrote this than not.
This has similarly been my approach. As best I can tell, writing papers for academic publication is nice but, especially in the AI safety space, not really the best way to convey and discuss ideas. Much more important seems to be being part of the conversation about technical ideas: learning from it and adding to it so others can do the same. I put some small amount of effort into things outside FP, mostly because I believe it’s a good idea for reputation effects and for spreading ideas outside the forum bubble, not because I think it’s the best way to make intellectual progress.
It’s also nice because the feedback loops are shorter. I can comment on a post or write my own, have a discussion, and then within weeks see the ripples of that discussion influencing other discussions. It helps me feel the impact I’m having, and motivates me to keep going.
Probably the only thing superior in my mind is doing practical work, e.g. building systems that test out ideas. Unfortunately, many of the ideas we talk about in safety are currently ahead of the tech, so we don’t know how to build things yet (and for safety’s sake I think it’s fine not to push on that too hard, since I expect it will come on its own anyway). So until we are closer to AGI, forum participation is likely one of the highest-impact activities one can engage in (I’m similarly positive about the face-to-face equivalents: talking at conferences and having conversations with interested folks).