This is part of the problem I was trying to describe in multi-agent minds, part “what are we aligning the AI with”.
I agree the goal is under-specified. With regard to meta-preferences, with some simplification it seems we have several basic possibilities
1. Align with the result of the internal aggregation (e.g. observe what does the corporation do)
2. Align with the result of the internal aggregation, by asking (e.g. ask the corporation via some official channel, let the sub-agents sort it out inside)
3. Learn about the sub-agents and try to incorporate their values (e.g. learn about the humans in the corporation)
4. Add layers of indirection, e.g. asking about meta-preferences
Unfortunately, I can imagine that, in the case of humans, 4. can lead to various stable reflective equilibria of preferences and meta-preferences—for example, I can imagine that, by suitable queries, you can get a human to want
to be aligned with explicit reasoning, putting most value on some conscious, model-based part of the mind; with meta-reasoning about VNM axioms, etc.
to be aligned with some heart&soul, putting value on universal love, transcendent joy, and the many parts of the human mind which are not explicit, etc.
where both of these options would be self-consistently aligned with the meta-preferences the human will be expressing about how the sub-agent alignment should be done.
So even with meta-preferences, there are likely multiple self-consistent ways the alignment could go.
There is a fascinating, not yet really explored territory between GWT (global workspace theory) and predictive processing.
For example, how it may look: there is a 2018 paper, Dynamic interactions between top-down expectations and conscious awareness, where they combine an attentional-blink-style paradigm with prediction, and discover, for example:
The first question that we addressed was how prior information about the identity of an upcoming stimulus influences the likelihood of that stimulus entering conscious awareness. Using a novel attentional blink paradigm in which the identity of T1 cued the likelihood of the identity of T2, we showed that stimuli that confirm our expectation have a higher likelihood of gaining access to conscious awareness
Second, nonconscious violations of conscious expectations are registered in the human brain. Third, however, expectations need to be implemented consciously to subsequently modulate conscious access. These results suggest a differential role of conscious awareness in the hierarchy of predictive processing, in which the active implementation of top-down expectations requires conscious awareness, whereas a conscious expectation and a nonconscious stimulus can interact to generate prediction errors. How these nonconscious prediction errors are used for updating future behavior and shaping trial-by-trial learning is a matter for future experimentation.
My rough takeaway is this: while on the surface it may seem that the effect of unconscious processing and decision-making is relatively weak, unconscious processing is responsible for what even gets into conscious awareness. In the FBI metaphor, there is a lot of power in the FBI's ability to shape what even gets on the agenda.
The second thing first: "...but before they were physics terms they were concepts for intuitive things" is actually not true in this case: momentum did not mean anything before being coined in physics. Then, it became used in a metaphorical way, but mostly congruently with the original physics concept, as something like "mass" x "velocity". It seems to me easy to imagine vivid pictures based on this metaphor, like an advancing army conquering mile after mile of enemy territory having momentum, or a scholar going through page after page of a difficult text. However, this concept is not tied to the b∗x term (which is one of my cruxes).
To me, the original metaphorical meaning of momentum makes a lot of sense: you have a lot of systems where you have something like mass (closely connected to inertia: you need great force to get something massive to move) and something like velocity—direction and speed where the system is heading. I would expect most people have this on some level.
Now, to the first thing second: I agree that it may be useful to notice all the systems in which the Taylor series for f has b>0, ESPECIALLY when it's comparably easy to control f via the b∗x term rather than just a. However, some of the examples in the original post do not match this pattern: some could be just systems where, for example, you insert a heavy-tailed distribution on the input and you get a heavy-tailed distribution on the output, or systems where the a term is what you should control, or systems where you should actually understand more about f(x) than the fact that it has a positive first derivative at some point.
What a good name for b∗x>0 should be, I don't know; some random prosaic ideas are snowballing, compounding, faenus (from the Latin for interest on money, gains, profit, advantage), compound interest. But likely there is some more poetic name, similar to Moloch.
1. Going through two of the adjacent links in the same paragraph:
With the trees, I only skimmed it, but if I get it correctly, the linked article proposes this new hypothesis: "Together these pieces of evidence point to a new hypothesis: Small-scale, gap-generating disturbances maintain power-function size structure whereas later-successional forest patches are responsible for deviations in the high tail."
and, also from the paper
Current theories explaining the consistency of tropical forest size structure are controversial. Explanations based on scaling up individual metabolic rates are criticized for ignoring the importance of asymmetric competition for light in causing variation in dynamic rates. Other theories, which embrace competition and scale individual tree vital rates through an assumption of demographic equilibrium, are criticized for lacking parsimony, because predictions rely on site-level, size-specific parameterization
(I also recommend looking at the plots with the "power law", which are of the usual type of approximating something more complex with a straight line in some interval.)
So, what we actually have here: apparently different researchers proposing different hypotheses to explain the observed power-law-like data. It is far from conclusive what the actual reason is. As something like positive feedback loops is quite an obvious part of the hypothesis space if you see power-law-like data, you are almost guaranteed to find a paper which proposes something in that direction. However, note that the article actually criticizes previous explanations based more on the "Matthew effect", and proposes disturbances as a critical part of the explanation.
(Btw I do not claim any dishonesty from the author or anything like that.)
Something similar can be said about the Cambrian explosion which is the next link.
Halo and horn effects are likely evolutionarily adaptive effects, tracking something real (traits like "having an ugly face" and "having a higher probability of ending up in trouble" are likely correlated—the common cause can be mutation load / parasite load; you have things like the positive manifold).
And so on.
Sorry, but I will not dissect every paragraph of the article in this way. (Also, it seems a bit futile: if I dig into specific examples, it will be interpreted as nit-picking.)
2. A last attempt to gesture toward what's wrong with this as a whole. The best approximation of the cluster of phenomena the article is pointing toward is not "preferential attachment" (as you propose), but something broader—"systems with feedback loops which can in some approximation be described by the differential equation dx/dt = b·x".
You can start to see systems like that everywhere, and get a sense of something deep, explaining life, the universe and everything.
One problem with this: if you have a system described by a differential equation of the form dx/dt = f(x, …), and the function f() is reasonable, you can approximate it by its Taylor series f(x) = a + b·x + c·x² + …. Obviously, the first-order term is b·x. Unfortunately (?) you can say this even before looking at the system.
So, vaguely speaking, when you start thinking in this way, my intuition is that it puts you in big danger of conflating something about how you do approximations with causal explanations. (I guess it may be a good deal for many people who don't have system-1 intuitions for Taylor series or even the log() function.)
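A minimal sketch of this approximation point (my own illustration, not from the post; the example functions and the expansion point are arbitrary choices):

```python
# Sketch: for any smooth f in dx/dt = f(x), a local Taylor expansion around x0
# gives f(x) ≈ a + b*(x - x0), so a "b*x-like" term shows up before we know
# anything about the mechanism behind f.
import numpy as np

def local_linearization(f, x0, h=1e-5):
    """Return (a, b) of the first-order expansion f(x) ≈ a + b*(x - x0)."""
    a = f(x0)                               # zeroth-order term
    b = (f(x0 + h) - f(x0 - h)) / (2 * h)   # first-order term (central difference)
    return a, b

# Three very different "mechanisms" all yield a linear term locally.
examples = [("logistic growth", lambda x: 0.3 * x * (1 - x / 100)),
            ("saturating uptake", lambda x: 5 * x / (2 + x)),
            ("arbitrary smooth", lambda x: np.sin(x) + 0.1 * x ** 2)]

for name, f in examples:
    a, b = local_linearization(f, x0=1.0)
    print(f"{name:>18}: f(x) ≈ {a:.3f} + {b:.3f}*(x - 1)")
```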
I'm still confused about what you mean by momentum-like effects. Momentum is a very beautiful and crisp concept - the dual (canonical conjugate) of position, with all kinds of deep connections to everything. You can view the whole universe in the dual momentum space.
If the intention is to have a concept roughly in the shape of "all kinds of dynamics which can be rounded to dx/dt = a·x", I agree it may be valuable to have a word for that, but why overload momentum?
You asked for an example of where it conflates that causal mechanism with something else. I picked one example from this paragraph
There’s also the height of trees, the colour, brightness, and lifetime of stars, the proliferation of species, the halo and horns effect, affective death spirals, and the existence of life itself.
So, as I understand it, I gave you an example (the distribution of star masses) which quite likely does not have any useful connection to preferential attachment or exponential growth. After your last reply, I'm really confused about what the state of our disagreement on this is.
I'm actually scared to change the topic of the discussion to what simplicity means, but the argument is roughly this: if you have an arbitrary well-behaved function, in the linear picture you can approximate it locally by a straight line (the first terms of the Taylor series, etc.). And yes, you get a better approximation by including more terms from the Taylor series expansion, or by non-linear regression, etc. Now, if you translate this to the log-log picture, you will find that the power law is in some sense the simplest local approximation of anything. This is also the reason why people often mistakenly use power laws instead of lognormal and other distributions—if you truncate the lognormal and look just at part of the tail, you can fit it with a power law. Btw you nicely demonstrate this effect yourself—preferential attachment often actually leads to a Yule–Simon distribution, and not a power law … but as usual you can approximate it.
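A minimal sketch of the truncated-lognormal point (my own illustration; the lognormal parameters and the truncation threshold are arbitrary):

```python
# Sketch: a truncated lognormal tail can be fit surprisingly well by a power law
# (a straight line) in the log-log picture.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)
tail = samples[samples > 3.0]                       # look only at part of the tail

# Empirical density of the tail on logarithmic bins
bins = np.logspace(np.log10(3.0), np.log10(tail.max()), 30)
counts, edges = np.histogram(tail, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])
mask = counts > 0

# Straight-line (i.e. power-law) fit in log-log coordinates
slope, intercept = np.polyfit(np.log(centers[mask]), np.log(counts[mask]), 1)
residuals = np.log(counts[mask]) - (slope * np.log(centers[mask]) + intercept)
print(f"fitted exponent ~ {slope:.2f}, RMS residual ~ {residuals.std():.3f}")
# The small residuals are why lognormal tails are so often mislabeled as power laws.
```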
I don’t know what you mean by attachment style, but some examples of the conflation...
Momentum is this: even if JK Rowling's next book is total crap, it will still sell a lot of copies. Because people have beliefs, and because they enjoyed her previous books, they have a prior that they will also enjoy the next one. It would take them several crap books to update.
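A toy numeric sketch of this kind of belief updating (my own illustration; all the probabilities are made-up numbers, chosen only to show that a strong prior takes several observations to flip):

```python
# Sketch: a reader with a strong prior that the author writes good books needs
# several bad books before the posterior flips.
prior_good_author = 0.9        # belief that the author reliably writes good books
p_crap_if_good = 0.2           # a good author occasionally misfires
p_crap_if_bad = 0.8            # a bad author usually produces crap

belief = prior_good_author
for book in range(1, 6):       # observe a string of crap books
    likelihood_ratio = p_crap_if_good / p_crap_if_bad
    odds = belief / (1 - belief) * likelihood_ratio   # Bayesian update in odds form
    belief = odds / (1 + odds)
    print(f"after crap book {book}: P(good author) = {belief:.2f}")
```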
Power laws are ubiquitous. This should be unsurprising—power laws are the simplest functional form in the logarithmic picture. If we use some sort of simplicity prior, we are guaranteed to find them. If we use the first terms of a Taylor expansion, we will find them. The log picture is as natural as the linear one. Someone should write a Meditation on Benford's law—you get an asymptotically straight line, in the log-log picture, for the probability that a number starts with some digits (in almost any real-life set of numerical values measured in units; you can see this must be the case because of invariance to unit scaling).
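A small sketch of the unit-scaling intuition (my own illustration; the log-uniform sampling is an assumption standing in for "real-life values spread scale-invariantly over many orders of magnitude", and it only checks the single leading digit):

```python
# Sketch: first digits of values drawn log-uniformly over several decades
# (i.e. scale-invariant data) follow Benford's law, P(d) = log10(1 + 1/d).
import numpy as np

rng = np.random.default_rng(1)
values = 10 ** rng.uniform(0, 6, size=200_000)   # log-uniform across 6 decades
first_digits = np.array([int(str(v)[0]) for v in values])

for d in range(1, 10):
    empirical = np.mean(first_digits == d)
    benford = np.log10(1 + 1 / d)
    print(f"digit {d}: empirical {empirical:.3f}  vs  Benford {benford:.3f}")
```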
This is maybe worth emphasizing: nobody should be surprised to find power laws. Nobody should propose a universal causal mechanism for power laws; it is as stupid as proposing one causal mechanism for straight lines in the linear picture.
They are often the result of other power-law distributed quantities. To take one example from the OP… the initial distribution of masses for an initial population of new stars is a truncated power law. I don't know why, but one proposed mechanism is, for example, turbulent fragmentation of the initial cloud, where the power law can come from the power spectrum of supersonic turbulence.
The post creates unnecessary confusion by lumping together "momentum", "exponential growth", "compound interest", and "heavy-tailed distributions". Conflating these concepts on the system-1 level into some vague undifferentiated positive mess is likely harmful to anyone aspiring to think about systems clearly.
Some of what seem to me to be good arguments against entering the field, depending on what you include as the field:
We may live in a world where AI safety is either easy, or almost impossible to solve. In such cases it may be better to work e.g. on global coordination or on the rationality of leaders.
It may be the case that the "near-term" issues with AI will transform the world in a profound way / are big enough to pose catastrophic risks, and given the shorter timelines and better tractability, they are higher priority. (For example, you can imagine technological unemployment + addictive narrow-AI-aided VR environments + decay of shared epistemology leading to the unraveling of society. Or narrow AI accelerating biorisk.)
It may be the case that useful work on the reduction of AI risk requires very special talent / judgment calibrated in special ways / etc., and that the many people who want to enter the field will mostly harm it, because the people who should start working on it will be drowned out by the noise created by the large mass.
(Note: I do not endorse the arguments. Also they are not answering the part about worrying.)
I like your point about where most of the computation/Lovecraftian monsters are located.
I'll think about it more, but if I try to paraphrase it in my picture with a metaphor … we can imagine an organization with a workplace safety department. The safety regulations it is implementing are the result of some large external computation. Also, even the existence of the workplace safety department is in some sense a result of the external system. But drawing boundaries is tricky.
I'm curious about what the communication channel between evolution and the brain looks like "on the link level". It seems reasonably easy to select e.g. personality traits, some "hyperparameters" of the cognitive architecture, and similar. It is unclear to me if this can be enough to "select from complex strategies", or if it is necessary to transmit strategies in some more explicit form.
Some instantiations of the first problem (How to prevent "aligned" AIs from unintentionally corrupting human values?) seem to me to be some of the most easily imaginable paths to existential risk—e.g. almost all people spending their lives in an addictive VR. I'm not sure it is really neglected?
The thing I’m trying to argue is complex and yes, it is something in the middle between the two options.
1. Predictive processing (in the “perception” direction) makes some brave predictions, which can be tested and match data/experience. My credence in predictive processing in a narrow sense: 0.95
2. Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle. Vague introspective evidence for active inference comes from the ability to do inner simulations. Possibly the boldest claim I can make from the principle alone is that people will have a bias to take actions which will "prove their models are right", even at the cost of the actions being actually harmful for them in some important sense. How it may match everyday experience: for example, here. My credence in active inference as a basic design mechanism: 0.6
3. So far, the description was broadly Bayesian/optimal/"unbounded". An unbounded predictive processing / active inference agent is a fearsome monster, in a similar way as a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are a consequence of computational/signal-processing boundedness, both in PP/AI models and in non-PP/AI models. My credence in boundedness being a key ingredient: 0.99
4. What is missing from the picture so far is some sort of "goals" or "motivation" (or, in another view, a way for evolution to insert some signal into the brain). How Karl Friston deals with this, e.g.:
We start with the premise that adaptive agents or phenotypes must occupy a limited repertoire of physical states. For a phenotype to exist, it must possess defining characteristics or traits; both in terms of its morphology and exchange with the environment. These traits essentially limit the agent to a bounded region in the space of all states it could be in. Once outside these bounds, it ceases to possess that trait (cf., a fish out of water).
is something which I find unsatisfactory. My credence in this being a complete explanation: 0.1
5. My hypothesis is roughly this:
evolution inserts some “goal-directed” sub-parts into the PP/AI machinery
these sub-parts do not somehow "directly interface with the world", but are "buried" within the hierarchy of the generative layers; so they do not care about people or objects or whatever, but about some abstract variables
they are quite “agenty”, optimizing some utility function
from the point of view of such a sub-agent, other sub-agents inside the same mind are possibly competitors; at least some sub-agents likely have access to enough computing power not only to "care about what they are intended to care about", but also to do basic modelling of other sub-agents; an internal game-theoretical mess ensues
6. This hypothesis bridges the framework of PP/AI and the world of theories viewing the mind as a multi-agent system. Multi-agent theories of mind have some introspective support in various styles of psychotherapy, IFS, meditative experience, and some rationality techniques. They also seem to explain behavior where humans seem to "defect against themselves". Credence: 0.8
(I guess a predictive processing purist would probably describe 5. & 6. as just a case of competing predictive models, not adding anything conceptually new.)
Now I would actually want to draw a graph of how strongly 1.–6. motivate different possible problems with alignment, and how these problems motivate various research questions. For example, the question about understanding hierarchical modelling is interesting even if there is no multi-agency, scaling of sub-agents can be motivated even without active inference, etc.
I read the book the SSC article is reviewing (plus a bunch of articles on predictive-mind, some papers from Google Scholar, and I have seen several talks). Linking the SSC review seemed more useful than linking Amazon.
I don’t think I’m the right person for writing an introduction to predictive processing for the LW community.
Maybe I actually should have included a warning that the whole model I’m trying to describe has nontrivial inferential distance.
Thanks for the feedback! Sorry, I’m really bad at describing models in text—if it seems self-contradictory or confused, it’s probably either me being bad at explanations or inferential distance (you probably need to understand predictive processing better than what you get from reading the SSC article).
Another try… start by imagining the hierarchical generative layers (as in PP). They just model the world. Then, add active inference. Then, add the special sort of "priors" like "not being hungry" or "seek reproduction". (You need to have those in active inference for the whole thing to describe humans, IMO.) Then, imagine that these "special priors" start to interact with each other… leading to a game-theoretic style mess. Now you have the sub-agents. Then, imagine some layers up in the hierarchy doing stuff like "personality/narrative generation".
Unless you have this picture right, the rest does not make sense. From your comments I don’t think you have the picture right. I’ll try to reply … but I’m worried it may add to confusion.
To some extent, PP struggles to describe motivations. Predictive processing in the narrow sense is about perception and is not agenty at all—it just optimizes a set of hierarchical models to minimize error. If you add active inference, the system becomes agenty, but you actually do have a problem with motivations. From some popular accounts or from some remarks by Friston it may seem otherwise, but "depends on details of the notion of free energy" is, in my interpretation, a statement roughly similar to the claim that physics can be stated in terms of variational principles, and the rest "depends on the notion of action".
Jeffrey-Bolker rotation is something different, leading to a somewhat similar problem (J-B rotation is much more limited in what can be transformed into what, and preserves the decision structure).
My feeling is you don’t understand Friston; also I don’t want to defend pieces of Friston as I’m not sure I understand Friston.
The options given in the "what are we aligning with" part are AFAIK not something which would have been described in this way before, so an attempt to map them directly to the "familiar litany of options" is likely not the way to understand them. Overall, my feeling is that here you don't have the proposed model right, and the result is mostly confusion.
It's nicely written, but the image of the Player, a hyperintelligent Lovecraftian creature, seems not really right to me. In my picture, where you have this powerful agent entity, I see a mess of sub-agents, interacting in a game-theoretical way primarily among themselves.* How "smart" the results of the interactions are is quite high-variance. Obviously the system has a lot of computing power, but that is not really the same as being intelligent or agent-like.
What I really like is the description of how the results of these interactions are processed via some "personality generating" layers, and how the result looks "from within".
(* One reason why this should be the case: there is not enough bandwidth between DNA and the neural network; evolution can input some sort of signal like "there should be a subsystem tracking social status, and that variable should be maximized" or tune some parameters, but it likely does not have enough bandwidth to transfer some complex representation of the real evolutionary fitness. Hence what gets created are sub-agenty parts, which do not have direct access to reality, and often, instead of playing some masterful strategy in unison, are bargaining or even defecting internally.)
Human brains likely model other humans by simulating them. The simple normative assumption used is something like "humans are humans", which will not really help you in the way you want, but leads to this interesting problem:
Learning from multiple agents.
Imagine a group of five closely interacting humans. Learning values just from person A may run into the problem that a big part of A's motivation is based on A simulating B, C, D, E (on the same "human" hardware, just incorporating individual differences). In that case, learning the "values" just from A's actions could in principle be more difficult than observing the whole group and trying to learn some "human universals" and some "human specifics". A different way of thinking about this could be by making a parallel with meta-learning algorithms (e.g. Reptile), but in an IRL frame.
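A rough sketch of that Reptile-style parallel (my own toy illustration; the "persons as tasks" setup, the quadratic loss, and all parameters are hypothetical):

```python
# Sketch: treat each person A..E as a "task"; the meta-parameters capture the
# shared "human universals", a few adaptation steps capture individual specifics.
import numpy as np

rng = np.random.default_rng(2)
shared = np.array([1.0, -2.0])                                   # hidden "human universals"
persons = [shared + 0.3 * rng.normal(size=2) for _ in range(5)]  # A..E specifics

def loss_grad(theta, person):
    return 2 * (theta - person)            # gradient of ||theta - person||^2

meta_theta = np.zeros(2)
meta_lr, inner_lr, inner_steps = 0.1, 0.05, 10

for _ in range(200):                                 # Reptile outer loop
    person = persons[rng.integers(len(persons))]     # sample a "task" (one person)
    theta = meta_theta.copy()
    for _ in range(inner_steps):                     # adapt to this person
        theta -= inner_lr * loss_grad(theta, person)
    meta_theta += meta_lr * (theta - meta_theta)     # move toward the adapted params

print("recovered 'universals':", meta_theta, "true shared:", shared)
```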
I'm really delighted to hear that this seems like a very well-developed model :) Actually, I'm not aware of any published attempt to unite sub-agents with the predictive processing framework in this way, even on the qualitative level, and it is possible this union is original (I did not find anything attempting to do this on Google Scholar or on the first few pages of Google search results).
Making it quantitative, end-to-end trainable on humans, does not seem to be feasible right now, in my opinion.
With the individual components:
predictive processing is supported by a growing pile of experimental data
active inference is a theoretically very elegant extension of predictive processing
sub-personalities are something which seems to work in psychotherapy, and which agrees with some of my meditative experience
sub-agenty parts interacting in some game-theory-resembling way feel like something which can naturally develop within a sufficiently complex predictive processing/active inference system
First scanned from paper (I like to draw), second edited in GIMP (I don't like to draw the exact same thing repeatedly). I don't know if it's the same with other images you see on LW. Instead of scanning, you can also draw using a tablet.
Nice! We should chat about that.
The technical research direction specification can be in all cases “expanded” from the “seed idea” described here. (We are already working on some of those.) I’m not sure if it’s the best thing to publish now—to me, it seems better to do some iterations on “specify—try to work on it” first, before publishing the expansions.
A good way, I would almost say the right way, to do bounded rationality is information-theoretic bounded rationality. There is a post about it in the works...