Reasons for some cautious optimism
In Part I, it can be the case that human values are actually a complex combination of easy-to-measure goals + complex world models, so the structure of the proxies will be able to represent what we really care about. (I don’t know. Also, the result can still stop representing our values with further scaling and evolution.)
In Part II, it can be the case that influence-seeking patterns are more computationally costly than straightforward patterns, and they can be partly suppressed by optimising for processing costs, bounded-rationality style. To some extent, influence-seeking patterns attempting to grow and control the whole system seem to me to be something happening within our own minds as well. I would guess some combination of immune system + metacognition + bounded rationality + stabilisation by complexity is stabilising many human minds. (I don’t know if any of that can scale arbitrarily.)
Short summary of why the linked paper is important: you can think about bias as some sort of perturbation. You are then interested in the “cascade of spreading” of the perturbation, and especially in factors like the distribution of cascade sizes. The universality classes tell you this can be predicted by just a few parameters (Table 1 in the linked paper), depending mainly on the local dynamics (forecaster-forecaster interactions). Now, if you have a good model of the local dynamics, you can determine the parameters and hence which universality class the problem belongs to. You can also try to infer the dynamics if you have good data on your interactions.
I’m afraid I don’t know enough about how “forecasting communities” work to give you good guesses about where the points of leverage might be. One quick idea, if you have everybody on the same platform, would be to run some sort of A/B experiment: manipulate the data so that some forecasters see the predictions of others with an artificially introduced perturbation, and see how their output differs from the control group. If you have data on “individual dynamics” like that, and some knowledge of the network structure, the theory can help you predict the cascade size distribution.
(I also apologize for not being more helpful, but I really don’t have time to work on this for you.)
I was a bit confused by “we … but aren’t sure how to reason quantitatively about the impacts, and how much the LW community could together build on top of our preliminary search”, which seemed to nudge toward original research. Outsourcing literature reviews, distillation or extrapolation seems great.
Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google Scholar are something like “spreading dynamics in complex networks”; “information cascades” does not seem to be the best choice of keywords.
There are many options for how to model the state of a node (discrete states, oscillators, continuous variables, vectors of any of the above, ...), multiple options for how to represent the dynamics (something like an Ising model / softmax, versions of the voter model, oscillator coupling, ...), and multiple options for how to model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on SBMs, scale-free networks, Erdős–Rényi, Watts–Strogatz, or real-world network data, ...). This creates a rather large space of options, most of which have already been explored somewhere in the literature.
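To make that space of choices concrete, here is a minimal sketch (my own illustration, not taken from any particular paper) of one point in it: binary node states, independent-cascade dynamics, and a Watts–Strogatz topology built with networkx. Swapping out any of the three ingredients gives a different model from the space described above.

```python
import random
import networkx as nx

def independent_cascade(graph, seed_nodes, p=0.15, rng=None):
    """Independent-cascade dynamics: each newly activated node gets one
    chance to activate each inactive neighbour with probability p."""
    rng = rng or random.Random()
    active = set(seed_nodes)
    frontier = list(seed_nodes)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in graph.neighbors(node):
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return active

# One choice of topology (a small-world graph); Erdos-Renyi, an SBM,
# a scale-free graph, or real interaction data would be drop-in replacements.
G = nx.watts_strogatz_graph(n=1000, k=6, p=0.1, seed=0)

# Distribution of cascade sizes from single-node perturbations.
rng = random.Random(0)
sizes = [len(independent_cascade(G, [rng.randrange(1000)], rng=rng))
         for _ in range(500)]
print("largest cascade sizes:", sorted(sizes)[-5:])
```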
Possibly the single most important thing to know about this: there are universality classes of systems which exhibit similar behaviour, so you can often ignore the details of the dynamics/topology/state representation.
Overall I would suggest approaching this with some intellectual humility and studying the existing research more, rather than trying to reinvent a large part of network science on LessWrong. (My guess is something like >2000 research-years have been spent on the topic, often by quite good people.)
It would be cool to try some style-matching between the text and images. Ultimately, having some “personality vector” which would be used in both image and text generation. (A very crude version could be to create a NN translator from the style space to the word2vec space and include the resulting words in the GPT prompts.)
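A minimal sketch of that crude version, where every name, dimension and vector is a placeholder of mine rather than anything from an existing system: a tiny (untrained) network maps the style/“personality” vector into word2vec space, the nearest vocabulary words are looked up, and those words get prepended to the text prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder vocabulary and word2vec-style embeddings; in practice you would
# load pretrained 300-dimensional word2vec vectors instead.
vocab = ["serene", "chaotic", "ornate", "minimal", "gloomy", "vivid"]
word_vecs = rng.normal(size=(len(vocab), 300))
word_vecs /= np.linalg.norm(word_vecs, axis=1, keepdims=True)

# Tiny untrained MLP standing in for the style -> word2vec "translator";
# a real version would be trained on paired style vectors and captions.
W1 = 0.1 * rng.normal(size=(64, 128))   # style space assumed to be 64-dim
W2 = 0.1 * rng.normal(size=(128, 300))

def style_to_words(style_vec, top_k=3):
    """Map a style vector into word2vec space and return the nearest words."""
    query = np.tanh(style_vec @ W1) @ W2
    query /= np.linalg.norm(query)
    sims = word_vecs @ query
    return [vocab[i] for i in np.argsort(-sims)[:top_k]]

personality = rng.normal(size=64)  # the shared "personality vector"
prompt = ", ".join(style_to_words(personality)) + ": a portrait of a city at dawn"
print(prompt)
```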
As I see it, a big part of the problem is that there is an inherent tension between “concrete outcomes avoiding general concerns with human models” and “how systems interacting with humans must work”. I would expect that the more you want to avoid general concerns with human models, the more “impractical” suggestions you get; in other words, the tension between the “Problems with h.m.” and “Difficulties without h.m.” is a tradeoff you cannot avoid by any conceptualisation.
I would suggest using grounding in QFT not as an example of an obviously wrong conceptualisation, but as a useful benchmark of being “actually human-model-free”. Comparison to the benchmark may then serve as a heuristic pointing to where (at least implicit) human modelling creeps in. In the above-mentioned example of avoiding side effects, the way the “coarse-graining” of the state space is done is actually a point where Goodharting may happen, and thinking in that direction can maybe even lead to some intuitions about how much information about humans got in.
One possible counterargument to the conclusion of the OP is that the main “tunable” parameters we are dealing with are I. “modelling humans explicitly vs modelling humans implicitly”, and II. “total amount of human modelling”. It is then possible that competitive systems exist only in some part of this space, and by pushing hard on the “total amount of human modelling” parameter we get systems which do less human modelling, but when they do it, it happens mostly in implicit, hard-to-understand ways.
I’m afraid it is generally infeasible to avoid modelling humans at least implicitly. One reason for that is that basically any practical ontology we use is implicitly human. In a sense the only implicitly non-human knowledge is quantum field theory (and even that is not clear).
For example: while methods to measure negative side effects may seem human-independent, it seems to me that a lot of ideas about humans creep into the details. The proposals I’ve seen generally depend on some coarse-graining of states: you at least want to somehow remove time from the state, but generally you do the coarse-graining based on... actually, what humans value. (If this research agenda were really trying to avoid implicit human models, I would expect people to spend a lot of effort on measures of quantum entanglement, decoherence, and similar topics.)
Just a few comments
In the abstract, one open problem about “not-goal-directed agents” is “when do they turn into goal-directed ones?”; this seems similar to the problem of inner optimizers, at least in the sense that solutions which would prevent the emergence of inner optimizers could likely also work for non-goal-directed things.
From the “alternative solutions”, in my view, what is under-investigated are attempts to limit capabilities, i.e. to make “bounded agents”. One intuition behind this is that humans are functional just because goals and utilities are “broken” in a way compatible with our planning and computational bounds. I’m worried that efforts in this direction got bucketed with “boxing”, and boxing got a vibe of being uncool. (By making something bounded I mean, for example, making bit-flips costly in a way tied to physics, not naive solutions like “just don’t connect it to the internet”.)
I’m particularly happy about your points on the standard claims about expected utility maximization. My vague impression is that too many people on LW kind of read the standard texts, note that there is a persuasive text from Eliezer on the topic, and take the matter as settled.
Not only is it hard to disentangle manipulation from explanation; it is actually difficult to disentangle even manipulation from just asking the human about their preferences (like here).
Manipulation via incorrect “understanding” is IMO a somewhat easier problem (understanding can possibly be tested by something like simulating the human’s capacity to predict). Manipulation via messing with our internal multi-agent system of values seems subtler and harder. (You can imagine an AI roughly in the shape of Robin Hanson, explaining to one part of the mind how some of the other parts work, or just drawing the attention of consciousness to some sub-agents and not others.)
My impression is that in full generality it is unsolvable, but something like starting with an imprecise model of approval / the utility function learned via ambitious value learning, and restricting explanations/questions/manipulation by that, may work.
One hypothesis for why we do so well: we “simulate” other people on very similar hardware, and with relatively similar minds (compared to the abstract set of possible planners), which acts as a sort of strong implicit prior. (Some evidence for this is that we have much more trouble inferring the goals of other people whose brains function far from what’s usual on some dimension.)
As Raemon noted, the mentorship bottleneck is actually a bottleneck. Senior researchers who could mentor are the most bottlenecked resource in the field, and the problem is unlikely to be solved by financial or similar incentives. Motivating them more strongly is probably wrong, because mentoring competes with time to do research, evaluate grants, etc. What can be done is:
improve the utilization of time of the mentors (e.g. mentoring teams of people instead of individuals)
do what can be done on peer-to-peer basis
use mentors from other fields to teach people generic skills, e.g. how to do research
prepare better materials for onboarding
Is there another way to spend money that seems clearly more cost-effective at this point, and if so, what? In my opinion, for example, the AI safety camps were significantly more effective. I have maybe 2-3 ideas which would likely be more effective (sorry, but shareable only in private).
Btw, when it comes to any practical implications, both of these repugnant conclusions depend on a likely incorrect aggregation of utilities. If we aggregate utilities with logarithms/exponentiation in the right places, and assume that resources are limited, the answer to the question “what is the best population given the limited resources” is not repugnant.
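As a minimal numeric sketch of the kind of aggregation I mean (the specific functional forms here are just my illustration, not the only possible choice): with a fixed resource budget R split evenly, logarithmic individual utilities, and a plain sum over people, the total utility n·log(R/n) has an interior optimum near n = R/e, instead of “more people is always better”.

```python
import math

R = 1000.0  # fixed resource budget to be divided among n people

def total_utility(n):
    """Sum of logarithmic individual utilities when R is split evenly.
    (Illustrative choice of functional forms.)"""
    return n * math.log(R / n)

best_n = max(range(1, 5000), key=total_utility)
print(best_n, round(R / math.e))  # the optimum sits near R / e, not at "as many people as possible"
```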
This is part of the problem I was trying to describe in multi-agent minds, part “what are we aligning the AI with”.
I agree the goal is under-specified. With regard to meta-preferences, with some simplification, it seems we have several basic possibilities:
1. Align with the result of the internal aggregation (e.g. observe what the corporation does)
2. Align with the result of the internal aggregation, by asking (e.g. ask the corporation via some official channel, let the sub-agents sort it out inside)
3. Learn about the sub-agents and try to incorporate their values (e.g. learn about the humans in the corporation)
4. Add layers of indirection, e.g. asking about meta-preferences
Unfortunately, I can imagine that in the case of humans, 4. can lead to various stable reflective equilibria of preferences and meta-preferences. For example, I can imagine that by suitable queries you could get a human to want
to be aligned with explicit reasoning, putting most value on some conscious, model-based part of the mind; with meta-reasoning about VNM axioms, etc.
to be aligned with some heart & soul, putting value on universal love, transcendent joy, and the many parts of the human mind which are not explicit, etc.
where both of these options would be self-consistently aligned with the meta-preferences the human would express about how the sub-agent alignment should be done.
So even with meta-preferences, there are likely multiple ways the alignment could go.
There is a fascinating, not yet really explored territory between Global Workspace Theory (GWT) and predictive processing.
For example, how it may look: there is a 2018 paper, “Dynamic interactions between top-down expectations and conscious awareness”, where they run attentional-blink-style experiments with predictions and find, for example:
The first question that we addressed was how prior information about the identity of an upcoming stimulus influences the likelihood of that stimulus entering conscious awareness. Using a novel attentional blink paradigm in which the identity of T1 cued the likelihood of the identity of T2, we showed that stimuli that confirm our expectation have a higher likelihood of gaining access to conscious awareness
Second, nonconscious violations of conscious expectations are registered in the human brain. Third, however, expectations need to be implemented consciously to subsequently modulate conscious access. These results suggest a differential role of conscious awareness in the hierarchy of predictive processing, in which the active implementation of top-down expectations requires conscious awareness, whereas a conscious expectation and a nonconscious stimulus can interact to generate prediction errors. How these nonconscious prediction errors are used for updating future behavior and shaping trial-by-trial learning is a matter for future experimentation.
My rough takeaway is this: while on the surface it may seem that the effect of unconscious processing on decision-making is relatively weak, unconscious processing is responsible for what even reaches conscious awareness. In the FBI metaphor, there is a lot of power in the FBI’s ability to shape what even gets on the agenda.
The second thing first: “...but before they were physics terms they were concepts for intuitive things” is actually not true in this case: momentum did not mean anything before being coined in physics. Then it came to be used in a metaphorical way, but mostly congruently with the original physics concept, as something like “mass” × “velocity”. It seems to me easy to imagine vivid pictures based on this metaphor, like an advancing army with momentum conquering mile after mile of enemy territory, or a scholar working through page after page of a difficult text. However, this concept is not tied to the b∗x term (which is one of my cruxes).
To me, the original metaphorical meaning of momentum makes a lot of sense: there are a lot of systems with something like mass (closely connected to inertia: you need a great force to get something massive to move) and something like velocity, the direction and speed in which the system is heading. I would expect most people to have this on some level.
Now, the first thing second: I agree that it may be useful to notice all the systems in which the Taylor series for f has b>0, ESPECIALLY when it’s comparably easy to control f via b∗x rather than just via a. However, some of the examples in the original post do not match this pattern: some could just be systems where, for example, you feed a heavy-tailed distribution into the input and get a heavy-tailed distribution on the output, or systems where the a term is what you should control, or systems where you should actually understand more about f(x) than the fact that it has a positive first derivative at some point.
What a good name for the b∗x>0 case would be I don’t know; some random prosaic ideas are snowballing, compounding, faenus (from the Latin for interest on money, gains, profit, advantage), compound interest. But there is likely some more poetic name, similarly to Moloch.
1. Going through two of the adjacent links in the same paragraph:
With the trees, I only skimmed it, but if I understand it correctly, the linked article proposes this new hypothesis: “Together these pieces of evidence point to a new hypothesis: Small-scale, gap-generating disturbances maintain power-function size structure whereas later-successional forest patches are responsible for deviations in the high tail.”
and, also from the paper
Current theories explaining the consistency of tropical forest size structure are controversial. Explanations based on scaling up individual metabolic rates are criticized for ignoring the importance of asymmetric competition for light in causing variation in dynamic rates. Other theories, which embrace competition and scale individual tree vital rates through an assumption of demographic equilibrium, are criticized for lacking parsimony, because predictions rely on site-level, size-specific parameterization
(I also recommend looking at the plots with the “power law”, which are of the usual type: something more complex approximated by a straight line over some interval.)
So, what we actually have here: apparently different researchers proposing different hypotheses to explain the observed power-law-like data. It is far from conclusive what the actual reason is. As something like positive feedback loops is an obvious part of the hypothesis space whenever you see power-law-like data, you are almost guaranteed to find a paper which proposes something in that direction. However, note that the article actually criticizes previous explanations based more on the “Matthew effect”, and proposes disturbances as a critical part of the explanation.
(Btw, I do not claim any dishonesty from the author or anything like that.)
Something similar can be said about the Cambrian explosion, which is the next link.
Halo and horn effects are likely evolutionarily adaptive, tracking something real (traits like “having an ugly face” and “having a higher probability of ending up in trouble” are likely correlated; the common cause can be mutation load / parasite load; compare things like the positive manifold).
And so on.
Sorry, but I will not dissect every paragraph of the article in this way. (It also seems a bit futile: if I dig into specific examples, it will be interpreted as nit-picking.)
2. A last attempt to gesture toward what’s wrong with this as a whole. The best approximation of the cluster of phenomena the article is pointing toward is not “preferential attachment” (as you propose), but something broader: “systems with feedback loops which can, in some approximation, be described by the differential equation dx/dt = b·x”.
You can start to see systems like that everywhere, and get a sense of something deep, explaining life, the universe and everything.
One problem with this: if you have a system described by a differential equation of the form dx/dt = f(x, ...), and the function f is reasonable, you can approximate it by its Taylor series f(x) = a + b·x + c·x² + .... Obviously, the first-order term is b·x. Unfortunately (?), you can say this even before looking at the system.
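A small sympy sketch of this point (my own illustration): expand a few rather different right-hand sides f(x) around a generic point x0, and the linear b·(x − x0) term shows up every time, regardless of what the system “really” is.

```python
import sympy as sp

x, x0 = sp.symbols("x x0", positive=True)

# Three quite different dynamics; none of them is "really" exponential growth.
candidates = [sp.sin(x), sp.log(1 + x), x / (1 + x**2)]

for f in candidates:
    a = f.subs(x, x0)              # zeroth-order term of the Taylor expansion
    b = sp.diff(f, x).subs(x, x0)  # coefficient of the linear term
    print(f, "~", a, "+", sp.simplify(b), "* (x - x0)")
```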
So, vaguely speaking, when you start thinking this way, my intuition is that it puts you in great danger of conflating something about how you do approximations with causal explanations. (I guess it may be a good deal for many people who don’t have System 1 intuitions for Taylor series or even the log() function.)
I’m still confused about what you mean by momentum-like effects. Momentum is a very beautiful and crisp concept, the dual (canonical conjugate) of position, with all kinds of deep connections to everything. You can view the whole universe in the dual momentum space.
If the intention is to have a concept roughly in the shape of “all kinds of dynamics which can be rounded to dx/dt = a·x”, I agree it may be valuable to have a word for that, but why overload momentum?
You asked for an example of where it conflates that causal mechanism with something else. I picked one example from this paragraph:
There’s also the height of trees, the colour, brightness, and lifetime of stars, the proliferation of species, the halo and horns effect, affective death spirals, and the existence of life itself.
So, as I understand it, I gave you an example (the distribution of star masses) which quite likely does not have any useful connection to preferential attachment or exponential growth. After your last reply, I’m really confused about what the state of our disagreement on this is.
I’m actually scared of changing the topic of the discussion to what simplicity means, but the argument is roughly this: if you have an arbitrary well-behaved function, in the linear picture you can approximate it locally by a straight line (the first terms of the Taylor series, etc.). And yes, you get a better approximation by including more terms of the Taylor expansion, or by non-linear regression, etc. Now, if you translate this to the log-log picture, you will find that a power law is in some sense the simplest local approximation of anything. This is also the reason why people often mistakenly use power laws instead of lognormal and other distributions: if you truncate the lognormal and look just at part of the tail, you can fit it with a power law. Btw, you nicely demonstrate this effect yourself: preferential attachment often actually leads to a Yule–Simon distribution, not a power law ... but, as usual, you can approximate it.
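A quick numeric illustration of the lognormal point (my own sketch): draw lognormal samples, keep only the upper tail, and the tail’s empirical CCDF is well fit by a straight line in log-log coordinates, i.e. by a power law, even though the generating process has nothing to do with preferential attachment.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.5, size=200_000)

# Keep only the upper 5% tail and compute its empirical CCDF.
tail = np.sort(samples[samples > np.quantile(samples, 0.95)])
ccdf = 1.0 - np.arange(len(tail)) / len(tail)

# A straight-line fit in log-log coordinates is exactly a power-law fit.
slope, intercept = np.polyfit(np.log(tail), np.log(ccdf), 1)
print(f"fitted tail exponent: {slope:.2f}")  # the truncated tail looks like a clean power law
```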
I don’t know what you mean by attachment style, but some examples of the conflation...
Momentum is this: even if JK Rowling’s next book is total crap, it will still sell a lot of copies. Because people have beliefs, and because they enjoyed her previous books, they have a prior that they will also enjoy the next one. It would take them several crap books to update.
Power laws are ubiquitous. This should be unsurprising: power laws are the simplest functional form in the logarithmic picture. If we use some sort of simplicity prior, we are guaranteed to find them. If we use the first terms of a Taylor expansion, we will find them. The log picture is as natural as the linear one. Someone should write a Meditation on Benford’s law: the probability that a number starts with a given string of digits is an asymptotically straight line in the log-log picture (in almost any real-life set of numerical values measured in units; you can see this must be the case because of invariance to unit rescaling).
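A small sketch of the Benford point (again my own illustration): values spanning several orders of magnitude, here a broad lognormal standing in for “any real-life set of values measured in units”, closely follow P(leading digit d) = log10(1 + 1/d), and rescaling the units barely changes the result.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.lognormal(mean=0.0, sigma=3.0, size=500_000)  # spans many decades

def leading_digit_freqs(x):
    """Frequency of leading digits 1..9 of the values in x."""
    digits = (x / 10 ** np.floor(np.log10(x))).astype(int)
    return np.bincount(digits, minlength=10)[1:] / len(x)

benford = np.log10(1 + 1 / np.arange(1, 10))
print(np.round(leading_digit_freqs(values), 3))       # empirical frequencies
print(np.round(leading_digit_freqs(values * 3.7), 3)) # unit rescaling: nearly the same
print(np.round(benford, 3))                           # Benford's prediction
```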
This is maybe worth emphasizing: nobody should be surprised to find power laws. Nobody should propose a universal causal mechanism for power laws; it is as stupid as proposing one causal mechanism for straight lines in the linear picture.
They are often the result of other power-law-distributed quantities. To take one example from the OP… the initial distribution of masses for an initial population of new stars is a truncated power law. I don’t know why, but one proposed mechanism is turbulent fragmentation of the initial cloud, where the power law can come from the power spectrum of supersonic turbulence.
The post creates unnecessary confusion by lumping together “momentum”, “exponential growth”, “compound interest”, and “heavy-tailed distributions”. Conflating these concepts on the System 1 level into some vague, undifferentiated positive mess is likely harmful to anyone aspiring to think about systems clearly.