Thanks too for responding. I hope our conversation will be productive.
A crucial notion that plays into many of your objections is the distinction between “inner intelligence” and “outer intelligence” of an object (terms derived from “inner vs. outer optimizer”). Inner intelligence is the intelligence the object has in itself as an agent, determined through its behavior in response to novel situations, and outer intelligence is the intelligence required to create this object, determined through the ingenuity of its design. I understand your “AI hypothesis” to mean that any solution to the control problem must have inner intelligence. My response is that while solving the control problem may require a lot of outer intelligence, I think it requires only a small amount of inner intelligence. This is because the environment in Conway’s Game of Life with random dense initial conditions seems to be very low in variety and to require only a small number of strategies to handle. (Although, just as I’m open-minded about intelligent life somehow arising in this environment, it’s possible that there are patterns much more frequent than abiogenesis that make the environment much more variegated.)
Matter and energy are also approximately homogeneously distributed in our own physical universe, yet building a small device that expands its influence over time and eventually rearranges the cosmos into a non-trivial pattern would seem to require something like an AI.
The universe is only homogeneous at the largest scales; at smaller scales it is inhomogeneous in highly diverse ways, like stars and planets and raindrops. The value of our intelligence comes from being able to deal with the extreme diversity of intermediate-scale structures. Meanwhile, at the computationally tractable scale in CGOL, dense random initial conditions do not produce intermediate-scale structures between the random small-scale sparks and ashes and the homogeneous large scale. That said, conditional on life being rare in the universe, I expect that the control problem for our universe requires lower-than-human inner intelligence.
You mention the difficulty of “building a small device that...”, but that is talking about outer intelligence. Your AI hypothesis states that, however such a device can or cannot be built, the device itself must be an AI. That’s where I disagree.
Now it could actually be that in our own physical universe it is also possible to build not-very-intelligent machines that begin small but eventually rearrange the cosmos. In this case I am personally more interested in the nature of these machines than in “intelligent machines”, because the reason I am interested in intelligence in the first place is due to its capacity to influence the future in a directed way, and if there are simpler avenues to influence in the future in a directed way then I’d rather spend my energy investigating those avenues than investigating AI. But I don’t think it’s possible to influence the future in a directed way in our own physical universe without being intelligent.
Again, the distinction between inner and outer intelligence is crucial. In a pure mathematical sense of existence there exist arrangements of matter that solve the control problem for our universe, but for that to be relevant for our future there also has to be a natural process that creates these arrangements of matter at a non-negligible rate. If the arrangement requires a high outer intelligence then this process must be intelligent. (For this discussion, I’m considering natural selection to be a form of intelligent design.) So intelligence is still highly relevant for influencing the future. Machines that are mathematically possible but cannot practically be created are not “simpler avenues to influence in the future”.
“to solve the control problem in an environment full of intelligence only requires marginally more intelligence at best”
What do you mean by this?
Sorry. I meant that the solution to the control problem need only be marginally more intelligent than the intelligent beings in its environment. The difference in intelligence between a controller in an intelligent environment and a controller in an unintelligent environment may be substantial. I realize the phrasing you quote is unclear.
In chess, one player can systematically beat another if the first is ~300 Elo rating points higher, but I’m considering that as a marginal difference in skill on the scale from zero-strategy to perfect play. If our environment is creating the equivalent of a 2000 Elo intelligence, and the solution to the control problem has 2300 Elo, then the specification of the environment contributed 2000 Elo of intelligence, and the specification of the control problem only contributed an extra 300 Elo. In other words, open-world control problems need not be an efficient way of specifying intelligence.
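To put a rough number on “systematically beat”: a minimal sketch, assuming the standard logistic Elo model in which a 400-point gap corresponds to 10:1 odds. The function name is my own, chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0).

    Assumes the standard logistic Elo model with a scale of 400 points.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# A 300-point gap, as in the 2300 vs. 2000 example above.
print(f"{expected_score(2300, 2000):.3f}")  # about 0.849
```

So under this model the stronger player scores roughly 85% per game, which is a reliable edge even though 300 points is a small step on the full scale.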
But if one entity reliably outcompetes another entity, then on what basis do you say that this other entity is the more intelligent one?
On the basis of distinguishing narrow intelligence from general intelligence. A solution to the control problem is guaranteed to outcompete other entities in force or manipulation, but it might be worse at most other tasks. The sort of thing I had in mind for “NP-hard problems in military strategy” would be “this particular pattern of gliders is particularly good at penetrating a defensive barrier, and the only way to find this pattern is through a brute force search”. Knowing this can give the controller a decisive advantage in military conflicts without making it any better at any other tasks, and can permit the controller to have lower general intelligence while still dominating.
Thanks. I also found an invite link in a recent reddit post about this discussion (was that by you?).
While I appreciate the analogy between our real universe and simpler physics-like mathematical models like the game of life, assuming intelligence doesn’t arise elsewhere in your configuration, this control problem does not seem substantially different from, or more AI-like than, any other engineering problem. After all, there are plenty of other problems that involve leveraging a narrow form of control on a predictable physical system to achieve a more refined control, e.g. building a rocket that hits a specific target. The structure that arises from a randomly initialized pattern in Life should be homogeneous in a statistical sense and so highly predictable. I expect almost all of it should stabilize to debris of stable periodic patterns. It’s not clear whether it’s possible to manipulate or clear the debris in controlled ways, but if it is possible, then a single strategy will work for the entire grid. It may take a great deal of intelligence to come up with such a strategy, but once such a strategy is found it can be hard-coded into the initial Life pattern, without any need for an “inner optimizer”. The easiest-to-design solution may involve computer-like patterns, with the pattern keeping track of state involved in debris-clearing and each part tracking its location to determine its role in making the final smiley pattern, but I don’t see any need for any AI-like patterns beyond that. On the other hand, if there are inherent limits in the ability to manipulate debris then no amount of reflection by our starting pattern is going to fix that.
That is assuming intelligence doesn’t arise in the random starting pattern. If it does, our starting configuration would need to overpower every other intelligence that arises and tries to control the space, and this would reasonably require it to be intelligent itself. But if this is the case then the evolution of the random pattern already encodes the concept of intelligence in a much simpler way than this control problem. To predict the structures that would arise from a random initial configuration the idea of intelligence would naturally come up. Meanwhile, to solve the control problem in an environment full of intelligence only requires marginally more intelligence at best, and compared to the no-control prediction problem the control problem adds on some complexity for not very much increase in intelligence. Indeed, the solution to the control problem may even be less intelligent than the structures it competes against, and make up for that with hard-coded solutions to NP-hard problems in military strategy.
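For anyone who wants to check the statistical claim about random soups themselves, here is a minimal sketch. It assumes a finite grid with wrap-around edges (a torus) rather than an infinite plane, and the function names, grid size, fill fraction, and seed are all arbitrary choices of mine.

```python
import random
from collections import Counter


def step(live, width, height):
    """Advance a set of live (x, y) cells by one generation on a torus.

    Assumes width and height are at least 3 so that wrapped neighbours
    are distinct cells.
    """
    neighbour_counts = Counter(
        ((x + dx) % width, (y + dy) % height)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


def soup_densities(width, height, fill, generations, seed=0):
    """Fraction of live cells per generation, starting from a random soup."""
    rng = random.Random(seed)
    live = {
        (x, y)
        for x in range(width)
        for y in range(height)
        if rng.random() < fill
    }
    area = width * height
    densities = [len(live) / area]
    for _ in range(generations):
        live = step(live, width, height)
        densities.append(len(live) / area)
    return densities


if __name__ == "__main__":
    d = soup_densities(64, 64, fill=0.5, generations=500)
    print(f"initial density {d[0]:.3f}, final density {d[-1]:.3f}")
```

Comparing the density curve across several seeds is one crude way to see how statistically similar the resulting debris is from run to run. It says nothing about whether the debris can be cleared in a controlled way.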
On a different note, I’m flattered to see a reference in the comments to some of my own thoughts on working through debris in the Game of Life. It was surprising to see interest in that resurge, and especially surprising to see that interest come from people in AI alignment.
Thanks for linking to my post! I checked the other link, on Discord, and for some reason it’s not working.
Do you know of any source that gives the same explanations in text instead of video?
Edit: Never mind, the course has links to “Lecture PDF” that seem to summarize them. For the first lecture the summary is undetailed and I couldn’t make sense of it without watching the videos, but they appear to get more detailed later on.
I don’t like the fact that the preview doesn’t disappear when I stop hovering. I find the preview visually jarring enough that I would prefer to spend most of my reading time without a spurious preview window. At the very least, there should be a way to manually close the preview. Otherwise I would want to avoid hovering over any links and to refresh when I do, which is a bad reading experience.
My main point of disagreement is the way you characterize these judgements as feelings. With minor quibbles I agree with your paragraph after substituting “it feels” with “I think”. In your article you distinguish between abstract intellectual understanding which may believe that there is no self in some sense and some sort of lower-level perception of the self which has a much harder time accepting this; I don’t follow what you’re pointing to in the latter.
To be clear, I do acknowledge experiencing mental phenomena that are about myself in some sense, such as a proprioceptive distinction between my body and other objects in my mental spatial model, an introspective ability to track my thoughts and feelings, and a sense of the role I play in my community that I am expected to adhere to. However, the form of these pieces of mental content is wildly different, and it is only through an abstract mental categorization that I recognize them as all about the same thing. Moreover, I believe these senses are imperfect but broadly accurate, so I don’t know what it is that you’re saying is an illusion.
Crossposted on my blog:
Lightspeed delays lead to multiple technological singularities.
By Yudkowsky’s classification, I’m assuming the Accelerating Change Singularity: As technology gets better, the characteristic timescale at which technological progress is made becomes shorter, so that the time until this reaches physical limits is short from the perspective of our timescale. At a short enough timescale the lightspeed limit becomes important: When information cannot traverse the diameter of civilization in the time until singularity, further progress must be made independently in different regions. The subjective time from then may still be large, and without communication the different regions can develop different interests and, after their singularities, compete. As the characteristic timescale becomes shorter the independent regions split further.
I’m still not sure what you mean by the feeling of having a self. Your exercise of being aware of looking at an object reminds me of the bouba/kiki effect: The words “bouba” and “kiki” are meaningless, but you ask people to label which shapes are bouba and which are kiki in spite of that. The fact that they answer does not mean they deep down believe that “bouba” and “kiki” are real words. In the same way, when you ask me to be aware of being someone looking at an object, I may have a response (observing that the proposition “I am looking at my phone” is true, contemplating the simpleminded self-evidence of this fact, thinking about how this relates to the points Kaj is trying to make) and there may even be some regularities in this response I can’t rationally justify. Nonetheless this response is not a feeling of a self, nor is it something I am mistakenly confusing with a self; any conflation is only being made from my attempt to interpret an unclear instruction, and is not a mistake I would make in regular thought.
A related point is that the word “self” is so rarely used in ordinary language. The suffix “-self”, like “myself” or “yourself”, yes, but not “self”. That’s only said when people are doing philosophy.
This map is not a surjection because not every map from the rational numbers to the real numbers is continuous, and so not every sequence represents a continuous function. It is injective, and so it shows that a basis for the latter space is at least as large in cardinality as a basis for the former space. One can construct an injective map in the other direction, showing that both spaces have bases of the same cardinality, and so they are isomorphic.
This may be relevant:
Imagine a computational task that breaks up into solving many instances of problems A and B. Each instance of the task reduces to at most n instances of problem A and at most m instances of problem B. However, these two maxima are never achieved both at once: The sum of the number of instances of A and instances of B is bounded above by some r<n+m. One way to compute this with a circuit is to include n copies of a circuit C0 for computing problem A and m copies of a circuit C1 for computing problem B. Another approach for solving the task is to include r copies of a circuit C2 which, with suitable control inputs, can compute either problem A or problem B. Although this approach requires more complicated control circuitry, if r is significantly less than n+m and the size of C2 is significantly less than the sum of the sizes of C0 and C1 (which may occur if problems A and B have common subproblems X and Y which can use a shared circuit) then this approach will use fewer logic gates overall.
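The gate-count comparison can be sketched numerically. Everything below is illustrative: the function names are mine, the control circuitry is modeled as a single lump-sum overhead, and the sizes are made-up numbers rather than measurements of any real circuit.

```python
def dedicated_gate_count(n, m, size_c0, size_c1):
    """Gates used by n copies of C0 (problem A) plus m copies of C1 (problem B)."""
    return n * size_c0 + m * size_c1


def shared_gate_count(r, size_c2, control_overhead):
    """Gates used by r copies of the dual-purpose circuit C2 plus control logic."""
    return r * size_c2 + control_overhead


# Hypothetical sizes: C2 is larger than either C0 or C1 alone, but much
# smaller than C0 and C1 combined, and r is well below n + m.
dedicated = dedicated_gate_count(n=10, m=10, size_c0=100, size_c1=100)
shared = shared_gate_count(r=12, size_c2=130, control_overhead=200)
print(dedicated, shared)  # 2000 1760
```

With these numbers the shared design wins despite the control overhead. Pushing r toward n+m or the size of C2 toward the combined size of C0 and C1 reverses the result, which is the trade-off described above.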
More generally, consider some complex computational task that breaks down into a heterogeneous set of subproblems which are distributed in different ways depending on the exact instance. Analogous reasoning suggests that the minimal circuit for solving this task will involve a structure akin to emulating a CPU: There are many instances of optimized circuits for low-level tasks, connected by a complex dependency graph. In any particular instance of the problem the relevant data dependencies are only a small subgraph of this graph, with connections decided by some control circuitry. A particular low-level circuit need not have a fixed purpose, but is used in different ways in different instances.
So, our circuit has a dependency tree of low-level tasks optimized for solving our problem in the worst-case. Now, at a starting stage of this hierarchy it has to process information about how a particular instance is separated into subproblems and generate the control information for solving this particular instance. The control information might need to be recomputed as new information about the structure of the instance is made manifest, and sometimes a part of the circuit may perform this recomputation without full access to potentially conflicting control information calculated in other parts.
Yes, this is the refutation for Pascal’s mugger that I believe in, although I never got around to writing it up like you did. However, I disagree with you that it implies that our utilities must be bounded. All the argument shows is that ordinary people never assign enormous utility values to events without also assigning them commensurately low probabilities. That is, normative claims (i.e., claims that certain events have certain utility assigned to them) are judged fundamentally differently from factual claims, and require more evidence than merely the complexity prior. In a moral intuitionist framework this is the fact that anyone can say that 3^^^3 lives are suffering, but it would take living 3^^^3 years and getting to know 3^^^3 people personally to feel the 3^^^3 times utility associated with this event.
I don’t know how to distinguish the scenarios where our utilities are bounded and where our utilities are unbounded but regularized (or whether our utilities are sufficiently well-defined to distinguish the two). Still, I want to emphasize that the latter situation is possible.
Quick thought: I think you are relying too much on your own experience, which I don’t expect to generalize well. Different people will have different habits on how much thought they put into their comments, and I expect some put in too much thought and some too little. We should put more effort into identifying the aggregate tendencies of people at this forum before we make recommendations.
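As a toy illustration of what “unbounded but regularized” could look like: in the sketch below the utility of outcome k grows without bound, yet the expected utility stays finite because the probabilities shrink faster than the utilities grow. The specific utility function and the two priors are assumptions I picked for easy arithmetic, not a model of anyone’s actual values.

```python
from fractions import Fraction


def utility(k):
    """Unbounded utility: outcome k is worth 2**k."""
    return Fraction(2) ** k


def slow_prior(k):
    """Probability 2**-k: shrinks only as fast as the utility grows."""
    return Fraction(1, 2 ** k)


def fast_prior(k):
    """Probability 3 * 4**-k: shrinks faster than the utility grows."""
    return Fraction(3, 4 ** k)


def partial_expected_utility(prior, terms):
    """Sum of prior(k) * utility(k) over outcomes k = 1 .. terms."""
    return sum(prior(k) * utility(k) for k in range(1, terms + 1))


for terms in (10, 20, 40):
    print(
        terms,
        float(partial_expected_utility(slow_prior, terms)),
        float(partial_expected_utility(fast_prior, terms)),
    )
```

Under the slow prior every outcome contributes exactly 1, so the partial sums grow without limit. Under the fast prior the partial sums approach 3 even though no single utility value is capped.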
Then again, perhaps you are just offering the idea casually, so it’s okay. Still I worry that the most likely future pathways for posts like this are “get ignored” and “get cited uncritically”, and there’s no clear place for this more thorough investigation.
What’s the fallacy you’re claiming?
First, to be clear, I am referring to things such as this description of the prisoner’s dilemma and EY’s claim that TDT endorses cooperation. The published material has been careful to only say that these decision theories endorse cooperation among identical copies running the same source code, but as far as I can tell some researchers at MIRI still believe this stronger claim and this claim has been a major part of the public perception of these decision theories (example here; see section II).
The problem is that when two FDT agents with different utility functions and different prior knowledge are facing a prisoner’s dilemma with each other, then their decisions are actually two different logical variables X0 and X1. The argument for cooperating is that X0 and X1 are sufficiently similar to one another that in the counterfactual where X0=C we also have X1=C. However, you could just as easily take the opposite premise, where X0 and X1 are sufficiently dissimilar that counterfactually changing X0 will have no effect on X1. Then you are left with the usual CDT analysis of the game. Given the vagueness of logical counterfactuals it is impossible to distinguish these two situations.
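To make the two premises concrete, here is a small sketch. It is not an implementation of FDT. It only evaluates the first agent’s payoff under each of the two counterfactual assumptions described above. The payoff numbers are the conventional textbook values, which I am assuming for illustration, and the function names are mine.

```python
# Payoff to the first agent, indexed by (its own move X0, the other's move X1).
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def best_move_if_linked():
    """Premise 1: counterfactually setting X0 also sets X1 to the same value."""
    return max("CD", key=lambda x0: PAYOFF[(x0, x0)])


def best_move_if_independent(x1):
    """Premise 2: X1 stays fixed while X0 is varied (the usual CDT analysis)."""
    return max("CD", key=lambda x0: PAYOFF[(x0, x1)])


print(best_move_if_linked())          # C
print(best_move_if_independent("C"))  # D
print(best_move_if_independent("D"))  # D
```

The recommendation flips entirely on which premise is plugged in, and nothing in the payoff table says which one applies to two non-identical agents.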
Here’s a related question: What does FDT say about the centipede game? There’s no symmetry between the players so I can’t just plug in the formalism. I don’t see how you can give an answer that’s in the spirit of cooperating in the prisoner’s dilemma without reaching the conclusion that FDT involves altruism among all FDT agents through some kind of veil of ignorance argument. And taking that conclusion is counter to the affine-transformation-invariance of utility functions.
Some meta-level comments and questions:
This discussion has moved far off-topic away from EY’s general rationality lessons. I’m pleased with this, since these are topics that I want to discuss, but I want to mention this explicitly since constant topic-changes can be bad for a productive discussion by preventing the participants from going into any depth. In addition, lurkers might be annoyed at reading yet another AI argument. Do you think we should move the discussion to a different venue?
My motivations for discussing this are a chance to talk about criticisms of MIRI that I haven’t gotten down in writing in detail before, a chance to get a rough impression of how MIRI supporters respond to these explanations, and more generally an opportunity to practice intellectually honest debate. I don’t expect the discussion to go on far enough to resolve our disagreements, but I am trying anyway, to get practice. I’m currently enthusiastic about continuing the discussion, but it is the sort of enthusiasm that could easily wane in a day. What is your motivation?
“but a fundamental assumption behind TDT and UDT is the existence of a causal structure behind logical statements, which sounds implausible to me.”
None of the theories mentioned make any assumption like that; see the FDT paper above.
Page 14 of the FDT paper:
Instead of a do operator, FDT needs a true operator, which takes a logical sentence φ and updates P to represent the scenario where φ is true...
...Equation (4) works given a graph that accurately describes how changing the value of a logical variable affects other variables, but it is not yet clear how to construct such a thing—nor even whether it can be done in a satisfactory manner within Pearl’s framework.
This seems wrong, if you’re saying that we can’t formally establish the behavior of different decision theories, or that applying theories to different cases requires ad-hoc emendations; see section 5 of “Functional Decision Theory” (and subsequent sections) for a comparison and step-by-step walkthrough of procedures for FDT, CDT, and EDT. One of the advantages we claim for FDT over CDT and EDT is that it doesn’t require ad-hoc tailoring for different dilemmas (e.g., ad-hoc precommitment methods or ratification procedures, or modifications to the agent’s prior).
The main thing that distinguishes FDT from CDT is how the true operator mentioned above functions. As far as I’m aware this is always inserted by hand. This is easy to do for situations where entities make perfect simulations of one another, but there aren’t even rough guidelines for what to do when the computations that are done cannot be delineated in such a clean manner. In addition, if this were a rich research field I would expect more “math that bites back”, i.e., substantive results that reduce to clearly-defined mathematical problems whose result wasn’t expected during the formalization.
This point about “load-bearing elements” is at its root an intuitive judgement that might be difficult for me to convey properly.
Thinking further, I’ve spotted something that may be a crucial misunderstanding. Is the issue whether EY was right to create his own technical research institute on AI risk, or is it whether he was right to pursue AI risk at all? I agree that before EY there was relatively little academic work on AI risk, and that he played an important role in increasing the amount of attention the issue receives. I think it would have been a mistake for him to ignore the issue on the basis that the experts must know better than him and they aren’t worried.
On the other hand, I expect an equally well-funded and well-staffed group that is mostly within academia to do a better job than MIRI. I think EY was wrong in believing that he could create an institute that is better at pursuing long-term technical research in a particular topic than academia.
When I think about the people working on AGI outcomes within academia these days, I think of people like Robin Hanson, Nick Bostrom, Stuart Russell, and Eric Drexler, and it’s not immediately obvious to me that these people have converged more with each other than any of them have with researchers at MIRI.
I see the lack of convergence between people in academia as supporting my position, since I am claiming that MIRI is looking too narrowly. I think AI risk research is still in a brainstorming stage where we still don’t have a good grasp on what all the possibilities are. If all of these people have rather different ideas for how to go about it, why is it just the approaches that Eliezer Yudkowsky likes that are getting all the funding?
I also have specific objections. Let’s take TDT and FDT as an example since they were mentioned in the post. The primary motivation for them is that they handle Newcomb-like dilemmas better. I don’t think Newcomb-like dilemmas are relevant for the reasoning of potentially dangerous AIs, and I don’t think you will get a good holistic understanding of what a good reasoner is out of these theories. One secondary motivation for TDT/UDT/FDT is a fallacious argument that it endorses cooperation in the true prisoner’s dilemma. Informal arguments seem to be the load-bearing element in applying these theories to any particular problem; the technical works seem to be mainly formalizing narrow instances of these theories to agree with the informal intuition. I don’t know about FDT, but a fundamental assumption behind TDT and UDT is the existence of a causal structure behind logical statements, which sounds implausible to me.
(Background: I used to be skeptical about AI risk as a high-value cause, now I am uncertain, and I am still skeptical of MIRI.)
I disagree with you about MIRI compared with mainstream academia. Academics may complain about the way academia discourages “long-term substantive research projects”, but taking a broader perspective academia is still the best thing there is for such projects. I think you misconstrued comments by academics complaining about their situation on the margin as being statements about academia in the absolute, and thereby got the wrong idea about the relative difficulty of doing good research within and outside academia.
When you compete for grant funding, that means your work is judged by people with roughly the same level of expertise as you. When you make a publicly funded research institute your work is judged far more shallowly. That you chose to go along the second path rather than the first left a bad first impression on me when I first learned of it, as if you couldn’t make a convincing case in a fair test. As MIRI grew and as I learned more about it, I got the impression that since MIRI is a small team with too little contact with a broader intellectual community, it prematurely reached a consensus on a particular set of approaches and assumptions that I think are likely to go nowhere.