The general philosophy is deconfusion. Logical counterfactuals show up in several relevant-looking places, like functional decision theory. It seems that a formal model of logical counterfactuals would let more properties of these algorithms be proved. There is an important step in going from an intuitive feeling of uncertainty to a formalized theory of probability, and a formal theory might also suggest other techniques based on it. I am not sure what you mean by logical counterfactuals being part of the map. Are you saying that they are something an algorithm might use to understand the world, not features of the world itself, like probabilities?
Using this, I think that self-understanding, two-boxing, embedded FDT agents can be fully formally understood, in a universe that contains the right type of hypercomputation.
Here is a description of how it could work for Peano arithmetic; other proof systems are similar.
First I define an expression to consist of a number, a variable, or a function applied to several other expressions.
Fixed expressions are ones in which every variable is bound by some function.
e.g. (3×inf_x((x×(x+5))+2)) is a valid fixed expression, but (y+4)×3 isn’t fixed.
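As a minimal sketch of these definitions (the class names and the choice of inf as the only binding function are my own assumptions, not part of the proposal), the expression type and the fixed-ness check might look like:

```python
# A toy rendering of the definitions above: an expression is a number, a
# variable, or a function applied to sub-expressions. Binding functions
# (here only "inf") bind a variable, and an expression is fixed when every
# variable in it is bound somewhere above it.
from dataclasses import dataclass

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fn:
    name: str        # e.g. "+", "×", "inf"
    args: tuple      # sub-expressions
    binds: str = ""  # variable bound by this function, if any

def is_fixed(e, bound=frozenset()):
    if isinstance(e, Num):
        return True
    if isinstance(e, Var):
        return e.name in bound
    new_bound = bound | ({e.binds} if e.binds else frozenset())
    return all(is_fixed(a, new_bound) for a in e.args)

# (3 × inf_x((x×(x+5)) + 2)) is fixed; (y+4) × 3 is not.
inner = Fn("+", (Fn("×", (Var("x"), Fn("+", (Var("x"), Num(5))))), Num(2)))
print(is_fixed(Fn("×", (Num(3), Fn("inf", (inner,), binds="x")))))  # True
print(is_fixed(Fn("×", (Fn("+", (Var("y"), Num(4))), Num(3)))))     # False
```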
Semantically, all fixed expressions have a meaning. Syntactically, local manipulations on the parse tree can turn one expression into another, e.g. (a+b)×c going to a×c+b×c for arbitrary expressions a, b, c.
I think that with some set of basic functions and manipulations, this system can be as powerful as PA.
I now have an infinite network with all fixed expressions as nodes, and basic transformations as edges, e.g. the associativity transform links the nodes (3+4)+5 and 3+(4+5).
These graphs form connected components for each number, as well as components that are not evaluatable using the rules. (There is a path from (3+4) to 7; there is not a path from (3+4) to 9.)
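A toy illustration of that connected-component claim (the string representation and the particular rewrite edges are my own stand-ins, hard-coded just to keep the graph small):

```python
# A tiny piece of the network: nodes are fixed expressions (written as strings),
# edges are basic transformations (commuting + and single-step evaluation).
# Transformations are reversible, so edges are symmetric. We then check which
# nodes are reachable from "3+4".
from collections import deque

edges = {
    "3+4": {"4+3", "7"},    # commute, evaluate
    "4+3": {"3+4", "7"},
    "7":   {"3+4", "4+3"},
    "9":   set(),           # 9 sits in a different component: no rule links it to 3+4
}

def reachable(start):
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(reachable("3+4"))         # {'3+4', '4+3', '7'}
print("9" in reachable("3+4"))  # False: there is no path from 3+4 to 9
```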
You now define a spread as an infinite positive sequence that sums to 1. (this is kind of like a probability distribution over numbers.) If you were doing counterfactual ZFC, it would be a function from sets to reals.
Each node is assigned a spread. This spread represents how much the expression is considered to have each value in a counterfactual.
Assign the node (3) a spread that assigns 1.0 to 3 and 0.0 to everything else (even in a logical counterfactual, 3 is definitely 3). Assign every other fixed expression a spread that is the weighted average of the spreads of its neighbours (the nodes it shares an edge with), with smaller expressions weighted more heavily. To take the counterfactual “A is B”, for expressions A and B with the same free variables, merge each node that has A as a subexpression with the version that has B as a subexpression, and solve for the spreads.
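Here is a minimal sketch of the spread-propagation step on a toy graph (the tiny value set, the 1/size weighting, and the fixed-point iteration are my own assumptions; the comment above doesn’t pin these choices down):

```python
# Toy spread propagation: each node carries a distribution over a small set of
# values. Pinned nodes (literals like 3 or 7) keep a point-mass spread; every
# other node is repeatedly replaced by a size-weighted average of its
# neighbours' spreads until nothing changes.
values = [3, 7, 9]                      # stand-in for "all numbers"

sizes  = {"3": 1, "7": 1, "3+4": 3}     # expression sizes (smaller = heavier weight)
edges  = {"3": [], "7": ["3+4"], "3+4": ["7"]}
pinned = {"3": [1.0, 0.0, 0.0],         # the literal 3 is definitely 3
          "7": [0.0, 1.0, 0.0]}         # the literal 7 is definitely 7

spreads = {n: pinned.get(n, [1 / len(values)] * len(values)) for n in sizes}

for _ in range(100):                    # crude fixed-point iteration
    new = {}
    for n in sizes:
        if n in pinned or not edges[n]:
            new[n] = spreads[n]
            continue
        weights = [1 / sizes[m] for m in edges[n]]
        total = sum(weights)
        new[n] = [sum(w * spreads[m][i] for w, m in zip(weights, edges[n])) / total
                  for i in range(len(values))]
    spreads = new

print(spreads["3+4"])   # [0.0, 1.0, 0.0]: (3+4) is 7, even inside a counterfactual
```

Merging nodes for a counterfactual “A is B” would then amount to identifying pairs of nodes in this graph before solving for the spreads.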
I know this is rough; I’m still working on it.
Hi, I also have a reasonable understanding of various relevant math and AI theory. I expect to have plenty of free time after 11 June (finals), so if you want to work with me on something, I’m interested. I’ve got some interesting ideas relating to self-validating proof systems and logical counterfactuals, but nothing complete yet.
Lisp used to be a very popular language for AI programming. Not because it had features that were specific to AI, but because it was general. Lisp was based on more abstract abstractions, making it easy to choose whichever special cases were most useful to you. Lisp is also more mathematical than most programming languages.
A programming language that lets you define your own functions is more powerful than one that just gives you a fixed list of predefined functions. Imagine a world where no programming language lets you define your own functions, and a special-purpose chess language comes with predefined chess functions. In that world, trying to predefine AI-related functions to make an “AI programming language” would be hard, because you wouldn’t know what to write. But noticing that the ability to define your own functions might be useful on many new kinds of software project would itself be a useful insight.
The goal isn’t a language specialized to AI, it’s one that can easily be specialized in that direction. A language closer to “executable mathematics”.
I agree that if the AI is just big neural nets, python (or several other languages) are fine.
This language is designed for writing AIs that search for proofs about their own behavior, or about the behavior of arbitrary pieces of code.
This is something that you “can” do in any programming language, but this one is designed to make it easy.
We don’t know for sure what AIs will look like, but we can guess enough to make a language that might well be useful.
It would be ruinously costly to send over a large colonization fleet, and is much more efficient to send over a small payload which builds what is required in situ, i.e. von Neumann probes.
I would disagree on large colonization fleets being ruinously expensive. The best-case scenario for large colonization fleets is if we have direct mass-to-energy conversion: launch, say, 2 probes from each star system that you spread from, with each probe using half the mass-energy of the star, and converting a quarter of its mass to energy to reach ~0.5c.
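As a rough sanity check on the ~0.5c figure (this assumes an idealized photon rocket, which may not be the exact scheme meant above): the relativistic rocket equation with photon exhaust gives m_final/m_initial = √((1−β)/(1+β)) ≈ 0.58 at β = 0.5, i.e. roughly 40% of the probe’s mass-energy spent on propulsion. That is the same ballpark as the quarter figure above, and any external launch assistance or more exotic scheme would need less.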
You can colonize the universe even if you insist on never going to a new star system without bringing a star with you. (Some optimistic but not clearly false assumptions)
Agenty AIs can be well defined mathematically. We have enough understanding of what an agent is that we can start dreaming up failure modes. Most of what we have for tool ASI is analogies to systems too stupid to fail catastrophically anyway, and pleasant imaginings.
Some possible programs will be tool ASIs, much as some programs will be agent ASIs. The question is, what are the relative difficulties for humans in building each kind of AI, and the relative benefits of each. Conditional on friendly AI, I would consider it more likely to be an agent than a tool, with a lot of probability on “neither”, “both”, and “that question isn’t mathematically well defined”. I wouldn’t be surprised if tool AI and corrigible AI turned out to be the same thing or something like that.
There have been attempts to define tool-like behavior, and they have produced interesting new failure modes. We don’t have the tool AI version of AIXI yet, so it’s hard to say much about tool AI.
If you think that there is a 51% chance that A is the correct morality, and a 49% chance that B is, with no more information available, which is best?
Optimize A only.
Flip a quantum coin: optimize A in one universe, B in another.
Optimize for a mixture of A and B within the same universe. (Act like you had utility U = 0.51A + 0.49B.) (I would do this one.)
If A and B are about local objects (e.g. paperclips, staples), then flipping a quantum coin makes sense if your utility is concave in the number of each object. If your utility is log(#paperclips across the multiverse) + log(#staples across the multiverse), and you are the only potential source of staples or paperclips in the entire quantum multiverse, then the quantum coin and classical mix approaches are equally good. (Assuming that the resource-to-paperclip conversion rate is uniform.)
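A minimal sketch of that comparison (the resource budget and the 1:1 conversion rate are made-up numbers, just to show the concavity point):

```python
# Compare the policies under the concave utility
# U = log(#paperclips across the multiverse) + log(#staples across the multiverse),
# assuming this agent is the multiverse's only source of either object.
import math

R = 1000.0  # hypothetical resource budget, converted 1:1 into objects

def U(paperclips, staples):
    return math.log(paperclips) + math.log(staples)

# Policy: optimize A only. Then there are zero staples anywhere and the
# utility is minus infinity (math.log(0) is undefined, which is the point:
# a concave-in-each-object utility punishes making none of one object).

u_quantum = U(R / 2, R / 2)  # quantum coin: half the measure makes each object
u_mix     = U(R / 2, R / 2)  # classical mix in one world: also R/2 of each

print(u_quantum, u_mix)      # identical, as claimed above
```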
However, the assumption that the multiverse contains no other paperclips is probably false. Such an AI will run simulations to see which is rarer in the multiverse, and then make only that.
The talk about avoiding risk rather than maximizing expected utility, and about how your utility function is nonlinear, suggests this is a hackish attempt to avoid bad outcomes more strongly.
While this isn’t a bad attempt at decision theory, I wouldn’t want to turn on an ASI that was programmed with it. You are getting into the mathematically well specified, novel failure modes. Keep up the good work.
I think that your reasoning here is substantially confused. FDT can handle reasoning about many versions of yourself, some of which might be duplicated, just fine. If your utility function is such that U(α|ϕ⟩+β|ψ⟩) = |α|²U(|ϕ⟩) + |β|²U(|ψ⟩) where ⟨ϕ|ψ⟩ = 0 (and you don’t intrinsically value looking at quantum randomness generators), then you won’t make any decisions based on one.
If you would prefer the universe to be in (1/√2)(|A⟩+|B⟩) rather than take a logical bet between A and B (i.e. you get A if the 3^^^3-th digit of π is even, else B), then flipping a quantum coin makes sense.
I don’t think that randomized behavior is best described as a new decision theory, as opposed to an existing decision theory with odd preferences. I don’t think we actually should randomize.
I also think that quantum randomness already has a lot of power over reality. There is already a very wide spread of worlds, so your attempts to spread it wider won’t help.
This seems largely correct, so long as by “rationality” you mean the social movement: the sort of stuff taught on this website, within the context of human society and psychology. Human rationality would not apply to aliens or arbitrary AIs.
Some people use the word “rationality” to refer to the abstract logical structure of expected utility maximization, Bayesian updating, etc., as exemplified by AIXI. Mathematical rationality does not have anything to do with humans in particular.
Your post is quite good at describing the usefulness of human rationality, although I would say it is more useful in research. Without being good at spotting wrong ideas, you can make a mistake on the first line and produce a lot of nonsense. (See most branches of philosophy, and all theology.)
If you were truly alone in the multiverse, this algorithm would take a bet that had a 51% chance of winning 1 paperclip, and a 49% chance of losing 1,000,000 paperclips.
If independent versions of this bet are taking place in 3^^^3 parallel universes, it will refuse.
For any finite bet and for all sufficiently large N, if the agent is using TDT and is faced with the choice of whether to make this bet in N multiverses, it will behave like an expected utility maximizer.
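A minimal sketch of the accept-alone / refuse-among-many behaviour described above, assuming the algorithm in question is a median-outcome maximizer (the bet numbers come from the comment; the implementation details are mine):

```python
# Each bet wins 1 paperclip with p = 0.51 and loses 1,000,000 with p = 0.49.
# The total payoff over n independent copies is determined by the number of
# wins k ~ Binomial(n, 0.51); we report the payoff at the median k.
from math import comb

def median_total(n_bets, p_win=0.51, win=1, lose=-1_000_000):
    cumulative = 0.0
    for k in range(n_bets + 1):
        cumulative += comb(n_bets, k) * p_win**k * (1 - p_win)**(n_bets - k)
        if cumulative >= 0.5:
            return k * win + (n_bets - k) * lose

print(median_total(1))     # +1: the lone bet's median outcome is a win, so take it
print(median_total(1001))  # about -490,000,000: over many copies the median tracks
                           # the mean, so the median maximizer refuses
```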
If saving nine people from drowning did give one enough credits to murder a tenth, society would look a lot more functional than it currently is. What sort of people would use this mechanism?
1) You are a competent, good person who would have gotten the points anyway. You push a fat man off a bridge to stop a runaway trolley. The law doesn’t see that as an excuse, but lets you off based on your previous good work.
2) You are selfish. You see some action that wouldn’t cause too much harm to others, and would enrich yourself greatly (it’s harmful enough to be illegal). You also see opportunities to do lots of good. You do both instead of neither. Moral arbitrage.
The main downside I can see is people setting up situations to cause a harm, when the authorities aren’t looking, then gaining credit for stopping the harm.
My claim at the start had a typo in it. I am claiming that you can’t make a human seriously superhuman with a good education, much like you can’t get a chimp up to human level with lots of education and “self improvement”. Serious genetic modification is another story, but at that point you’re building an AI out of protein.
It does depend where you draw the line, but for a wide range of performance levels we went from no algorithm at that level to a fast algorithm at that level. You couldn’t get much better results just by throwing more compute at it.
If you literally maximize the expected number of paperclips, using standard decision theory, you will always pay the casino. To refuse the one-shot game, you need to have a nonlinear utility function, or be doing something weird like median outcome maximization.
Choose action a to maximize m such that P(paperclip count > m | a) = 1/2.
A well-defined rule that will behave like expected utility maximization in a sufficiently vast multiverse.
Humans are not currently capable of self improvement in the understanding-your-own-source-code sense. The “self improvement” section in bookstores doesn’t change the hardware or the operating system; it basically adds more data.
Of course talent and compute both make a difference, in the sense that ∂o/∂i>0 and ∂o/∂r>0. I was talking about the subset of worlds where research talent was by far the most important. ∂o/∂r<<∂o/∂i.
In a world where researchers have little idea what they are doing, and are running a new AI every hour hoping to stumble across something that works, the result holds.
In a world where research involves months thinking about maths, then a day writing code, then an hour running it, this result holds.
In a world where everyone knows the right algorithm, but it takes a lot of compute, so AI research consists of building custom hardware and super-computing clusters, this result fails.
Currently, we are somewhere in the middle. I don’t know which of these options future research will look like, although if it’s the first one, friendly AI seems unlikely.
In most of the scenarios where the first smarter-than-human AI is orders of magnitude faster than a human, I would expect a hard takeoff. We went from having no algorithms that could, say, tell a cat from a dog, straight to having algorithms that are superhumanly fast at doing so; there was never a stage where an algorithm worked but took supercomputer hours. So this seems like a plausible assumption.
When an intelligence builds another intelligence in a single direct step, the output intelligence o is a function of the input intelligence i and the resources used r: o(i, r). This function is clearly increasing in both i and r. Set r to be a reasonably large level of resources, e.g. 10^30 flops and 20 years to think about it. A low input intelligence, e.g. a dog, would be unable to make something smarter than itself: o(i_d, r) < i_d. A team of experts can (by the assumption that ASI gets made) make something smarter than themselves: o(i_e, r) > i_e. So there must be a fixed point o(i_f, r) = i_f. The question then becomes: how powerful is a pre-fixed-point AI? Clearly less good at AI research than a team of experts. Since there is no reason to think that AI research is uniquely hard for an AI (and there are some reasons to think it might be easier, or more prioritized), if it can’t beat our AI researchers, it can’t beat our other researchers. It is unlikely to make any major science or technology breakthroughs.
I reckon that ∂o/∂i(i_f, r) is large (>10), because on an absolute scale the difference between an IQ 90 and an IQ 120 human is quite small, but I would expect any attempt at AI made by the latter to be much better. In a world where the limiting factor is researcher talent, not compute, the AI can get the compute it needs for r in hours (seconds? milliseconds?). As the lumpiness of innovation puts the first post-fixed-point AI a non-exponentially-tiny distance ahead (most innovations are at least 0.1% better than the state of the art, in a fast moving field), a handful of cycles of recursive self improvement (<1 day) is enough to get the AI into the seriously overpowered range.
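A toy illustration of the fixed-point argument (the function o and all the numbers here are arbitrary stand-ins, chosen only so that there is one unstable fixed point; nothing about real AI progress is being modelled):

```python
# o(i, r) is increasing in i; a dog-level intelligence maps below itself, an
# expert team maps above itself, so an unstable fixed point i_f sits in
# between. Iterating o from just above i_f runs away; from below, it stalls.
R = 1.0  # resources held fixed at some "reasonably large" level

def o(i, r=R):
    return i ** 1.5 * r   # toy choice with a single unstable fixed point at i = 1

def self_improve(i0, steps=20):
    i = i0
    for _ in range(steps):
        i = o(i)
    return i

print(self_improve(0.5))   # "dog-level": shrinks toward 0, never takes off
print(self_improve(1.0))   # exactly at the fixed point i_f: stays put
print(self_improve(1.05))  # just above i_f: a handful of cycles gives explosive growth
```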
The question of economic doubling times would depend on how fast an economy can grow when tech breakthroughs are limited by human researchers. If we happen to have cracked self replication at about this point, it could be very fast.
Consider a theory to be a collection of formal mathematical statements about how idealized objects behave. For example, Conway’s Game of Life is a theory in the sense of a completely self-contained set of rules.
If you have multiple theories that produce similar results, it’s helpful to have a bridging law. If your theories were Newtonian mechanics and general relativity, a bridging law would say which numbers in relativity matched up with which numbers in Newtonian mechanics. This allows you to translate a relativistic problem into a Newtonian one, solve that, and translate the answer back into the relativistic framework. This produces some errors, but often makes the maths easier.
Quantum many worlds is a simple theory. It could be simulated on a hypercomputer with less than a page of code. There is also a theory where you take the code for quantum many worlds, and add “observers” and “wavefunction collapse” as extra functions within your code. This can be done, but it is many pages of arbitrary hacks. Call this theory B. If you think this is a strawman of collapse interpretations, describe how you could get a hypercomputer outside the universe to simulate wavefunction collapse with a short computer program.
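To make the “less than a page of code” claim concrete, here is a toy sketch of what such a simulation could look like (the two-level system and the particular unitary are my own arbitrary choices; a realistic simulation would be the same loop with a vastly bigger state vector):

```python
# Toy "many worlds" dynamics: a state vector evolving unitarily, with no
# observers and no collapse anywhere in the code. The whole theory is
# "state <- U @ state", repeated.
import numpy as np

t = 1.0
U = np.array([[np.cos(t), -1j * np.sin(t)],
              [-1j * np.sin(t), np.cos(t)]])   # exp(-i t σ_x): one unitary time step

state = np.array([1.0 + 0j, 0.0 + 0j])         # start in the first basis state
for _ in range(10):
    state = U @ state                           # the entire dynamics

print(np.abs(state) ** 2)   # squared amplitudes, i.e. the weights of the two branches
```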
The bridging between Quantum many worlds and human classical intuitions is quite difficult and subtle. Faced with a simulation of quantum many worlds, it would take a lot of understanding of quantum physics to make everyday changes, like creating or moving macroscopic objects.
Theory B however is substantially easier to bridge to our classical intuitions. Theory B looks like a chunk of quantum many worlds, plus a chunk of classical intuition, plus a bridging rule between the two.
Any description of the Copenhagen interpretation of quantum mechanics seems to involve references to the classical results of a measurement, or a classical observer. Most versions would allow a superposition of an atom being in two different places, but not a superposition of two different presidents winning an election.
If you don’t believe atoms can be in superposition, you are ignoring lots of experiments. If you do believe that you can get a superposition of two different people being president, and that you yourself could be in a superposition of doing two different things right now, then you believe many worlds by another name. Otherwise, you need to draw some sort of arbitrary cutoff. It’s almost as if you are bridging between a theory that allows superpositions and an intuition that doesn’t.
“Now I’m not clear exactly how often quantum events lead to a slightly different world”
The answer is very, very often. If you have a piece of glass and shine a photon at it, such that it has an equal chance of bouncing and going through, the two possibilities become separate worlds. Shine a million photons at it and you split into 2^1,000,000 worlds, one for each combination of photons going through and bouncing. Note that in most of the worlds the pattern of bounces looks random, so this is a good source of random numbers. Photons bouncing off glass are just an easy example; almost any physical process splits the universe very fast.
The nub of the argument is that every time we look in our sock drawer, we see all our socks to be black.
Many worlds says that our socks are always black.
The Copenhagen interpretation says that us observing the socks causes them to be black. The rest of the time the socks are pink with green spots.
Both theories make identical predictions. Many worlds is much simpler to fully specify with equations, and has elegant mathematical properties. The Copenhagen interpretation has special case rules that only kick in when observing something. According to this theory, there is a fundamental physical difference between a complex collection of atoms, and an “observer” and somewhere in the development of life, creatures flipped from one to the other.
The Copenhagen interpretation doesn’t make it clear whether a cat is a very complex arrangement of molecules that could in theory be understood as a quantum process not involving the collapse of wave functions, or whether cats are observers and so collapse wave functions.
Hello. I see that while the deadline has passed, the form is still open. Is it still worthwhile to apply?