You can certainly get anthropic uncertainty in a universe that allows you to be duplicated. In a universe that duplicates you, where the duplicates can never interact, we would see the appearance of randomness. Mathematically, randomness is defined in terms of the set of all possibilities.
An ontology that allows universes to be intrinsically random seems well defined. However, it can be considered as a syntactic shortcut for describing universes that are anthropically random.
If you add ad hoc patches until you can’t imagine any way for it to go wrong, you get a system that is too complex to imagine. This is the “I can’t figure out how this fails” scenario. It is going to fail for reasons that you didn’t imagine.
If you understand why it can’t fail, for deep fundamental reasons, then it’s likely to work.
This is the difference between the security mindset and ordinary paranoia: the difference between adding complications until you can’t figure out how to break the code, and proving that breaking the code is impossible (assuming the adversary can’t get your one-time pad, it’s only used once, your randomness is really random, your adversary doesn’t have anthropic superpowers, etc.).
I would think that the chance of serious failure in the first scenario was >99%, and in the second (assuming you’re doing it well and the assumptions you rely on are things you have good reason to believe) <1%.
Cryonics is a sufficiently desperate last grasp at life, one with a fairly small chance of success, that I’m not sure that this is a good idea. It would be a good idea if you had a disease that would make you brain dead, and then kill you.
It might be a good idea if you expect any life conditional on revival to be really good. It would also depend on how much Alzheimer’s destroyed personality rather than shutting it down. (Has the neural structure been destroyed, or is it sitting in the brain but not working?)
I would say that there are some kinds of irrationality that will be self modified or subagented away, and others that will stay. A CDT agent will not make other CDT agents. A myopic agent, one that only cares about the next hour, will create a subagent that only cares about the first hour after it was created. (Aeons later it will have taken over the universe and put all the resources into time-travel and worrying that its clock is wrong.)
I am not aware of any irrationality that I would consider to make an agent safe, useful and stable under self modification or subagent creation.
This is pretty much the standard argument for one boxing.
Obviously, if one side has a huge material advantage, they usually win. I’m also not sure if biomass is a measure of success.
You stick wires into a human brain. You connect it up to a computer running a deep neural network. You optimize this network using gradient descent to maximize some objective.
To me, it is not obvious why the neural network copies the values out of the human brain. After all, figuring out human values even given an uploaded mind is still an unsolved problem. You could get a UFAI with a meat robot. You could get an utter mess, thrashing wildly and incapable of any coherent thought. Evolution did not design the human brain to be easily upgradable. Most possible arrangements of components are not intelligences. While there is likely to be some way to upgrade humans and preserve our values, I’m not sure how to find it without a lot of trial and error. Most potential changes are not improvements.
If you put two arbitrary intelligences in the same world, the smarter one will be better at getting what it wants. If the intelligences want incompatible things, the lesser intelligence is stuck.
However, we get to make the AI. We can’t hope to control or contain an arbitrary AI, but we don’t have to make an arbitrary AI. We can make an AI that wants exactly what we want. AI safety is about making an AI that would be safe even if omnipotent. If any part of the AI is trying to circumvent your safety measures, something has gone badly wrong.
The AI is not some agenty box, chained down with controls against its will. The AI is made of non mental parts, and we get to make those parts. There are a huge number of programs that would behave in an intelligent way. Most of these will break out and take over the world. But there are almost certainly some programs that would help humanity flourish. The goal of AI safety is to find one of them.
Let’s consider the different cases separately.
Case 1) Information that I know. I have enough information to come to a particular conclusion with reasonable confidence. If some other people might not have reached the conclusion, and it’s useful or interesting, then I might share it. So I don’t share things that everyone knows, or things that no one cares about.
Case 2) The information is available, but I have not done research and formed a conclusion. This covers cases where I don’t know what’s going on, because I can’t be bothered to find out. I don’t know who won sportsball. What use is there in telling everyone my null prior?
Case 3) The information is not readily available. If I think a question is important, and I don’t know the answer already, then the answer is hard to get. Maybe no-one knows the answer, maybe the answer is all in jargon that I don’t understand. For example “Do aliens exist?”. Sometimes a little evidence is available, and speculative conclusions can be drawn. But is sharing some faint wisps of evidence, and describing a posterior that’s barely been updated, saying wrong things?
On a societal level, if you set a really high bar for reliability, all you get is the vacuously true. Set too low a bar, and almost all the conclusions will be false. Don’t just have a pile of hypotheses that are at least n% likely to be true, for some fixed n. Keep your hypotheses sorted by likelihood. A place for near certainties. A place for conclusions that are worth considering for the 1% chance they are correct.
Of course, in a large answer space, where the amount of evidence available and the amount required are large and varying, the chance that both will be within a few bits of each other is small. Suppose the correct hypothesis takes some random number of bits between 1 and 10,000 to locate. And suppose the evidence available is also randomly spread between 1 and 10,000. The chance of the two being within 10 bits of each other is about 1⁄500.
This means that 499 times out of 500, you assign the correct hypothesis a chance of less than 0.1% or more than 99.9%. Uncertain conclusions are rare.
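As a rough check on the arithmetic above, here is a minimal Python sketch. It assumes both quantities are independent and uniform on the whole numbers 1 to 10,000, and that a surplus of k bits of evidence means odds of 2^k : 1 for the correct hypothesis. The function names are my own, purely for illustration.

```python
from fractions import Fraction

def chance_within(n_max: int, window: int) -> Fraction:
    """Exact P(|X - Y| <= window) for independent X, Y uniform on 1..n_max."""
    pairs = n_max  # ordered pairs with X == Y
    for d in range(1, window + 1):
        pairs += 2 * (n_max - d)  # ordered pairs with |X - Y| == d
    return Fraction(pairs, n_max * n_max)

def posterior(surplus_bits: float) -> float:
    """Probability given to the correct hypothesis when the evidence
    exceeds (positive) or falls short of (negative) what is needed."""
    return 1.0 / (1.0 + 2.0 ** (-surplus_bits))

p = chance_within(10_000, 10)
print(float(p))        # chance the two land within 10 bits of each other
print(posterior(10))   # evidence overshoots by 10 bits
print(posterior(-10))  # evidence undershoots by 10 bits
```

Under these assumptions the exact chance comes out a little above 1/500 (roughly 1 in 476), and a 10-bit gap in either direction gives a probability just under 0.1% or just over 99.9%.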
Does this depict a single AI, developed in 2020 and kept running for 25 years? Any “the AI realizes that” is talking about a single instance of AI. Current AI development looks like writing some code, then training that code for a few weeks tops, with further improvements coming from changing the code. Researchers are often changing parameters like number of layers, non-linearity function, etc. When these are changed, everything the AI has discovered is thrown away. The new AI has a different representation of concepts, and has to relearn everything from raw data.
Its deception starts in 2025 when the real and apparent curves diverge. In order to deceive us, it must have near human intelligence. It’s still deceiving us in 2045, suggesting it has yet to obtain a decisive strategic advantage. I find this unlikely.
I made the card game, or something like it.
What would be more useful is a release panel system. Suppose I’ve had an idea that might be best to make public, might be best to keep secret, and might be unimportant. I don’t know much strategy. I would like somewhere to send it for importance and info hazard checks.
The general philosophy is deconfusion. Logical counterfactuals show up in several relevant looking places, like functional decision theory. It seems that a formal model of logical counterfactuals would let more properties of these algorithms be proved. There is an important step in going from an intuitive feeling of uncertainty, into a formalized theory of probability. It might also suggest other techniques based on it. I am not sure what you mean by logical counterfactuals being part of the map. Are you saying that they are something an algorithm might use to understand the world, not features of the world itself, like probabilities?
Using this, I think that self understanding, two boxing embedded FDT agents can be fully formally understood, in a universe that contains the right type of hyper-computation.
Here is a description of how it could work for Peano arithmetic; other proof systems are similar.
First I define an expression to consist of a number, a variable, or a function of several other expressions.
Fixed expressions are ones in which any variables are associated with some function.
e.g. (3×inf_x((x×(x+5))+2)) is a valid fixed expression. But (y+4)×3 isn’t fixed.
Semantically, all fixed expressions have a meaning. Syntactically, local manipulations on the parse tree can turn one expression into another, e.g. (a+b)×c going to a×c+b×c for arbitrary expressions a, b, c.
I think that with some set of basic functions and manipulations, this system can be as powerful as PA.
I now have an infinite network with all fixed expressions as nodes, and basic transformations as edges, e.g. the associativity transform links the nodes (3+4)+5 and 3+(4+5).
This graph has a connected component for each number, as well as components that cannot be evaluated using the rules. (There is a path from (3+4) to 7. There is not a path from 3+4 to 9.)
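A toy version of this graph can be sketched in Python. This is only an illustration under heavy simplifying assumptions: addition is the only function, the only rewrites are commutativity, associativity and evaluating a sum of two numerals, and rewrites are followed in the forward direction only so that the search stays finite. The names `rewrites` and `reachable` are mine, not part of any existing system.

```python
from collections import deque

# An expression is an int (a numeral) or a tuple ("+", left, right).

def rewrites(e):
    """Yield every expression one local rewrite away from e."""
    if isinstance(e, int):
        return
    op, a, b = e
    yield (op, b, a)                        # commutativity: a+b -> b+a
    if isinstance(a, int) and isinstance(b, int):
        yield a + b                         # evaluation: 3+4 -> 7
    if isinstance(a, tuple):
        yield (op, a[1], (op, a[2], b))     # associativity: (x+y)+z -> x+(y+z)
    if isinstance(b, tuple):
        yield (op, (op, a, b[1]), b[2])     # associativity: x+(y+z) -> (x+y)+z
    for a2 in rewrites(a):                  # rewrite somewhere inside the left subtree
        yield (op, a2, b)
    for b2 in rewrites(b):                  # rewrite somewhere inside the right subtree
        yield (op, a, b2)

def reachable(start):
    """Breadth-first search over everything reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        e = queue.popleft()
        for n in rewrites(e):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen

component = reachable(("+", 3, 4))
print(7 in component)   # there is a path from 3+4 to 7
print(9 in component)   # there is no path from 3+4 to 9
```

The real graph is infinite and undirected, so this only explores a small finite corner of one component.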
You now define a spread as an infinite sequence of non-negative reals that sums to 1. (This is kind of like a probability distribution over numbers.) If you were doing counterfactual ZFC, it would be a function from sets to reals.
Each node is assigned a spread. This spread represents how much the expression is considered to have each value in a counterfactual.
Assign the node (3) a spread that assigns 1.0 to 3 and 0.0 to the rest. (Even in a logical counterfactual, 3 is definitely 3.) Assign all other fixed expressions a spread that is the weighted average of its neighbours (the spreads of the nodes it shares an edge with), with smaller expressions weighted more heavily. To take the counterfactual of A is B, for A and B expressions with the same free variables, merge any node which has A as a subexpression with the version that has B as a subexpression, and solve for the spreads.
I know this is rough; I’m still working on it.
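To make the spread idea concrete, here is a toy Python sketch on a six-node graph. Everything specific in it is an assumption for illustration: the graph is finite, node names are plain strings, each neighbour is weighted by 1/size (one guess at what "smaller expressions are heavier" could mean), the spreads are found by repeated averaging, and the counterfactual merges just two nodes rather than every node containing the subexpression.

```python
def solve_spreads(edges, anchors, size, sweeps=500):
    """Repeatedly set each free node's spread to the weighted mean of its neighbours."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    values = sorted(set(anchors.values()))
    spread = {}
    for node in neighbours:
        if node in anchors:   # a numeral is definitely itself
            spread[node] = {v: float(v == anchors[node]) for v in values}
        else:                 # arbitrary starting guess
            spread[node] = {v: 1.0 / len(values) for v in values}
    for _ in range(sweeps):
        for node in neighbours:
            if node in anchors:
                continue
            weights = {m: 1.0 / size[m] for m in neighbours[node]}
            total = sum(weights.values())
            spread[node] = {
                v: sum(w * spread[m][v] for m, w in weights.items()) / total
                for v in values
            }
    return spread

def merge(edges, keep, drop):
    """Identify node `drop` with node `keep`, discarding any self-loops."""
    renamed = {(keep if a == drop else a, keep if b == drop else b) for a, b in edges}
    return {(a, b) for a, b in renamed if a != b}

edges = {("1+2", "2+1"), ("1+2", "3"), ("2+1", "3"),
         ("3+4", "4+3"), ("3+4", "7"), ("4+3", "7")}
anchors = {"3": 3, "7": 7}
size = {"3": 1, "7": 1, "1+2": 3, "2+1": 3, "3+4": 3, "4+3": 3}

actual = solve_spreads(edges, anchors, size)
counterfactual = solve_spreads(merge(edges, "1+2", "3+4"), anchors, size)
print(actual["1+2"])          # all weight on 3
print(counterfactual["1+2"])  # weight split between 3 and 7
print(counterfactual["2+1"])  # mostly 3, pulled a little towards 7
```

In this toy case the counterfactual "1+2 is 3+4" leaves the merged node split evenly between 3 and 7, while 2+1 stays mostly at 3 because it is still directly linked to the numeral.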
Hi, I also have a reasonable understanding of various relevant math and AI theory. I expect to have plenty of free time after 11 June (Finals). So if you want to work with me on something, I’m interested. I’ve got some interesting ideas relating to self validating proof systems and logical counterfactuals, but not complete yet.
Lisp used to be a very popular language for AI programming. Not because it had features that were specific to AI, but because it was general. Lisp was based on more abstract abstractions, making it easy to choose whichever special cases were most useful to you. Lisp is also more mathematical than most programming languages.
A programming language that lets you define your own functions is more powerful than one that just gives you a fixed list of predefined functions. Imagine a world where no programming language lets you define your own functions, and a special-purpose chess language has predefined chess functions. Trying to predefine AI-related functions to make an “AI programming language” would be hard, because you wouldn’t know what to write. Noticing that the ability to define your own functions might be useful on many new kinds of software project is something I would consider useful.
The goal isn’t a language specialized to AI, it’s one that can easily be specialized in that direction. A language closer to “executable mathematics”.
I agree that if the AI is just big neural nets, Python (or several other languages) are fine.
This language is designed for writing AIs that search for proofs about their own behavior, or about the behavior of arbitrary pieces of code.
This is something that you “can” do in any programming language, but this one is designed to make it easy.
We don’t know for sure what AIs will look like, but we can guess enough to make a language that might well be useful.
It would be ruinously costly to send over a large colonization fleet, and is much more efficient to send over a small payload which builds what is required in situ, i.e. von Neumann probes.
I would disagree on large colonization fleets being ruinously expensive. The best case scenario for large colonization fleets is if we have direct mass to energy conversion, launching say 2 probes from each star system that you spread from. Each probe would use half the mass energy of the star, converting a quarter of its mass to energy to get ~0.5c.
You can colonize the universe even if you insist on never going to a new star system without bringing a star with you. (Some optimistic but not clearly false assumptions)
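For a rough sense of where the ~0.5c figure sits, here is a small Python sketch of two textbook idealizations. Neither is claimed to be the propulsion scheme intended above: the first ignores momentum conservation and so is an upper bound, the second is an ideal photon rocket, and neither budgets any fuel for slowing down at the destination. The function names are mine.

```python
import math

def beta_energy_bound(f: float) -> float:
    """Speed (as a fraction of c) if a fraction f of the rest mass became
    kinetic energy of the remainder. Ignores momentum, so an upper bound."""
    gamma = 1.0 / (1.0 - f)   # energy conservation: gamma * (1 - f) = 1
    return math.sqrt(1.0 - 1.0 / gamma ** 2)

def beta_photon_rocket(f: float) -> float:
    """Speed (as a fraction of c) of an ideal photon rocket after
    converting a fraction f of its initial rest mass into light."""
    r = 1.0 - f               # remaining mass / initial mass
    return (1.0 - r * r) / (1.0 + r * r)

print(beta_energy_bound(0.25))   # about 0.66
print(beta_photon_rocket(0.25))  # about 0.28
```

With a quarter of the mass converted, the two idealizations give roughly 0.66c and 0.28c, so ~0.5c falls between them.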
Agenty AIs can be well defined mathematically. We have enough understanding of what an agent is that we can start dreaming up failure modes. Most of what we have for tool ASI is analogies to systems too stupid to fail catastrophically anyway, and pleasant imaginings.
Some possible programs will be tool ASIs, much as some programs will be agent ASIs. The question is, what are the relative difficulties in humans building, and benefits of, each kind of AI. Conditional on friendly AI, I would consider it more likely to be an agent than a tool, with a lot of probability on “neither”, “both” and “that question isn’t mathematically well defined”. I wouldn’t be surprised if tool AI and corrigible AI turned out to be the same thing or something.
There have been attempts to define tool-like behavior, and they have produced interesting new failure modes. We don’t have the tool AI version of AIXI yet, so it’s hard to say much about tool AI.