If I send you an email, it could be because I have some info that I believe would benefit you, and I am honestly trying to be helpful by sending it. I am unlikely to do this if I have to pay you.
Having these norms would create scammers who try to look prestigious. If you only get paid when you reply to a message, lots of low-value replies are going to be sent.
Direct neural IO sits behind a large fitness moat. Once an animal has any kind of actuator that can modify the environment, and any kind of sensor that can detect info about the environment, one animal's actions will sometimes modify what another animal senses, and hence how it behaves. Evolution can then get to work optimizing this. Many benefits can accrue even if no other animal communicates. A crow pattering its feet to bring up worms has some understanding of other animals as things it can manipulate, and the tools to do it. (Humans are best at training other animals as well as at communicating; both require a theory of mind.)
Animals don't touch neurons together except in freak accidents, where any chance of survival is minimal. Until you have functional communication, banging neurons together is useless. Until you have a system that filters it out, exposing neurons to the surrounding saline will spam nonsense signals. And once you have one form of communication, the pressure to develop a second is almost nil.
You are handed a hypercomputer, and allowed to run any code you like on it. You can then take 1 TB of data from your computations and attach it to a normal computer. The hypercomputer is removed. You are then handed a magic human utility function. How do you make an FAI with these resources?
The normal computer is capable of running a highly efficient superintelligence. The hypercomputer can do a brute-force search for efficient algorithms. The idea is to split building an FAI into a capability module and a value module.
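A toy sketch of that brute-force step, assuming candidate programs are enumerated as bit strings in length order and that spec is some checkable criterion (everything here is illustrative; a real search would enumerate machine code, and needs the hypercomputer to get around halting and runtime limits):

    from itertools import product

    def brute_force_search(spec, max_len):
        # Enumerate candidate programs shortest-first and return the
        # first (hence shortest) one that satisfies the specification.
        for length in range(1, max_len + 1):
            for bits in product("01", repeat=length):
                candidate = "".join(bits)
                if spec(candidate):
                    return candidate
        return None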
The problem with tests is that the AI behaving well when weak enough to be tested doesn’t guarantee it will continue to do so.
If you are testing a system, that means that you are not confident that it is safe. If it isn't safe, then your only hope is for humans to stop it. Testing an AI is very dangerous unless you are confident that it can't harm you.
A paperclip maximizer would try to pass your tests until it was powerful enough to trick its way out and take over. Black-box testing of arbitrary AIs gets you very little safety.
Also, some people's intuitions say that a smile-maximizing AI is a good idea. If you have a straightforward argument that appeals to the intuitions of the average Joe Bloggs, but can't be easily formalized, then I would take the difficulty in formalizing it as evidence that the argument is not sound.
If you take a neural network, train it to recognize smiling faces, and attach it to AIXI, you get a machine that will appear to work in the lab, where the best it can do is make the scientists smile into its camera. There will be an intuitive argument about how it wants to make people smile, and people smile when they are happy. The AI will tile the universe with cameras pointed at smiley faces as soon as it escapes the lab.
I should have been clearer: the point isn't that you get correct values, the point is that you get out of the swath of null or meaningless values and into the merely wrong. While the values gained will be wrong, they would be significantly correlated with ours; it's the sort of AI that produces drugged-out brains in vats, or something else that's not what we want, but closer than paperclips. One measure you could use of human effectiveness: given all possible actions ordered by utility, what percentile do the actions we actually took fall in?
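A minimal sketch of that percentile measure, assuming the available actions can be enumerated and scored by a utility function (both assumptions hide most of the real difficulty):

    def effectiveness_percentile(actions, taken, utility):
        # Fraction of available actions scoring at or below the one taken.
        u_taken = utility(taken)
        worse_or_equal = sum(1 for a in actions if utility(a) <= u_taken)
        return worse_or_equal / len(actions)

    # Picking the action worth 8 out of actions worth 0..9 puts you
    # at the 90th percentile.
    print(effectiveness_percentile(range(10), 8, lambda a: a))  # 0.9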
Once we get into this region, it becomes clear that the next task is to fine-tune our model of the bounds on human rationality, or figure out how to get an AI to do it for us.
There are no-free-lunch theorems "proving" that intelligence is impossible: there is no algorithm that can optimize an arbitrary environment. Yet we display intelligence. The problem with the theorem comes from the part where you assume an arbitrary max-entropy environment, rather than inductive priors. If you assume that human values are simple (low Kolmogorov complexity) and that human behavior is quite good at fulfilling those values, then you can deduce nontrivial values for humans.
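A toy version of that deduction, with a few hand-coded candidate utility functions, a 2^-complexity prior standing in for the simplicity assumption, and a Boltzmann model standing in for "quite good at fulfilling those values" (all of it illustrative):

    import math

    def infer_values(candidates, observed, all_actions, beta=2.0):
        # candidates: list of (name, utility_fn, complexity_bits) tuples.
        # Prior weight 2^-bits encodes the simplicity assumption; the
        # likelihood assumes actions are chosen with probability
        # proportional to exp(beta * utility), i.e. approximately
        # rational behavior.
        posterior = {}
        for name, u, bits in candidates:
            weight = 2.0 ** -bits
            for a in observed:
                z = sum(math.exp(beta * u(x)) for x in all_actions)
                weight *= math.exp(beta * u(a)) / z
            posterior[name] = weight
        total = sum(posterior.values())
        return {name: w / total for name, w in posterior.items()}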
As far as I am concerned, hedonism is an approximate description of some of my preferences. Hedonism is a utility function close to, but not equal to mine. I see no reason why a FAI should contain a special term for hedonism. Just maximize preferences, anything else is strictly worse, but not necessarily that bad.
I do agree that there are many futures we would consider valuable. Our utility function is not a single sharp spike.
Suppose you offer to swap its mushroom for pepperoni for a penny, and then to swap back for another penny. This agent will refuse both trades, and you fail to money-pump it.
Suppose you offer the agent a choice between pepperoni or mushroom, when it currently has neither. Which does it choose? If it chooses pepperoni, but refuses to swap mushroom for pepperoni, then its decisions depend on how the situation is framed. How close does it have to get to the mushroom before it "has" mushroom and refuses to swap? Partial preferences only make sense when you don't have to choose between unordered options.
We could consider the agent to have a utility function with a term for time consistency: it wants the pizza in front of it at times 0 and 1 to be the same.
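A minimal sketch of such an agent, with a made-up penalty term for swapping away from whatever it held at time 0 (the names and penalty size are illustrative):

    class StatusQuoAgent:
        # Utility includes a time-consistency term: swapping away from
        # the pizza held at time 0 costs more than the penny on offer.
        def __init__(self, base_utility, switch_penalty=0.05):
            self.base = base_utility   # e.g. {"pepperoni": 1.0, "mushroom": 1.0}
            self.penalty = switch_penalty
            self.holding = None

        def utility(self, option):
            u = self.base[option]
            if self.holding is not None and option != self.holding:
                u -= self.penalty      # cost of abandoning the status quo
            return u

        def choose(self, options):
            best = max(options, key=self.utility)
            if self.holding is None:
                self.holding = best    # the first choice fixes the status quo
            return best

With equal base utilities it picks arbitrarily when holding neither, then refuses penny-sized offers to swap in either direction.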
The AI asks for lots of info on biochemistry, and gives you a long list of chemicals that it claims cure various diseases. Most of these are normal cures. One of these chemicals will mutate the common cold into a lethal super-plague. Soon we start clinical trials of the various drugs, until someone with a cold takes the wrong one and suddenly the world has a super-plague.
The medical-marvel AI is asked about the plague. It gives a plausible cover story for the plague's origins, along with describing an easy-to-make and effective vaccine. As casualties mount, humans rush to put the vaccine into production. The vaccine is designed to have an interesting side effect: a subtle modification of how the brain handles trust and risk. Soon the AI project leaders have been vaccinated. The AI says that it can cure the plague; it has a several-billion-base-pair DNA file that should be put into a bacterium. We allow it to output this file. We inspect it in less detail than we should have, given the effect of the vaccine, then we synthesize the sequence and put it in a bacterium. A few minutes later, the sequence bootstraps molecular nanotech. Over the next day, the nanotech spreads around the world. Soon it is expanding exponentially across the universe, turning all matter into drugged-out brains in vats. This is the most ethical action according to the AI's total-utilitarian ethics.
The fundamental problem is that any time you make a decision based on the outputs of an AI, that gives it a chance to manipulate you. If what you want isn't exactly what it wants, then it has an incentive to manipulate.
(There is also the possibility of a side channel: for example, manipulating its own circuits to produce a cell phone signal, spinning its hard drive in a way that makes a particular sound, etc. Making a computer output just text, rather than text plus traces of sound, microwaves, and heat, which can normally be ignored but might be maliciously manipulated by software, is hard.)
Whether patterns of graphite on paper, or patterns of electricity in silicon, words are real physical things.
From an outside view, you have given a long list of wordy philosophical arguments, all of which involve terms that you haven’t defined. The success rate for arguments like that isn’t great.
We can be reasonably certain that the world is made up of some kind of fundamental parts obeying simple mathematical laws. I don't know which laws, but I expect there to be some set of equations, of which quantum mechanics and relativity are approximations, that predicts every detail of reality.
The minds of humans, including myself, are part of reality. Look at a philosopher talking about consciousness or qualia in great detail. “A Philosopher talking about qualia” is a high level approximate description of a particular collection of quantum fields or super-strings (or whatever reality is made of).
You can choose a set of similar patterns of quantum fields and call them qualia. This makes a quale the same type of thing as a word or an apple. You have some criteria about what patterns of quantum fields do or don't count as an X. This lets you use the word X to describe the world. There are various details about how we actually discriminate based on sensory experience. All of our idea of what an apple is comes from our sensory experience of apples, correlated with sensory experience of people saying the word "apple". This is a feature of the map, not the territory.
I am a mind. A mind is a particular arrangement of quantum fields that selects actions based on some utility function stored within it. Deep Blue would be a simpler example of a mind. The point is that minds are mechanistic ("mind" is an implicitly defined set of patterns of quantum fields, like "apple"), and minds contain goals embedded within their structure. My goals happen to make various references to other minds; in particular, they say to avoid an implicitly defined set of states that my map calls minds in pain.
I would use a definition of qualia in which they were some real, neurological phenomena. I don’t know enough neurology to say which.
The first question is whether you have enough information to locate human behavior. The concept of optimization is fairly straightforward, and it could get a rough estimate of our intelligence by seeing humans trying to solve some puzzle. In other words, the amount of data needed to get an optimizer is small. The amount of data needed to totally describe every detail of human values is large. This means that a random hypothesis based on a small amount of data will be an optimizer with non-human goals.
For example, maybe the human trainers value having real, authentic experiences, but never had cause to express that preference during training. The imitation fills the universe with people in VR pods not knowing that their lives are fake. The imitations do, however, have a preference for (some random alien preference), because the trainers never showed that they didn't prefer that.
Let's suppose you gave it vast amounts of data, and it has a hypothesis space of all possible Turing machines (weighted by size). One fairly simple Turing machine that would predict the data is a quantum simulation of a world similar to our own.
(Less than a kilobyte goes on the laws of QM, and the rest of the data goes towards pointing at a branch of the quantum multiverse with humans similar to us in it. The simulation would also need something pointing at the input cable of the simulated AI.) This gives us a virtual copy of the universe, as a program that predicts the flow of electricity in a particular cable. This code will be optimized to be short, not to be human-comprehensible. I would not expect to be able to easily extract a human mind from the model.
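A sketch of that size weighting, assuming some way to enumerate candidate programs and run each under a step bound (a crude Solomonoff-style toy; nothing here scales beyond trivial sizes):

    def size_prior(program):
        # Each extra bit of program length halves the prior weight.
        return 2.0 ** -len(program)

    def posterior(programs, run, data):
        # Renormalized size prior over the programs whose output
        # matches the data; `run` is assumed to execute a program
        # under some step bound and return its output.
        weights = {p: size_prior(p) for p in programs if run(p) == data}
        total = sum(weights.values())
        return {p: w / total for p, w in weights.items()}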
If you put an upper bound on run time, and it is easily large enough to accurately simulate a human mind, then I would expect a program that attempts to reason abstractly about the surrounding world. In a large pile of data, there will be many seemingly unrelated surface facts that actually have deep connections. A superhuman mind that reasons abstractly about the outside world could use evolutionary psychology to predict human behavior, and could use the laws of physics and a rough idea of humanity's tech level to predict facts about our technology. Intelligent abstract reasoning about our surroundings is likely to win out over simple heuristics by having more predictive power per bit. If you give it enough compute to predict humans, it also has enough compute for this.
So all the problems of mesa-optimization apply and can't be ruled out. Alternately, it could be abstractly reasoning about its input wire, giving us a fast approximation of the virtual-universe program above.
Finally, the virtual humans might realize that they are virtual and panic about it.
For deciding your own actions, only a full description of your own utility function and decision theory will tell you what to do in every situation. And "work out what you would do if you were maximally smart, then do that" is a useless rule in practice. When deciding your own actions, you don't need to use rules at all.
If you are in any kind of organization that has rules, you have to use your own decision theory to work out which decision is best. To do this would involve weighing up the pros and cons of rule breaking, with one of the cons being any punishment the rule enforcers might apply.
Suppose you are in charge: you get to write the rules, and no one else can do anything about rules they don't like.
You are still optimizing for more than just being correct. You want rules that are reasonably enforceable: the decision of whether or not to punish can only depend on things the enforcers know. You also want the rules to be short enough and simple enough for the rule-followers to comprehend.
The best your rules can hope to do when faced with a sufficiently weird situation is not apply any restrictions at all.
You're right. I did some Python. My version took 1.26 microseconds, yours 0.78. My code is just another point on the Pareto boundary.
A more sensible way to code this would be:

    def apply_polynomial(deriv, x):
        # Evaluate deriv[0] + deriv[1]*x + deriv[2]*x**2/2! + ...,
        # i.e. the polynomial whose i-th derivative at 0 is deriv[i].
        total = deriv[0]
        div = 1       # running factorial i!
        xpow = 1      # running power x**i
        for i in range(1, len(deriv)):
            div *= i
            xpow *= x
            total += deriv[i] * xpow / div
        return total
It's about as fast as the second, nearly as readable as the first, and works on any polynomial (except the zero polynomial, represented by the empty list).
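As a quick check, the Taylor coefficients of e^x at 0 are all 1, so feeding in a list of ones approximates e:

    print(apply_polynomial([1, 1, 1, 1, 1, 1], 1.0))  # 2.71666..., vs e = 2.71828...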
The bit about the tradeoffs is correct as far as I can tell.
Although if a single solution were by far the best under every metric, there wouldn't be any tradeoffs.
In most real cases, the solution space is large, and there are many metrics. This means it's unusual for one solution to be the best by all of them. And in those situations, you might not see a choice at all.
I know MWI doesn't imply equal measure; I was taking equal measure as an additional hypothesis within the MWI framework.
We don’t know that because we don’t know anything about qualia.
Consider a sufficiently detailed simulation of a human mind, say fully quantum, except that whenever there are multiple blobs of amplitude sufficiently detached from each other, one is picked pseudorandomly and the rest are deleted. Because it is a sufficiently detailed simulation of a human mind, it will say the same things a human would, for much the same reasons. The generalized anti-zombie principle says that it would have the feeling of making a choice.
There is not always a single optimal solution to a problem even for a perfect rationalist, and humans aren’t perfect rationalists.
My point is that when we display optimization pressure that isn't just a fluke, there is no branch in which we do something totally stupid. There might be branches where we make a different reasonable decision.
I expect quantum ethics to have a utility function that is some measure of what computations are being done, and the quantum amplitude that they are done with.
If every time you made a choice, the universe split into a version where you did each thing, then there is no sense in which you chose a particular thing from the outside. From this perspective, we should expect human actions in a typical “universe” to look totally random. (There are many more ways to thrash randomly than to behave normally) This would make human minds basically quantum random number generators. I see substantial evidence that human actions are not totally random. The hypothesis that when a human makes a choice, the universe splits and every possible choice is made with equal measure is coherent, falsifiable and clearly wrong.
A simulation of a human mind running on reliable digital hardware would always make a single choice, not splitting the universe at all. They would still have the feeling of making a choice.
To the extent that you are optimizing, not outputting random noise, you aren’t creating multiple universes. It all adds up to normality.
While you are working on a theory of quantum ethics, it is better to use your classical ethics than a half baked attempt at quantum ethics. This is much the same as with predictions.
Fully complete quantum theory is more accurate than any classical theory, although you might want to use the classical theory for computational reasons. However, if you miss a minus sign or a particle, you can get nonsensical results, like everything traveling at light speed.
A complete quantum ethics will be better than any classical ethics (though almost identical in everyday circumstances), but one little mistake and you get nonsense.
You're treating the low-bandwidth oracle as an FAI with a bad output cable. You can ask it whether another AI is friendly, if you trust it to give you the right answer. As there is no obvious way to reward the AI for correct friendliness judgments, you risk running an AI that isn't friendly but still meets the reward criteria.
The low bandwidth is to reduce manipulation. Don’t let it control you with a single bit.
You can certainly get anthropic uncertainty in a universe that allows you to be duplicated. In a universe that duplicates, and the duplicates can never interact, we would see the appearance of randomness. Mathematically, randomness is defined in terms of the set of all possibilities.
An ontology that allows universes to be intrinsically random seems well-defined. However, it can be considered a syntactic shortcut for describing universes that are anthropically random.