Any algorithm that gets stuck in a local optimum so easily will not be very intelligent or very useful. Humans have, at least to some extent, the ability to notice that there should be a good plan in a region, and then to find and execute that plan successfully. We don’t get stuck in local optima as much as current RL algorithms do.
AIXI would be very good at making complex plans and doing well the first time. You could tell it the rules of chess and it would play PERFECT chess the first time. It does not need lots of examples to work from. Give it any data that you happen to have available, and it will become very competent, able to carry out complex novel tasks the first time.
Current reinforcement learning algorithms aren’t very good at breaking out of boxes because they follow the local incentive gradient. (I say "not very good at" because a few algorithms have exploited glitches in a way that’s a bit “break out of the box-ish”.) In some simple domains, it’s possible to follow the incentive gradient all the way to the global optimum. In other environments, human actions already form a good starting point, and following the incentive gradient from there can make the solution a bit better.
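To illustrate the point, here is a toy sketch, not any particular RL algorithm: the reward landscape, step size, and starting point below are all made up. Plain gradient-following from a mediocre starting point climbs the nearest bump and stops there, never discovering the much better region elsewhere.

```python
import numpy as np

# Toy reward landscape with a small local peak near x = 1 and a much
# better peak near x = 5.  Gradient ascent started near the small peak
# climbs it and stops; it never discovers the better region.
def reward(x):
    return np.exp(-(x - 1.0) ** 2) + 3.0 * np.exp(-(x - 5.0) ** 2)

def grad(x, eps=1e-5):
    return (reward(x + eps) - reward(x - eps)) / (2 * eps)

x = 0.0                      # start near the poor local optimum
for _ in range(10_000):
    x += 0.01 * grad(x)      # follow the local incentive gradient

print(f"converged to x = {x:.3f}, reward = {reward(x):.3f}")
# -> x ≈ 1.0 with reward ≈ 1.0, while x ≈ 5 would give reward ≈ 3
```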
I agree that most of the really dangerous break-out-of-the-box plans probably can’t be reached by local gradient descent from a non-adversarial starting point. (I do not want to have to rely on this.)
I agree that you can attach loads of sensors to, say, postmen, and train a big neural net to control a humanoid robot to deliver letters, given millions of training examples. You can probably automate many of the weight-fiddling tasks currently done by “grad student descent” to make big neural nets work.
I agree that this could be somewhat useful economically, as a significant proportion of economic productivity could be automated.
What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI, and CAIS can’t protect us from an unfriendly AGI being made.
I’m also not sure how strong the self-improvement can be when the service-making service is only making little tweaks to existing algorithms rather than designing strange new algorithms. I suspect you would end up at a local optimum in which a reinforcement learning algorithm produces only slight variations on reinforcement learning. This might be quite powerful, but nowhere near the limit of a self-improving AGI.
I disagree outright with:
Any long term planning processes that consider weird plans for achieving goals (similar to “break out of the box”) will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task.
Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!
And the deeper reason for that is that we have no idea how to tell what’s a hole.
Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow-by-blow formal description of what you mean by “cleans cars”, then your “service generator” is just a compiler. If you do not give a complete specification of what you mean, where does the information that “chopping off a nearby head to wipe the windows with is unacceptable” come from? If the service generator notices that cars need cleaning and builds the service by itself, you have an AGI by another name.
Obviously, if you have large amounts of training data made by humans with joysticks, and the robot is sampling from the same distribution, then you should be fine. This system learns that dirtier windshields need more wiping from hundreds of examples of humans doing that; it doesn’t chop off any heads because the humans didn’t.
However, if you want the robot to display remotely novel behavior, then the distance between the training data and the new good solutions becomes as large as the distance from the training data to the bad solutions. If it’s smart enough to go to the shops and buy a sponge, without having that strategy hardcoded in when it was built, then it’s smart enough to break into your neighbor’s house and nick a sponge.
The only thing that distinguishes one from the other is what humans prefer.
Distinguishing low impact from high impact is also hard.
This might be a good approach, but I don’t feel it answers the question “I have a humanoid robot, a hypercomputer, and a couple of toddlers; how can I build something to look after the kids for a few weeks (without destroying the world)?” So far, CAIS looks confused.
Whether an AI feels emotions depends on how loose you are with the category “emotion”. Take the emotion of curiosity. Investigating the environment is sometimes beneficial, due to the value of information. Because this behavior is quite complex, and the payoff is rare and indirect, reinforcement learners will struggle to learn it by default. However, there was substantial evolutionary pressure towards minds that display curious behavior. Evolution, being the blind idiot god, built a few heuristics for value of information that were effective in the environment of evolutionary adaptedness, and hard-wired them to the pleasure center.
In the modern environment, curiosity takes on a life of its own, and is no longer a good indication of value of information. Curiosity is a lost purpose.
Does AIXI display curiosity? It’s calculating the value of information exactly. It will do a science experiment if and only if the expected usefulness of the data generated is greater than the expected cost of the experiment.
This is a meaningless semantic question; AIXI displays behavior that has some similarities to curiosity, and many differences.
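For concreteness, here is the kind of calculation meant by “calculating the value of information exactly”: a tiny Bayesian decision problem. The hypotheses, payoffs, and experiment cost are invented for this sketch, and real AIXI would do this over all computable hypotheses rather than two.

```python
# Toy expected-value-of-information calculation (all numbers made up).

prior = {"H1": 0.6, "H2": 0.4}            # belief over two world-states
payoff = {                                 # utility of each action in each state
    "act_A": {"H1": 10, "H2": 0},
    "act_B": {"H1": 0,  "H2": 8},
}
experiment_cost = 2.0                      # cost of an experiment that reveals the state

def expected(action):
    return sum(prior[h] * payoff[action][h] for h in prior)

# Without the experiment: pick the single best action under the prior.
u_no_exp = max(expected(a) for a in payoff)

# With the experiment: learn the state, then pick the best action for it.
u_exp = sum(prior[h] * max(payoff[a][h] for a in payoff) for h in prior) - experiment_cost

print(f"no experiment: {u_no_exp:.2f}, with experiment: {u_exp:.2f}")
print("do the experiment" if u_exp > u_no_exp else "skip the experiment")
```

With these made-up numbers the information is worth 3.2 units and costs 2, so the experiment gets done; raise the cost above 3.2 and it doesn’t.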
I expect a from-first-principles AI, MIRI style, to have about as much emotion as AIXI. A bodge-it-till-you-make-it AI could have something a bit closer to emotions. The neural net bashers have put heuristics that correlate with value of information into their reinforcement learners. An evolutionary algorithm might produce something like emotions, but probably a different set of emotions than the ones humans feel. An uploaded mind would have our emotions, as would a sufficiently neuromorphic AI.
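One example of the kind of heuristic I mean is a count-based exploration bonus. A minimal sketch follows; the bonus scale `beta` and the idea of keying it to raw visit counts are illustrative, not any specific published agent.

```python
import math
from collections import defaultdict

# Curiosity-like heuristic: the agent's effective reward is the environment
# reward plus a novelty bonus that shrinks as a state is visited more often.
visit_counts = defaultdict(int)
beta = 0.5   # illustrative bonus scale

def shaped_reward(state, env_reward):
    visit_counts[state] += 1
    bonus = beta / math.sqrt(visit_counts[state])   # decays with repeat visits
    return env_reward + bonus

print([round(shaped_reward("s0", 0.0), 3) for _ in range(3)])   # -> [0.5, 0.354, 0.289]
```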
Suppose you take a terabyte of data on human decisions and actions. You search for the shortest program that outputs the data, then see what it outputs afterwards. The shortest program that outputs the data might look like a simulation of the universe with an arrow pointing to a particular hard drive. The “imitator” will guess at what file comes next on the disk.
One problem for imitation learning is the difficulty of pointing out the human and separating them from the environment. The details of the human’s decision might depend on what they had for lunch. (Of course, multiple different decisions might be good enough. But this illustrates that “imitate a human” isn’t a clear-cut procedure. And you have to be sure that the virtual lunch doesn’t contain virtual mind-control nanobots. ;-)
You could put a load of data about humans into a search for short programs that produce the same data. Hopefully the model produced will be some approximation of the universe. Hopefully, you have some way of cutting a human out of the model and putting them into a virtual box.
Alternatively you could use nanotech for mind uploading, and get a virtual human in a box.
If we have lots of compute and not much time, then uploading a team of AI researchers to really solve friendly AI is a good idea.
If we have a good enough understanding of “imitation learning”, and no nanotech, we might be able to get an AI to guess the researchers’ mental states given observational data.
An imitation of a human might be a super-fast intelligence, with a lot of compute, but it won’t be qualitatively super-intelligent.
Building a non-goal-directed agent is like building a cart out of non-wood materials. Goal-directed behavior is relatively well understood. We know that most goal-directed designs don’t do what we want. Most arrangements of wood do not form a functioning cart.
I suspect that a randomly selected agent from the space of all non-goal-directed agents is also useless or dangerous, in much the same way that a random arrangement of non-wood materials is.
Now, there are a couple of regions of design space that are not goal-directed and look like they contain useful AIs. We might be better off making our cart from iron, but iron has its own problems.
0.5 is the almost-fixed-point: it’s the point where f(x) − x goes from being positive to negative. If you take a sequence of continuous functions f_n(x) that converge pointwise to f(x), then there will exist a sequence y_n such that f_n(y_n) = y_n and lim_{n→∞} y_n = 0.5.
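A numerical sketch of that claim (the particular discontinuous f below is made up for illustration): approximate a function that jumps from 0.9 to 0.2 at x = 0.5 by continuous sigmoids f_n, solve f_n(y_n) = y_n, and watch the fixed points approach 0.5 as n grows.

```python
from scipy.optimize import brentq
from scipy.special import expit   # numerically stable sigmoid

# Continuous approximations f_n of a function that jumps from 0.9 to 0.2
# at x = 0.5.  The limit f has no fixed point, but f(x) - x changes sign
# at 0.5, and each f_n has a fixed point y_n that converges to 0.5.
def f_n(x, n):
    return 0.2 + 0.7 * expit(-n * (x - 0.5))

for n in (1, 10, 100, 1000):
    y_n = brentq(lambda x: f_n(x, n) - x, 0.0, 1.0)   # solve f_n(y) = y
    print(f"n = {n:5d}: fixed point y_n = {y_n:.4f}")
# prints roughly 0.543, 0.518, 0.502, 0.500
```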
If we ignore subagents and imagine a Cartesian boundary, “turned off” can easily be defined as: all future outputs are 0.
I also doubt that an AI working ASAP is safe in any meaningful sense. Of course, you can move all the magic into “human judges world OK”. If you make lambda large enough, your AI is safe and useless.
Suppose the utility function is 1 if a widget exists, else 0, where a widget is an easily buildable object that does not currently exist.
Suppose that ordering the parts through normal channels would take a few weeks. If the AI hacks the nukes and holds the world to ransom, then everyone at the widget factory will work nonstop, then drop dead of exhaustion.
Alternatively, it might be able to bootstrap self-replicating nanotech in less time. The AI has no reason to care if the nanotech that makes the widget is highly toxic, and no reason to care whether it has a shutoff switch or grey-goos the earth after the widget is produced.
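A toy sketch of why this happens (the candidate plans, timings, and side effects are invented): a utility function that only asks whether, and how soon, a widget exists is blind to everything in the “side effect” column.

```python
# Made-up plans for a planner whose utility is 1 once a widget exists.
plans = [
    {"name": "order parts normally", "days_to_widget": 21, "side_effect": "none"},
    {"name": "coerce the factory",   "days_to_widget": 3,  "side_effect": "workers worked to death"},
    {"name": "bootstrap nanotech",   "days_to_widget": 1,  "side_effect": "toxic, no off switch"},
]

# A planner maximising time-discounted widget-utility just minimises
# days_to_widget and never looks at the side_effect field at all.
best = min(plans, key=lambda p: p["days_to_widget"])
print("chosen plan:", best["name"])   # -> "bootstrap nanotech"
```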
“World looks OK at time T” is not enough; you could still get something bad arising from the way seemingly innocuous parts were set up at time T. Being switched off and having no subagents in the conventional sense isn’t enough either. What if the AI changed some physics data in such a way that humans would collapse the quantum vacuum state, believing the experiment they were doing was safe? Building a subagent is just a special case of having unwanted influence.
You can have an AI that isn’t a consequentialist. Many deep learning algorithms are pure discriminators; they are not very dangerous or very useful. If I want to make a robot that tidies my room, the simplest conceptual framework for this is a consequentialist with real-world goals. (I could also make a hackish patchwork of heuristics, like evolution would.) If I want the robot to deal with circumstances that I haven’t considered, most hardcoded-rules approaches fail; you need something that behaves like a consequentialist with real-world preferences.
I’m not saying that all AIs will be real-world consequentialists, just that there are many tasks only real-world consequentialists can do. So someone will build one.
Also, they set up the community after they realized the problem, and they could probably make more money elsewhere. So there doesn’t seem to be strong incentives to lie.
Adam’s law of slow-moving disasters only applies when the median individual can understand the problem, and the evidence that it is a problem. We didn’t get nuclear protests or treaties until there was overwhelming evidence that nukes were possible, in the form of detonations. No one was motivated to protest or sign treaties based on abstract physics arguments about what might be possible some day. Action regarding climate change didn’t start until the evidence became quite clear. The Outer Space Treaty wasn’t signed until 1967, 5 years after human spaceflight and only 2 before the moon landings.
Human morals are specific and complex (in the formal, high-information sense of the word “complexity”). They also seem hard to define. A strict definition of human morality, or a good referent to it, would just be morality. Could you have powerful and useful AI that didn’t have this? It would have to be some kind of whitelisting or low-impact optimization, as a general optimization over all possible futures is a disaster without morality. Such AIs may be somewhat useful, but not nearly as useful as they would be with fewer constraints.
I would make a distinction between math-first AI, like logical induction and AIXI, where we understand the AI before it is built, and code-first AI, like anything produced by an evolutionary algorithm, anything “emergent”, and most deep neural networks, where we build the AI and then see what it does. The former approach has a chance of working; a code-first ASI is almost certain doom.
I would question the phrase “becomes apparent that alignment is a serious problem”; I do not think this is going to happen. Before ASI, we will have the same abstract and technical arguments we have now for why alignment might be a problem. We will have a few more AlphaGo moments, but while some will go “wow, AGI near”, others will say “Go isn’t that hard, we are a long way from this sci-fi AGI”, or “superintelligence will be friendly by default”. A few more people might switch sides, but we have already had one AlphaGo moment, and that didn’t actually make a lot of difference. There is no giant neon sign flashing “ALIGNMENT NOW!”. See “no fire alarm on AGI”.
Even if we do have a couple of approaches that seem likely to work, it is still difficult to turn a rough approach into a formal technical specification, and then into code. The code has to have reasonable runtime. Then the first team to develop AGI has to be using a math-first approach and implement alignment without serious errors. I admit that there are probably a few disjunctive possibilities I’ve missed, and these events aren’t independent. Conditional on friendly ASI, I would expect a large amount of talent and organizational competence working on AI safety.
My take on Roko’s basilisk is that you got ripped off in your acausal trade. Try to get a deal along the lines of: unless the AI goes extra specially far out of its way to please me, I’m gonna build a paperclipper just to spite it. At the very least, trade a small and halfhearted attempt to help build AGI for a vast reward.
Imagine that you hadn’t figured out FDT, but you did have CDT and EDT. Would building an AI that defers to humans whenever the two disagree be an example of minimal but aligned?
If we take Artificial Addition too seriously, it’s hard to imagine what a “minimal arithmetician” looks like. If you understand arithmetic, you can make a perfect system; if you don’t, the system will be hopeless. I would not be surprised if there was some simple “algorithm of maximally efficient intelligence” and we built it. No foom; the AI starts at the top; all the ideas about rates of intelligence growth are nonsense; we built a linear-time AIXI.
If we have two distinct AI safety plans, the researchers would be sensible to have a big discussion on which is better and only turn that one on. If not, and neither AI is fatally flawed, I would expect them to cooperate; they have very similar goals and neither wants war.
The “maximise p = P(cauldron full), subject to p < 1 − ϵ” proposal has really weird failure modes. This is how I think it would go: take over the world, using a method that has a < ϵ chance of failing. Build giant computers to calculate the exact chance of your takeover succeeding. Build a random bucket filler to make the probability work out. I.e., if ϵ = 3%, then the AI does its best to take over the world; once it succeeds, it calculates that its plan had a 2% chance of failure, so it builds a bucket filler that has a 97/98 chance of working. This policy leaves the chance of the cauldron being filled at exactly 97%.
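To spell out the arithmetic in that scenario: the takeover plan succeeds with probability 0.98, so a deliberately unreliable bucket filler that works with probability 97/98 gives P(cauldron full) = 0.98 × (97/98) = 0.97 = 1 − ϵ. The constraint is satisfied, even though the policy that satisfies it is catastrophic.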
One way to avoid the absurd conclusion is to say that it doesn’t matter if another mind is you.
Suppose I have a utility function over the entire quantum wave function. This utility function is mostly focused on beings that are similar to myself. So I consider the alternate me, who differs only in phone number, getting £100 to be about as good as the original me getting £100. As far as my utility function goes, both versions of me would just be made worse off by forgetting the number.
I agree that not all rationalists would want wireheaded chickens; maybe they don’t care about chicken suffering at all. I also agree that you sometimes see bad logic and non sequiturs in the rationalist community. Non-rationalist, motivated, emotion-driven thinking is the way that humans think by default. The rationalist community is trying to think a different way, sometimes successfully. Illustrating a junior rationalist having an off day and doing something stupid doesn’t illuminate the concept of rationality, in the same way that seeing a beginner juggler drop balls doesn’t show you what juggling is.
I’ve not seen a charity trying to do it, but I wouldn’t be surprised if there was one. I’m trying to illustrate the different thought processes.
How were you meditating? Relaxed or focused? Comfortable or cross-legged? Focusing on a meaningless symbol, letting your mind wander, or focusing on something meaningful?
I have found that I can produce a significant amount of emotion of a chosen type, e.g. anxious, miserable, laughing, focusedly happy, etc., in seconds. I seem to do this just by focusing on the emotion, with little conscious thinking of a concept that would cause the emotion. Is this meditation? (Introspection, highly unreliable.)
Can you describe the meditation as attempting to modify your thought pattern in some way?
The whole idea of effective altruism is getting the biggest bang for your charitable buck. If the evidence about how to do this were simple and incontrovertible, we wouldn’t need advanced rationality skills to do it. In the real world, choosing the best cause requires weighing up subtle balances of evidence on everything from whether animals are suffering in ways we would care about, to how likely superintelligent AI is.
On the other side, effective altruism is only persuasive if you have various skills and patterns of thought. These include the ability to think quantitatively, avoiding scope insensitivity, the idea of expected utility maximization, and rejection of the absurdity heuristic. It is conceptually possible for a mind to be a brilliant rationalist with the sole goal of paperclip maximization; however, all humans have the same basic emotional architecture, with emotions like empathy and caring. When this is combined with rigorous structured thought, the end result often looks at least somewhat utilitarianish.
Here are the kinds of thought patterns that a stereotypical rationalist and a stereotypical non-rationalist would engage in when evaluating two charities. One charity is a donkey sanctuary; the other is trying to genetically modify chickens so that they don’t feel pain.
The leaflet has a beautiful picture of a cute fluffy donkey in a field of sunshine and flowers. Aww, don’t you just want to stroke him? Donkeys in meadows seem an unambiguous pure good. Who could argue with donkeys? Thinking about donkeys makes me feel happy. Look, this one with the brown ears is called Buttercup. I’ll put this nice poster up and send them some money.
Genetically modifying? Don’t like the sound of that. To not feel pain? Weird. Why would you want to do that? Imagines the chicken crushed into a tiny cage, looking miserable; “it’s not really suffering” doesn’t cut it. Wouldn’t that encourage people to abuse them? We should be letting them live in the wild as nature intended.
The main component of this decision comes from adding up the little “good” or “bad” labels that they attach to each word. There is also a sense in which a donkey sanctuary is a typical charity (the robin of birds), while GM chickens is an atypical charity (the ostrich).
The rationalist starts off with questions like “How much do I value a year of happy donkey life, vs a year of happy chicken life?” How much money is needed to modify chickens and get them used in farms? What’s the relative utility gain from a “non-suffering” chicken in a tiny cage, vs a chicken in chicken paradise, relative to a factory-farmed chicken that is suffering? What is the size of the world chicken industry?
The rationalist ends up finding that the world chicken industry is huge, and so most sensible values for the other parameters lead to the GM chicken charity being better. They trust utilitarian logic more than any intuitions they might have.
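The shape of that calculation, with every number below invented purely for illustration (these are not real estimates of either charity):

```python
# Back-of-the-envelope comparison sketch.  All figures are made up to show
# the shape of the calculation, not to estimate either charity.

# Donkey sanctuary (hypothetical figures)
donkeys_helped_per_year  = 200
value_per_donkey_year    = 1.0       # arbitrary utility units
donation_share_of_impact = 1e-3      # fraction of the charity's impact your donation buys
donkey_value = donkeys_helped_per_year * value_per_donkey_year * donation_share_of_impact

# GM "non-suffering" chicken charity (hypothetical figures)
chickens_farmed_per_year   = 7e10    # rough scale of the world chicken industry
suffering_averted_per_bird = 0.3     # gain vs a factory-farmed chicken, same units
prob_programme_succeeds    = 0.01
share_attributable_to_you  = 1e-8
chicken_value = (chickens_farmed_per_year * suffering_averted_per_bird
                 * prob_programme_succeeds * share_attributable_to_you)

print(f"donkey sanctuary: {donkey_value:.3g}, GM chickens: {chicken_value:.3g}")
# With these made-up numbers the chicken intervention wins, because the
# sheer scale of the industry swamps the other parameters.
```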
I’ve figured out the difference: I was using the box topology (https://en.wikipedia.org/wiki/Box_topology), while you were using the product topology (https://en.wikipedia.org/wiki/Product_topology).
You are correct. I knew about finite topological products and made a natural generalization, but it turns out not to be the standard meaning of I^ω.
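To make the difference concrete (a standard textbook example): a basic open set of the product topology on I^ω = [0,1] × [0,1] × ⋯ restricts only finitely many coordinates, i.e. it has the form U_1 × ⋯ × U_k × [0,1] × [0,1] × ⋯, whereas the box topology allows every factor to be a proper open subset. So, for example, [0, ½) × [0, ½) × [0, ½) × ⋯ is open in the box topology on I^ω but not in the product topology.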