I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated “the”, we don’t notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.
I propose that automatic pattern matching to the closest thing that makes sense is happening at a higher level too. When humans skim semi-contradictory text, they produce a more consistent world model that doesn’t quite match up with what is said.

Language feeds into a deeper, sensible world-model module within the human brain, and GPT-2 doesn’t really have a coherent world model.
Within a narrow field where data is plentiful, learning rationality is much less powerful than learning from piles of data. Imagine three people: A, B and C. A knows neither chess nor rationality. B has studied game theory, Bayes’ theorem, the principles of decision theory and all-round rationality; they have never played chess before, and have just been told the rules. C has been playing chess for years.
I would expect C to win easily. It’s much easier to learn from experience, and to inherit your teacher’s experience, than it is to deduce good chess strategies from first principles. The only time I would expect B to win is if they were playing Nim, or some other game with a simple winning strategy, and C had an intuition for this strategy but sometimes made mistakes. I would expect B to beat A, however.
Rationality is learning to squeeze every last drop of usefulness out of your data, and doing this is less effective than just grabbing more data when data is plentiful. Financial markets are another data-rich domain. Many hedge fund workers already know game theory, and they also have a detailed knowledge of financial minutiae. Wannabe rationalists: if you want to be a banker, go ahead. But don’t expect to beat the market from rationality alone, any more than you can deduce good chess moves from first principles and beat a grandmaster without ever having played before.
Rationality comes into its own because it applies a small boost to many domains of skill, not a big boost to any one. It also works much better in the absence of piles of data.
The everyday world is roughly inexploitable, and very data-rich. The regions where you would expect rationality to do well are the ones without a pile of data so large that even a scientist can’t ignore it: the Fermi paradox, AGI design, interpretations of quantum mechanics, philosophical zombies, etc.
There is also a cultural element, in that the people who know the most rationality have more important things to do than use it to gain a slight advantage in business. Many of the people here would rather be discussing AI alignment, or the Fermi paradox, or black holes, or anything interesting really, than being an investment banker. All the people that get to be skilled rationalists value knowledge for its own sake and are pursuing that.
You would also need many data points to get good evidence, unless rationality were just magic. I am faced with a tricky choice and choose option 1. It’s quite good. Would I have chosen option 2 if I hadn’t learned rationality? How good was option 2 anyway? It’s hard to spot when rationality has helped someone.
In conclusion, the lack of “Rationality gave me magic powers” clickbait is not significant evidence that we are doing something wrong. A large randomized controlled trial finding that rationality didn’t work would be worrying.
If you add ad hoc patches until you can’t imagine any way for it to go wrong, you get a system that is too complex to imagine. This is the “I can’t figure out how this fails” scenario. It is going to fail for reasons that you didn’t imagine.
If you understand why it can’t fail, for deep fundamental reasons, then it’s likely to work.
This is the difference between the security mindset and ordinary paranoia. The difference between adding complications until you can’t figure out how to break the code, and proving that breaking the code is impossible (assuming the adversary can’t get your one-time pad, it’s only used once, your randomness is really random, your adversary doesn’t have anthropic superpowers, etc.).
I would put the chance of serious failure in the first scenario at >99%, and in the second (assuming you’re doing it well and the assumptions you rely on are things you have good reason to believe) at <1%.
I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.
If you believe that there is a good chance of immortal utopia, and a large chance of paperclips, in the next 5 years, then the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.
The law is blind to safety.
The law is bureaucratic and ossified. It is probably not employing much top talent, as it’s hard to tell top talent from the rest if you aren’t as good yourself (and it doesn’t have the budget or glamor to attract them). Telling whether an organization is on track to not destroy the world is HARD. The safety protocols are being invented on the fly by each team; the system is very complex and technical, and only half built. The teams that would destroy the world aren’t idiots; they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, and no understood laws.
Likely as not (not really, there’s too much conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don’t understand the problem. “All AI work has to have an emergency stop button that turns the power off.” (The idea of an AI circumventing this was not considered by the person who wrote the list.)
All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone’s way. Telling cops to “smash all GPUs” would have an effect on AI progress. The fund-vs-smash axis is about the only lever they have. They can’t even tell an AI project from a maths convention from a normal programming project if the project leaders are incentivized to obfuscate.
After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had a tasky ASI that was not trivial to leverage into a decisive strategic advantage. (An AI that can put a strawberry on a plate without destroying the world, but that’s about the limit of its safe operation.)
Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.
I don’t know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger) and, with further research, could grant supreme power. The discovery must be limited to a small group of people (among a large number of nonexperts, one will do something stupid). I don’t think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can’t tell how powerful it would be with an unbounded utility function.
1) Climate-change-caused extinction is not on the table. Low-tech humans can survive everywhere from the jungle to the Arctic. Some humans will survive.
2) I suspect that climate change won’t cause massive social collapse. It might well knock 10% off world GDP, but it won’t stop us having an advanced high-tech society. At the moment, it’s not causing damage on that scale, and I suspect that in a few decades we will have biotech, renewables or other techs that will make everything fine. I suspect that the damage caused by climate change won’t increase by more than 2 or 3 times in the next 50 years.
3) If you are skilled enough to be a scientist, inventing a solar panel that’s 0.5% more efficient does a lot more good than showing up to protests. Protests need many people to work; inventors can change the world by themselves. Policy advisors and academics can suggest action in small groups. Even working a normal job and sending your earnings to a well-chosen charity is likely to be more effective.
4) Quite a few people are already working on global warming. It seems unlikely that a problem needs 10,000,001 people working on it to be solved, but that with only 10,000,000 people working on it, they won’t manage. Most of the really easy work on global warming is already being done. This was not the case with AI risk as of 10 years ago, for example. (It’s got a few more people working on it since then, but still nothing like climate change.)
You’re treating the low-bandwidth oracle as an FAI with a bad output cable. You can ask it if another AI is friendly if you trust it to give you the right answer. As there is no obvious way to reward the AI for correct friendliness judgements, you risk running an AI that isn’t friendly but still meets the reward criteria.
The low bandwidth is to reduce manipulation. Don’t let it control you with a single bit.
Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it’s blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from betting behaviour back to implicit probability. The prediction is that people will perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
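For reference, the ideal Bayesian update here is a constant shift of log 2 in the odds, whatever the pack composition: a blue-blue card shows a blue face with probability 1, a red/blue card with probability 1/2, so the likelihood ratio is always 2. A minimal sketch of that benchmark (Python, with illustrative priors):

```python
import math

def log_odds_shift(p_bb):
    """Ideal Bayesian update on seeing a blue face, when a fraction p_bb of
    the pack is blue-on-both-sides (BB) and the rest are red/blue (RB).
    P(blue face | BB) = 1 and P(blue face | RB) = 1/2, so the likelihood
    ratio is 2 regardless of the prior."""
    prior_odds = p_bb / (1 - p_bb)
    posterior_odds = prior_odds * 2  # multiply by the likelihood ratio
    return math.log(posterior_odds) - math.log(prior_odds)

for p in (0.5, 0.9):
    print(f"P(BB) = {p}: log-odds shift = {log_odds_shift(p):.4f}")  # always ln 2 ≈ 0.6931
```

The experiment then measures whether human bets, converted back to implicit probabilities, deviate from this constant log-odds shift as the pack composition changes.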
Actually, there are some subtle issues here that I didn’t spot before. If you take a small (not exponentially vast) region of space-time, and condition on that region containing at least 100 observer-seconds, it is far more likely that this comes from a single Boltzmann astronaut than from 100 separate Boltzmann brains.
However, if you select a region of space-time with hyper-volume ~exp(10^69) (in units where a 1 kg fluctuation has probability ~exp(−10^69) per unit volume), then it is likely to contain a Boltzmann brain of mass 1 kg, which we suppose can think for 1 second. The chance of the same volume containing a 2 kg Boltzmann brain is ~exp(−10^69). So unless that extra 1 kg of life support can let the Boltzmann brain exist for exp(10^69) seconds, most observer-moments should not have life support.
Imagine a lottery that’s played by 1,000,000,000 people. There is 1 prize of £1,000,000 and 1,000 prizes of £100,000 each. If I say that my friends have won at least £1,000,000 between them (and that I have a number of friends <<100,000), then it’s likely that one friend hit the jackpot. But if I pick a random £1 handed out by this lottery and look at where it goes, it probably goes to a runner-up.
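The arithmetic behind the example can be checked directly; with these prize numbers, a random £1 of prize money almost always traces back to a runner-up:

```python
# Check of the lottery arithmetic: where does a randomly chosen £1 of prize money go?
jackpot_total = 1 * 1_000_000        # one £1,000,000 jackpot
runner_up_total = 1_000 * 100_000    # a thousand £100,000 runner-up prizes
total = jackpot_total + runner_up_total

print(f"P(£1 went to the jackpot winner) = {jackpot_total / total:.4f}")   # ≈ 0.0099
print(f"P(£1 went to a runner-up)        = {runner_up_total / total:.4f}")  # ≈ 0.9901
```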
This is directly analogous, except with smaller numbers, and pounds instead of subjective experience. The one big win is the Boltzmann astronaut; the smaller prizes are Boltzmann brains.
The reason for this behaviour is that doubling the size of the space-time considered makes a Boltzmann astronaut twice as likely, but makes a swarm of 100 Boltzmann brains 2^100 times as likely. For any small region of space-time, “nothing happens” is the most likely option. A Boltzmann brain is far less likely, and a Boltzmann astronaut far less likely than that. The ratio of thinking times is small enough to be ignored.
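This scaling can be sketched with a Poisson model: if rare fluctuations occur independently at some tiny rate per unit of hyper-volume, the probability of k of them scales as V^k, so doubling V multiplies the 100-brain outcome by about 2^100. A sketch with a made-up rate (working in log probabilities to avoid underflow):

```python
import math

def log_p_k(rate, volume, k):
    """Log Poisson probability of exactly k independent fluctuations in a region,
    assuming they occur at `rate` per unit hyper-volume (illustrative numbers)."""
    lam = rate * volume
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

rate = 1e-6  # hypothetical fluctuations per unit hyper-volume
# How much more likely does each outcome get when the volume doubles?
boost_1   = log_p_k(rate, 2.0, 1)   - log_p_k(rate, 1.0, 1)    # ≈ log 2
boost_100 = log_p_k(rate, 2.0, 100) - log_p_k(rate, 1.0, 100)  # ≈ 100 * log 2
print(boost_1 / math.log(2), boost_100 / math.log(2))
```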
If we think that we are Boltzmann brains, then we should expect to freeze over in the next instant. If we thought that we were Boltzmann brains, and that there were at least a billion observer-moments nearby, then we should expect to be a Boltzmann astronaut.
Whether an AI feels emotions depends on how loose you are with the category “emotion”. Take the emotion of curiosity. Investigating the environment is sometimes beneficial, due to the value of information. Because this behavior is quite complex, and the payoff is rare and indirect, reinforcement learners will struggle to learn it by default. However, there was a substantial evolutionary pressure towards minds that would display curious behavior. Evolution, being the blind idiot god, built a few heuristics for value of information that were effective in the environment of evolutionary adaptation, and hard wired these to the pleasure center.
In the modern environment, curiosity takes on a life of its own, and is no longer a good indication of value of information. Curiosity is a lost purpose.
Does AIXI display curiosity? It calculates the value of information exactly. It will do a science experiment if and only if the expected usefulness of the data generated is greater than the expected cost of the experiment.
This is a meaningless semantic question: AIXI displays behavior that has some similarities to curiosity, and many differences.
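AIXI’s rule — experiment if and only if the expected usefulness of the data exceeds the cost — is just a value-of-information calculation. A toy sketch (the states, actions and utilities are invented for illustration; actual AIXI computes this over all computable hypotheses):

```python
# Toy value-of-information calculation: two equally likely world-states, two actions.
p_a = 0.5
utility = {  # utility[action][state]
    "act_1": {"a": 10, "b": 0},
    "act_2": {"a": 0,  "b": 8},
}

def expected_utility(action, p_a):
    return utility[action]["a"] * p_a + utility[action]["b"] * (1 - p_a)

# Best achievable while still uncertain about the state:
eu_prior = max(expected_utility(act, p_a) for act in utility)

# A perfect experiment reveals the state; then we pick the best action in each:
eu_informed = (p_a * max(u["a"] for u in utility.values())
               + (1 - p_a) * max(u["b"] for u in utility.values()))

value_of_information = eu_informed - eu_prior
print(value_of_information)  # 4.0 -> run the experiment iff it costs less than this
```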
I expect a from-first-principles AI, MIRI style, to have about as much emotion as AIXI. A bodge-it-till-you-make-it AI could have something a bit closer to emotions. The neural net bashers have put heuristics that correlate with value of information into their reinforcement learners. An evolutionary algorithm might produce something like emotions, but probably a different set of emotions than the ones humans feel. An uploaded mind would have our emotions, as would a sufficiently neuromorphic AI.
Butterfly effects are essentially unpredictable, given your partial knowledge of the world. Sure, you doing homework could cause a tornado in Texas, but it’s equally likely to prevent one. To actually predict which, you would have to calculate the movement of every gust of air around the world. Otherwise you’re shuffling an already well-shuffled pack of cards. Bear in mind that you have no reason to distinguish the particular action of “doing homework” from a vast set of other actions. If you really did know what actions would stop the Texas tornado, they might well look like random thrashing.
What you can calculate is the reliable effects of doing your homework. So, given bounded rationality, you are probably best to base your decisions on those. The fact that this only involves homework might suggest that you have an internal conflict between a part of yourself that thinks about careers, and a short term procrastinator.
Most people who aren’t particularly ethical still do more good than harm. (If everyone looks out for themselves, everyone has someone to look out for them. The law stops most of the bad mutual defections in prisoners’ dilemmas.) Evil geniuses trying to trick you into doing harm are much rarer than moderately competent nice people trying to get your help to do good.
Answer to question 1.
Let $x_{i+1}=f(x_i)$ for arbitrary $x_0$. Call $c=d(x_0,x_1)$. Then $d(x_k,x_{k+1})\le cq^k$ by induction, so for $i<j$: $d(x_i,x_j)\le\sum_{k=i}^{j-1}d(x_k,x_{k+1})\le\sum_{k=i}^{j-1}cq^k\le\frac{cq^i}{1-q}$ (power series simplification).
Therefore $\forall\delta>0\ \exists n\in\mathbb{N}\ \forall i>n,\,j>i:\ d(x_i,x_j)\le\frac{cq^n}{1-q}<\delta$, i.e. $(x_i)$ is a Cauchy sequence. However, $(X,d)$ is given to be complete, which by definition means any Cauchy sequence is convergent. So $x_n\to y$, and $d(x_i,y)\le\sup_{j\ge i}d(x_i,x_j)\le\frac{cq^i}{1-q}$, so $x_n$ converges exponentially quickly.
Answer to question 2.
From part 1, as $f$ is continuous, $y=\lim_{n\to\infty}x_{n+1}=\lim_{n\to\infty}f(x_n)=f(\lim_{n\to\infty}x_n)=f(y)$, so $y$ is a fixed point. Suppose $x$ and $y$ are both fixed points of a contraction map $f$. Then $f(x)=x$ and $f(y)=y$, so $d(x,y)=d(f(x),f(y))\le q\,d(x,y)$, therefore $d(x,y)=0$, so $x=y$. Thus $f$ has a unique fixed point.
Answer to question 3.
$(\mathbb{R},\,d(x,y)=|x-y|)$ is a metric space: the real line with the usual distance. Let $f(x)=\sqrt{1+x^2}$. Then $d(f(x),f(y))<d(x,y)$ for all $x\ne y$, because $f$ is differentiable and $f'(x)=\frac{x}{\sqrt{1+x^2}}$ has the property $\forall x:|f'(x)|<1$. However, no fixed point exists, as $\forall x:f(x)>x$. This works because there is no uniform $q<1$: the sequence $(x_i)$ generated by repeated applications of $f$ tends to infinity, despite successive terms becoming ever closer.
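A quick numerical illustration of why the uniform constant $q<1$ matters (Python; both example functions are just for illustration):

```python
import math

def iterate(f, x0, n):
    """Return the orbit [x0, f(x0), f(f(x0)), ...] of length n + 1."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

# A genuine contraction, q = 1/2: converges exponentially to its fixed point 2.
xs = iterate(lambda x: x / 2 + 1, 0.0, 60)
print(xs[-1])  # 2.0

# f(x) = sqrt(1 + x^2): every step shrinks (|f'(x)| < 1 pointwise), but there
# is no uniform q < 1, so Banach does not apply; from x0 = 0 we get x_n = sqrt(n),
# which escapes to infinity while successive steps become ever smaller.
ys = iterate(lambda x: math.sqrt(1 + x * x), 0.0, 10_000)
print(ys[-1], ys[-1] - ys[-2])  # ≈ 100.0, with a final step of only ≈ 0.005
```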
How is this stating anything more than “the whole is safe if all the parts are safe”? Like saying a mathematical proof is valid if all the steps are valid, this is almost useless if you don’t know which individual steps are valid or safe.
There are several problems with this argument. Firstly, the AI has code describing its goal; it would seem much easier to copy this code across than to turn our moral values into code. Secondly, the AI doesn’t have to be confident in getting it right. A paperclipping AI has two options: it can work at a factory and make a few paperclips, or it can self-improve, with a chance that the resultant AI won’t maximize paperclips. However, the number of paperclips it could produce if it successfully self-improves is astronomically vast. If its goal function is linear in paperclips, it will self-improve if it thinks it has any chance of getting it right. If it fails at preserving its values as it self-improves, then the result looks like a staple maximizer.
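The expected-value comparison is stark even with a minuscule success probability; a sketch with made-up numbers:

```python
# Toy expected-paperclip comparison (all numbers made up for illustration).
clips_from_factory = 1e6   # certain payoff from just making paperclips
clips_if_success = 1e50    # astronomical payoff from successful self-improvement
p_success = 1e-9           # even a minuscule chance of preserving its values...

ev_factory = clips_from_factory
ev_self_improve = p_success * clips_if_success

print(ev_self_improve > ev_factory)  # True: linear utility favours self-improvement
```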
Humans (at least the sort thinking about AI) know that we all have roughly similar values, so if you think you might have solved alignment but aren’t sure, it makes sense to ask others for help, or to wait for someone else to finish solving it.
However, a paperclipping AI would know that no other AIs had its goal function. If it doesn’t build a paperclipping superintelligence, no one else is going to. It will therefore try to do so even if it is unlikely to succeed.
I disagree outright with
Any long term planning processes that consider weird plans for achieving goals (similar to “break out of the box”) will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task.
Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!
And the deeper reason for that is that we have no idea how to tell what’s a hole.
Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow-by-blow formal description of what you mean by “cleans cars”, then your “service generator” is just a compiler. If you do not give a complete specification of what you mean, where does the information that “chopping off a nearby head to wipe windows with is unacceptable” come from? If the service generator notices that cars need cleaning and builds the service by itself, you have an AGI by another name.
Obviously, if you have large amounts of training data made by humans with joysticks, and the robot is sampling from the same distribution, then you should be fine. This system learns that dirtier windshields need more wiping from hundreds of examples of humans doing that; it doesn’t chop off any heads because the humans didn’t.
However, if you want the robot to display remotely novel behavior, then the distance between the training data and the new good solutions becomes as large as the distance from the training data to bad solutions. If it’s smart enough to go to the shops and buy a sponge without having that strategy hardcoded in when it was built, then it’s smart enough to break into your neighbor’s house and nick a sponge.
The only thing that distinguishes one from the other is what humans prefer.
Distinguishing low impact from high impact is also hard.
This might be a good approach, but I don’t feel it answers the question “I have a humanoid robot, a hypercomputer and a couple of toddlers; how can I build something to look after the kids for a few weeks (without destroying the world)?” So far, CAIS looks confused.
Human morals are specific and complex (in the formal, high-information sense of the word complexity). They also seem hard to define. A strict definition of human morality, or a good referent to it, would count as morality. Could you have powerful and useful AI that didn’t have this? It would have to be some kind of whitelisting or low-impact optimization, as a general optimization over all possible futures is a disaster without morality. These AIs may be somewhat useful, but not nearly as useful as they would be with fewer constraints.
I would make a distinction between math-first AI, like logical induction and AIXI, where we understand the AI before it is built, and code-first AI, like anything produced by an evolutionary algorithm, anything “emergent”, and most deep neural networks, where we build the AI and then see what it does. The former approach has a chance of working; a code-first ASI is almost certain doom.
I would question the phrase “becomes apparent that alignment is a serious problem”; I do not think this is going to happen. Before ASI, we will have the same abstract and technical arguments we have now for why alignment might be a problem. We will have a few more AlphaGo moments, but while some go “wow, AGI near”, others will say “Go isn’t that hard, we are a long way from this sci-fi AGI”, or “superintelligence will be friendly by default”. A few more people might switch sides, but we have already had one AlphaGo moment, and that didn’t actually make a lot of difference. There is no giant neon sign flashing “ALIGNMENT NOW!”. See “no fire alarm on AGI”.
Even if we do have a couple of approaches that seem likely to work, it is still difficult to turn a rough approach into a formal technical specification, and then into programming code. The code has to have reasonable runtime. Then the first team to develop AGI has to be using a math-first approach and implement alignment without serious errors. I admit that there are probably a few disjunctive possibilities I’ve missed, and these events aren’t independent. Conditional on friendly ASI, I would expect a large amount of talent and organizational competence working on AI safety.
“Safe” is a value-loaded word. To make a “safe” car, all you need is the rough approximation to human values that says humans value not being injured. To make a “safe” superintelligence, you need a more detailed idea of human values. This is where the philosophy comes in: to specify exactly what we want.