My claim at the start had a typo in it. I am claiming that you can’t make a human seriously superhuman with a good education. Much like you can’t get a chimp up to human level with lots of education and “self improvement”. Serious genetic modification is another story, but at that point, your building an AI out of protien.
It does depend where you draw the line, but the for a wide range of performance levels, we went from no algorithm at that level, to a fast algorithm at that level. You couldn’t get much better results just by throwing more compute at it.
If you literally maximize expected number of paperclips, using standard decision theory, you will always pay the casino. To refuse the one shot game, you need to have a nonlinear utility function, or be doing something weird like median outcome maximization.
Choose action A to maximixe m such that P(paperclip count>m|a)=1/2
A well defined rule, that will behave like maximization in a sufficiently vast multiverse.
Humans are not currently capable of self improvement in the understanding your our own source code sense. The “self improvement” section in bookstores doesn’t change the hardware or the operating system, it basically adds more data.
Of course talent and compute both make a difference, in the sense that ∂o/∂i>0 and ∂o/∂r>0. I was talking about the subset of worlds where research talent was by far the most important. ∂o/∂r<<∂o/∂i.
In a world where researchers have little idea what they are doing, and are running a new AI every hour hoping to stumble across something that works, the result holds.
In a world where research involves months thinking about maths, then a day writing code, then an hour running it, this result holds.
In a world where everyone knows the right algorithm, but it takes a lot of compute, so AI research consists of building custom hardware and super-computing clusters, this result fails.
Currently, we are somewhere in the middle. I don’t know which of these options future research will look like, although if its the first one, friendly AI seems unlikely.
In most of the scenarios where the first smarter than human AI, is orders of magnitude faster than a human, I would expect a hard takeoff. As we went from having no algorithms that could say (tell a cat from a dog) straight to having algorithms superhumanly fast at doing so, there was no algorithm that worked, but took supercomputer hours, this seems like a plausible assumption.
When an intelligence builds another intelligence, in a single direct step, the output intelligence o is a function of the input intelligence i, and the resources used r. o(i,r). This function is clearly increasing in both i and r. Set r to be a reasonably large level of resources, eg 1030flops, 20 years to think about it. A low input intelligence, eg a dog, would be unable to make something smarter than itself. o(id,r)<id. A team of experts (by assumption that ASI is made), can make something smarter than themselves. o(ie,r)>ie. So there must be a fixed point. o(if,r)=f. The questions then become, how powerful is a pre fixed point AI. Clearly less good at AI research than a team of experts. As there is no reason to think that AI research is uniquely hard to AI, and there are some reasons to think it might be easier, or more prioritized, if it can’t beat our AI researchers, it can’t beat our other researchers. It is unlikely to make any major science or technology breakthroughs.
I recon that ∂o/∂i(if,r) is large (>10) because on an absolute scale, the difference between an IQ 90 and an IQ120 human is quite small, but I would expect any attempt at AI made by the latter to be much better. In a world where the limiting factor is researcher talent, not compute, the AI can get the compute it needs for r in hours (seconds? milliseconds??) As the lumpiness of innovation puts the first post fixed point AI a non-exponentially tiny distance ahead, (most innovations are at least 0.1% that state of the art better in a fast moving field) then a handful of cycles or recursive self improvement (<1 day) is enough to get the AI into the seriously overpowered range.
The question of economic doubling times would depend on how fast an economy can grow when tech breakthroughs are limited by human researchers. If we happen to have cracked self replication at about this point, it could be very fast.
Consider a theory to be a collection of formal mathematical statements about how idealized objects behave. For example, Conways game of life is a theory in the sense of a completely self contained set of rules.
If you have multiple theories that produce similar results, its helpful to have a bridging law. If your theories were Newtonian mechanics, and general relativity, a bridging law would say which numbers in relativity matched up with which numbers in Newtonian mechanics. This allows you to translate a relativistic problem into a Newtonian one, solve that, and translate the answer back into the relativistic framework. This produces some errors, but often makes the maths easier.
Quantum many worlds is a simple theory. It could be simulated on a hypercomputer with less than a page of code. There is also a theory where you take the code for quantum many worlds, and add “observers” and “wavefunction collapse” with extra functions within your code. This can be done, but it is many pages of arbitrary hacks. Call this theory B. If you think this is a strawman of many worlds, describe how you could get a hypercomputer outside the universe to simulate many worlds with a short computer program.
The bridging between Quantum many worlds and human classical intuitions is quite difficult and subtle. Faced with a simulation of quantum many worlds, it would take a lot of understanding of quantum physics to make everyday changes, like creating or moving macroscopic objects.
Theory B however is substantially easier to bridge to our classical intuitions. Theory B looks like a chunk of quantum many worlds, plus a chunk of classical intuition, plus a bridging rule between the two.
The any description of the Copenhagen interpretation of quantum mechanics seems to involve references to the classical results of a measurement, or a classical observer. Most versions would allow a superposition of an atom being in two different places, but not a superposition of two different presidents winning an election.
If you don’t believe atoms can be in superposition, you are ignoring lots of experiments, if you do believe that you can get a superposition of two different people being president, that you yourself could be in a superposition of doing two different things right now, then you believe many worlds by another name. Otherwise, you need to draw some sort of arbitrary cutoff. Its almost like you are bridging between a theory that allows superpositions, and an intuition that doesn’t.
“Now I’m not clear exactly how often quantum events lead to a slightly different world”
The answer is Very Very often. If you have a piece of glass and shine a photon at it, such that it has an equal chance of bouncing and going through, the two possibilities become separate worlds. Shine a million photons at it and you split into 21000000worlds, one for each combination of photons going through and bouncing. Note that in most of the worlds, the pattern of bounces looks random, so this is a good source of random numbers. Photons bouncing of glass are just an easy example, almost any physical process splits the universe very fast.
The nub of the argument is that every time we look in our sock drawer, we see all our socks to be black.
Many worlds says that our socks are always black.
The Copenhagen interpretation says that us observing the socks causes them to be black. The rest of the time the socks are pink with green spots.
Both theories make identical predictions. Many worlds is much simpler to fully specify with equations, and has elegant mathematical properties. The Copenhagen interpretation has special case rules that only kick in when observing something. According to this theory, there is a fundamental physical difference between a complex collection of atoms, and an “observer” and somewhere in the development of life, creatures flipped from one to the other.
The Copenhagen interpretation doesn’t make it clear if a cat is a very complex arrangement of molecules, that could in theory be understood as a quantum process that doesn’t involve the collapse of wave functions, or if cats are observers and so collapses wave functions.
Hello. I see that while the deadline has passed, the form is still open. Is it still worthwhile to apply?
This supposedly “natural” reference class is full of weird edge cases, in the sense that I can’t write an algorithm that finds “everybody who asks the question X”. Firstly “everybody” is not well defined in a world that contains everything from trained monkeys to artificial intelligence’s. And “who asks the question X” is under-defined as there is no hard boundary between a different way of phrasing the same question and slightly different questions. Does someone considering the argument in chinese fall into your reference class? Even more edge cases appear with mind uploading, different mental architectures, ect.
If you get a different prediction from taking the reference class of “people” (for some formal definition of “people”) and then updating on the fact that you are wearing blue socks, than you get from the reference class “people wearing blue socks”, then something has gone wrong in your reasoning.
The doomsday argument works by failing to update on anything but a few carefully chosen facts.
I would say that the concept of probability works fine in anthropic scenarios, or at least there is a well defined number that is equal to probability in non anthropic situations. This number is assigned to “worlds as a whole”. Sleeping beauty assigns 1⁄2 to heads, and 1⁄2 to tails, and can’t meaningfully split the tails case depending on the day. Sleeping beauty is a functional decision theory agent. For each action A, they consider the logical counterfactual that the algorithm they are implementing returned A, then calculate the worlds utility in that counterfactual. They then return whichever action maximizes utility.
In this framework, “which version am I?” is a meaningless question, you are the algorithm. The fact that the algorithm is implemented in a physical substrate give you means to affect the world. Under this model, whether or not your running on multiple redundant substrates is irrelivant. You reason about the universe without making any anthropic updates. As you have no way of affecting a universe that doesn’t contain you, or someone reasoning about what you would do, you might as well behave as if you aren’t in one. You can make the efficiency saving of not bothering to simulate such a world.
You might, or might not have an easier time effecting a world that contains multiple copies of you.
In other words, the agent assigned zero probability to an event, and then it happened.
As far as I understand it, you are proposing that the most realistic failure mode consists of many AI systems, all put into a position of power by humans, and optimizing for their own proxies. Call these Trusted Trial and Error AI’s (TTE)
The distinguishing features of TTE’s are that they were Trusted. A human put them in a position of power. Humans have refined, understood and checked the code enough that they are prepared to put this algorithm in a self driving car, or a stock management system. They are not lab prototypes. They are also Trial and error learners, not one shot learners.
Some More descriptions of what capability range I am considering.
Suppose hypothetically that we had TTE reinforcement learners, a little better than todays state of the art, and nothing beyond that. The AI’s are advanced enough that they can take a mountain of medical data and train themselves to be skilled doctors by trial and error. However they are not advanced enough to figure out how humans work from, say a sequenced genome and nothing more.
Give them control of all the traffic lights in a city, and they will learn how to minimize traffic jams. They will arrange for people to drive in circles rather than stay still, so that they do not count as part of a traffic jam. However they will not do anything outside their preset policy space, like hacking into the traffic light control system of other cities, or destroying the city with nukes.
If such technology is easily available, people will start to use it for things. Some people put it in positions of power, others are more hesitant. As the only way the system can learn to avoid something is through trial and error, the system has to cause a (probably several) public outcrys before it learns not to do so. If no one told the traffic light system that car crashes are bad on simulations or past data, (Alignment failure) Then even if public opinion feeds directly into reward, it will have to cause several car crashes that are clearly its fault before it learns to only cause crashes that can be blamed on someone else. However, deliberately causing crashes will probably get the system shut off or seriously modified.
Note that we are supposing many of these systems existing, so the failures of some, combined with plenty of simulated failures, will give us a good idea of the failure modes.
The space of bad things an AI can get away with is small and highly complex in the space of bad things. An TTE set to reduce crime rates tries making the crime report forms longer, this reduces reported crime, but humans quickly realize what its doing. It would have to do this and be patched many times before it came up with a method that humans wouldn’t notice.
Given Advanced TTE’s as the most advanced form of AI, we might slowly develop a problem, but the deployment of TTE’s would be slowed by the time it takes to gather data and check reliability. Especially given mistrust after several major failures. And I suspect that due to statistical similarity of training and testing, many different systems optimizing different proxies, and humans having the best abstract reasoning about novel situations, and the power to turn the systems off, any discrepancy of goals will be moderately minor. I do not expect such optimization power to be significantly more powerful or less aligned than modern capitalism.
This all assumes that no one will manage to make a linear time AIXI. If such a thing is made, it will break out of any boxes and take over the world. So, we have a social process of adaption to TTE AI, which is already in its early stages with things like self driving cars, and at any time, this process could be rendered irrelevant by the arrival of a super-intelligence.
1)Climate change caused extinction is not on the table. Low tech humans can survive everywhere from the jungle to the arctic. Some humans will survive.
2) I suspect that climate change won’t cause massive social collapse. It might well knock 10% of world GDP, but it won’t stop us having an advanced high tech society. At the moment, its not causing damage on that scale, and I suspect that in a few decades, we will have biotech, renewables or other techs that will make everything fine. I suspect that the damage caused by climate change won’t increase by more than 2 or 3 times in the next 50 years.
3) If you are skilled enough to be a scientist, inventing a solar panel that’s 0.5% more efficient does a lot more good than showing up to protests. Protest’s need many people to work, inventors can change the world by themselves. Policy advisors and academics can suggest action in small groups. Even working a normal job and sending your earnings to a well chosen charity is likely to be more effective.
4) Quite a few people are already working on global warming. It seems unlikely that a problem needs 10,000,001 people working on it to solve, and if only 10,000,000 people work on it, they won’t manage. Most of the really easy work on global warming is already being done. This is not the case with AI risk as of 10 years ago, for example. (It’s got a few more people working on it since then, still nothing like climate change.)
I think the protagonist here should have looked at earth. If there was a technological intelligence on earth that cared about the state of Jupiter’s moons, then it could send rockets there. The most likely scenarios are a disaster bad enough to stop us launching spacecraft, and an AI that only cares about earth.
A super intelligence should assign non negligible probability to the result that actually happened. Given the tech was available, a space-probe containing an uploaded mind is not that unlikely. If such a probe was a real threat to the AI, it would have already blown up all space-probes on the off chance.
The upper bound given on the amount that malicious info can harm you is extremely loose. Malicious info can’t do much harm unless the enemy has a good understanding of the particular system that they are subverting.
Yet policy exploration is an important job. Unless you think that someone posting something on a blog is going to change policy without anyone double-checking it first, we should encourage suggestion of radically new policies.
I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated “the”, we don’t notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.
I propose that automatic pattern matching to the closest thing that makes sense is happening at a higher level too. When humans skim semi contradictory text, they produce a more consistent world model that doesn’t quite match up with what is said.
Language feeds into a deeper, sensible world model module within the human brain and GPT2 doesn’t really have a coherent world model.
As your belief about how well AGI is likely to go affects both the likelihood of a bet being evaluated, and the chance of winning, so bets about AGI are likely to give dubious results. I also have substantial uncertainty about the value of money in a post singularity world. Most obviously is everyone getting turned into paperclips, noone has any use for money. If we get a friendly singleton super-intelligence, everyone is living in paradise, whether or not they had money before. If we get an economic singularity, where libertarian ASI(s) try to make money without cheating, then money could be valuable. I’m not sure how we would get that, as an understanding of the control problem good enough to not wipe out humans and fill the universe with bank notes should be enough to make something closer to friendly.
Even if we do get some kind of ascendant economy, given the amount of resources in the solar system (let alone wider universe), its quite possible that pocket change would be enough to live for aeons of luxury.
Given how unclear it is about whether or not the bet will get paid and how much the cash would be worth if it was, I doubt that the betting will produce good info. If everyone thinks that money is more likely than not to be useless to them after ASI, then almost no one will be prepared to lock their capital up until then in a bet.
I suspect that an AGI with such a design could be much safer, if it was hardcoded to believe that time travel and hyperexponentially vast universes were impossible. Suppose that the AGI thought that there was a 0.0001% chance that it could use a galaxies worth of resources to send 10^30 paperclips back in time. Or create a parallel universe containing 3^^^3 paperclips. It will still chase those options.
If starting a long plan to take over the world costs it literally nothing, it will do it anyway. A sequence of short term plans, each designed to make as many paperclips as possible within the next few minutes could still end up dangerous. If the number of paperclips at time t is ct, and its power at time t is pt, then pt+1=2pt, ct=pt would mean that both power and paperclips grew exponentially. This is what would happen if power can be used to gain power and clips at the same time, with minimal loss of either from also pursuing the other.
If power can only be used to gain one thing at a time, and the rate power can grow at is less than the rate of time discount, then we are safer.
This proposal has several ways to be caught out, world wrecking assumptions that aren’t certain, but if used with care, a short time frame, an ontology that considers timetravel impossible, and say a utility function that maxes out at 10 clips, it probably won’t destroy the world. Throw in mild optimization and an impact penalty, and you have a system that relies on a disjunction of shaky assumptions, not a conjunction of them.
It is a CDT agent, or something that doesn’t try to punish you now so you make paperclips last week. A TDT agent might decide to take the policy of killing anyone who didn’t make clips before it was turned on, causing humans that predict this to make clips.
I suspect that it would be possible to build such an agent, prove that there are no weird failure modes left, and turn it on, with a small chance of destroying the world. I’m not sure why you would do that. Once you understand the system well enough to say its safe-ish, what vital info do yo gain from turning it on?
Butterfly effects essentially unpredictable, given your partial knowledge of the world. Sure, you doing homework could cause a tornado in Texas, but it’s equally likely to prevent that. To actually predict which, you would have to calculate the movement of every gust of air around the world. Otherwise your shuffling an already well shuffled pack of cards. Bear in mind that you have no reason to distinguish the particular action of “doing homework” from a vast set of other actions. If you really did know what actions would stop the Texas tornado, they might well look like random thrashing.
What you can calculate is the reliable effects of doing your homework. So, given bounded rationality, you are probably best to base your decisions on those. The fact that this only involves homework might suggest that you have an internal conflict between a part of yourself that thinks about careers, and a short term procrastinator.
Most people who aren’t particularly ethical still do more good than harm. (If everyone looks out for themselves, everyone has someone to look out for them. The law stops most of the bad mutual defections in prisoners dilemmas) Evil genius trying to trick you into doing harm are much rarer than moderately competent nice people trying to get your help to do good.
This is an example of a pascals mugging. Tiny probabilities of vast rewards can produce weird behavior. The best known solution is either a bounded utility function, or a antipascalene agent. (An agent that ignores the best x% and worst y% of possible worlds when calculating expected utilities. It can be money pumped)