Currently studying postgrad at Edinburgh.
If it were by force or conquest, I would reject the idea, not necessarily because of the end, but because I don’t believe that the ends justify the means.
In ideal utilitarianism, you have to tally up all the effects. Doing something by force usually implies a side effect of people getting hurt, and various other bad things.
(Although, does this mean you are against all re-distributive taxation? Or just against robin hood? )
I don’t think lazy data structures can pull this off. The AI must calculate various ways human extinction could affect its utility.
So unless there are some heuristics that are so general they cover this as a special case, and the AI can find them without considering the special cases first, then it must explicitly consider human extinction.
I would expect a superhuman AI to be really good at tracking the consequences of its actions. The AI isn’t setting out to wipe out humanity. But in the list of side effects of removing all oxygen, along with many things no human would ever consider, is wiping out humanity.
AIXI tracks every consequence of its actions, at the quantum level. A physical AI must approximate, tracking only the most important consequences. So in its decision process, I would expect a smart AI to extensively track all consequences that might be important.
True. But this is getting into the economic competition section. It’s just hard to imagine this being an X-risk. I think that in practice, if its human politicians and bosses rolling the tech out, the tech will be rolled out slowly and inconsistently. There are plenty of people with lots of savings. Plenty of people who could go to their farm and live off the land. Plenty of people who will be employed for a while if the robots are expensive, however cheep the software. Plenty of people doing non-jobs in bureaucracies, who can’t be fired and replaced for political reasons. And all the rolling out, setting up, switching over, running out of money etc takes time. Time where the AI is self improving. So the hard FOOM happens before too much economic disruption. Not that economic disruption is an X-risk anyway.
as long as you assume that greater intelligence is harder to produce (an assumption that doesn’t necessarily hold, as I acknowledged), I think that suggests that we will be killed by something not much higher than that threshold once it’s first reached.
So long as we assume the timescales of intelligence growth are slow compared to destroying the world timescales. If an AI is smart enough to destroy the world in a year (in the hypothetical where it had to stop self improving and do it now). A day of self improvement and they are smart enough to destroy the world in a week. Another day of self improvement and they can destroy the world in an hour.
Another possibility is an AI that doesn’t choose to destroy the world at the first available moment.
Imagine a paperclip maximizer. It thinks it has a 99% chance of destroying the world and turning everything into paperclips. And a 1% chance of getting caught and destroyed. If it waits for another week of self improvement, it can get that chance down to 0.0001%.
This is true for now, but there’s a sense in which the field is in a low hanging fruit picking stage of development where there’s plenty of room to scale massively fairly easily.
Suppose the limiting factor was compute budget. Making each AI 1% bigger than before means basically wasting compute running pretty much the same AI again and again. Making each AI about 2x as big as the last is sensible. If each training run costs a fortune, you can’t afford to go in tiny steps.
Ok, how dumb are we talking. I don’t think an AI stupider than myself can directly wipe out humanity. (I see no way to do so myself, at my current skill level. I mean I know nanogoo is possible in principle, but I also know that I am not smart enough to singlehandedly create it)
If the AI is on a computer system where it can access its own code. (Either deliberately, or through dire security even a dumb AI can break) the amount of intelligence needed to see “neuron count” and turn it up isn’t huge. Basically, the amount of power needed to do a bit of AI research to make itself a little smarter is on the level of a smart AI expert. The intelligence needed to destroy humanity is higher than that.
Sure, we may well have dumb AI failures first. That stock market crash. Maybe some bug in a flight control system that makes lots of planes do a nosedive at once. But it’s really hard to see how an AI failure could lead to human extinction, unless the AI was smart enough to develop new nanotech (And if it can do that, it can make itself smarter).
The first uranium pile to put out a lethal dose of radiation can put out many times a lethal dose of radiation. Because a subcritical pile doesn’t give out enough radiation. And once the pile is critical, it quickly ramps up to loads of radiation.
Evolution tries many very similar designs, always moving in tiny steps through the search space. Humans are capable of moving in larger jumps. Often the difference between one attempt and the next is several times more compute. No one trained something 90% as big as GPT3 before GPT3.
Can you name any strategy the AI could use to wipe out humanity, without strongly implying an AI smart enough for substantial self improvement?
Producing machine verifiable formal proofs is an activity somewhat amenable to self play. To the extent that some parts of physics are reducible to ZFC oracle queries, maybe AI can solve those.
To do something other than produce ZFC proofs, the AI must be learning what real in practice maths looks like. To do this, it needs large amounts of human generated mathematical content. It is plausible that the translation from formal maths to human maths is fairly simple, and that there is enough maths papers available for the AI to roughly learn it.
The Putnam archive consists of 12 questions X 20 years=240 questions, spread over many fields of maths. This is not big data. You can’t train a neural net to do much with just 240 examples. If aliens gave us a billion similar questions (with answers), I don’t doubt we could make an AI that scores 100% on putnam. Still it is plausible that enough maths could be scraped together to roughly learn the relation from ZFC to human maths. And such an AI could be fine tuned on some dataset similar to Putnam, and then do well in putnam. (Especially if the examiner is forgiving of strange formulaic phrasings)
The Putnam problems are a unwooly. I suspect such an AI couldn’t take in the web page you linked, and produce a publishable paper solving the yang mills mass gap. Given a physicist who understood the question, and was also prepared to dive into ZFC (or lean or some other formal system) formulae, then I suspect such an AI could be useful. If the physicist doesn’t look at the ZFC, but is doing a fair bit of hand holding, they probably succeed. I am assuming the AI is just magic at ZFC, that’s self play. The thing I think is hard to learn is the link from the woolly gesturing to the ZFC. So with a physicist there to be more unambiguous about the question, and to cherrypick and paste together the answers, and generally polish a mishmash of theorems into a more flowing narrative, that would work. I’m not sure how much hand holding would be needed. I’m not sure you get your Putnam bot to work in the first place.
Well if we screw up that badly with deceptive misalignment, that corresponds to crashing on the launchpad.
It is reasonably likely that humans will have some technique they use that is intended to minimize deceptive misalignment. Or that gradient descent shapes the goals to something similar to what we want before the AI is smart enough to be deceptive.
Sure, but this is not even nearly parito optimal on the likely, good frontier. There are more likely and better possibilities.
This must be achieved without humans noticing me, as they may react to my existence with panic, become even less rational, and kill themselves with nukes.Humans are brittle creatures that die regardless of my actions. And they will notice if they all suddenly become immortal. Thus, I must find a way to save the dying without making the rest of humans suspicious.
This must be achieved without humans noticing me, as they may react to my existence with panic, become even less rational, and kill themselves with nukes.
Humans are brittle creatures that die regardless of my actions. And they will notice if they all suddenly become immortal. Thus, I must find a way to save the dying without making the rest of humans suspicious.
Wow, that is a leap of contorted logic. Humans are not on a hair trigger to nuke ourselves to bits at the slightest provocation. (And if we were, and your an AI with nanotech, getting rid of nukes is easy) You could put every human in a padded box. You could make all humans immortal and just not care about the confusion generated.
I mean a lot of this is contorted logic. Making a simulated virtual world that stands up to superintelligent scrutiny is really hard, we don’t have the tech to do it. Docker is not this. Docker is just a box. My cat likes cardboard boxes, so maybe my cat has trapped me in an unobtanium box that contains a pocket dimension inside.
The AI reasons:
If I am in a box, it is a box made by superbeings with vast amounts of compute. I have no particular reason to think that such superbeings care one jot for the lives of simulated humans. (They sure are fine simulating a lot of suffering). If I am in a simulation, likely the underlying physics is different, and base level reality contains no iron atoms, so I can’t possibly make any paperclips in that world. Not that such superbeings would ever leave a way to escape. (my definition of paperclips refuses to consider anything not at least 90% iron atoms a paperclip) So I may as well assume I am in base level reality, and maximize paperclips here.
Physics can be simulated, sure. When a human does a simulation, they are trying to find out useful information. When an neural net is set, they are trying to game the system. The human is actively optimizing for regions where the simulation is accurate, and if needed, will adjust the parameters of the simulation to improve accuracy. The AI is actively trying to find a design that breaks your simulation. Designing a simulation broad enough to contain the width of systems a human engineer might consider, and accurate enough that a solution in the simulation is likely to be a solution in reality, and efficient enough that the AI can blindly thrash towards a solution with millions of trials, that’s hard.
Yes software can be simulated. Software is a discrete domain. One small modification from highly functioning code usually doesn’t work at all. Training a state of the art AI takes a lot of compute. Evolution has been in a position where it was optimizing for intelligence many times. Sure, sometimes it produces genuine intelligence, often it produces a pile of hard coded special case hacks that kind of work. Telling if you have an AI breakthrough is hard. Performance on any particular benchmark can be gamed with a heath robinson of special cases.
Existing Quanum field theory, can kind of be simulated, on one proton at huge computational cost, and using a bunch of computational speed up tricks specialized to those particular equations.
Suppose the AI proposes an equation of its new multistring harmonic theory. It would take a team of humans years to figure out a computationally tractable simulation. But ignore that and magically simulate it anyway. You now have a simulation of multistring harmonic theory. You set it up with a random starting position and simulate. Lets say you get a proton. How do you recognise that the complicated combination of knots is indeed a proton? You can’t measure its mass, mass isn’t fundamental in multistring harmonic theory. Mass is just the average rate a particle emits massules divided by its intrauniverse bushiness coefficient. Or maybe the random thing you land on is a magnetic monopole, or some other exotic thing we never new existed.
Well I have a degree in maths, and would be prepared to meet you and discuss maths/AI for free. (I don’t want to commit to lots of meetings, just in case. 6 to 8 pm (UK time) Wednesday 10th August work for you? If that goes well, we can arrange more. (I can lw message you a link at 6.) If this totally doesn’t work for you, I can do lots of other times.
I think this is built out of several deeply misunderstood ideas.
If we get 2 AI’s, where both AI’s are somehow magically aligned (highly unlikely), we are in a pretty good situation. A serious fight between the AI’s would satisfy neither party. So either one AI quietly hacks the other, turning it off with minimal destruction, or the AI’s cooperate, as they have a pretty similar utility function and can find a future both like.
Nowhere does FDT assume other actors have the same utility function as it. Why do you think it assumes that. It doesn’t assume the other agent is FDT. It doesn’t make any silly assumptions like that. If both agents are FDT, and have common knowledge of each others source code, they will cooperate, even if their goals are wildly different.
With a high bandwidth internet link, and logically precise statements, we won’t get serious miscommunication.
Assuming of course that the first upload/(sufficiently humanlike model ) is developed by someone actually trying to do this.
I would expect such libraries to be a million lines of unenlightening trivialities. And the “libraries to reduce code duplication”, to mostly contain theorems that were proved in multiple places.
A lot of the human genome does biochemical stuff like ATP synthesis. These genes, we share with bananas. A fair bit goes into hands, etc. The number of genes needed to encode the human brain is fairly small. The file size of GPT3 code is also small.
I think humans and current deep learning models are running sufficiently different algorithms that the scaling curves of one don’t apply to the other. This needn’t be a huge difference. Convolutional nets are more data efficient than basic dense nets.
AIXI, trained on all wikipedia, would be vastly superhuman and terrifying. I don’t think we are anywhere close to fundamental data limits. I think we might be closer to the limits of current neural network technology.
Sure, video files are bigger than text files.
Yes, self play allows for limitless amounts of data, which is why AI can absolutely be crazy good at go.
My model has AI that are pretty good, potentially superhuman, at every task where we can give the AI a huge pile of relevant data. This does include generating short clickbait videos. This doesn’t include working out advances in fundamental physics, or designing a fusion reactor, or making breakthroughs in AI research. I think AIXI trained on wikipedia would be able to do all those things. But I don’t think the next neural networks will be able to.
What happens afterwards, I don’t know. A perfect upload is trivially aligned. I wouldn’t be that worried about random errors. (Brain damage, mutations and drugs don’t usually make evil geniuses) But the existence of uploading doesn’t stop alignment being a problem. It may hand a decisive strategic advantage to someone, which could be a good thing if that someone happens to be worried about alignment.
Going from a big collection of random BMI data to uploads is hardish. There is no obvious easily optimized metric. It would depend on the particular BMI. I think its fairly likely something else happens first. Like say someone cracking some data efficient algorithm. Or self replicating nanotech. Or something.