The naive protocol: any time you get a note, you come up with a random password, and that password has to be on the bottom of the note. If the password has even a few bits of entropy relative to outsiders, this will work. (Memory charms or time turners can get around this, but it’s still a good precaution.)
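A minimal sketch of that protocol, assuming “a few bits of entropy” is implemented as a short random token; the token length and function names here are illustrative, not part of the original:

```python
import secrets

# Minimal sketch of the naive protocol above: a note only counts as genuine
# if the secret password appears on its bottom line. Token length and names
# are illustrative assumptions.

def new_password(n_bytes: int = 4) -> str:
    """A random password with enough entropy that outsiders can't guess it."""
    return secrets.token_hex(n_bytes)

def note_is_authentic(note: str, password: str) -> bool:
    """Accept the note only if the password is written on its last line."""
    last_line = note.strip().splitlines()[-1].strip()
    return secrets.compare_digest(last_line, password)

password = new_password()
note = "Message goes here.\n" + password
assert note_is_authentic(note, password)
```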
Preparing anyway even if it’s very low probability because of extreme consequences is Pascal’s Wager
ASI is probably coming sooner or later. Someone has to prepare at some point; the question is when.
I consider AI development to be a field that I have little definite info about. It’s hard to assign less than 1% probability to statements about ASI (excepting the highly conjunctive ones). I don’t consider things like dinosaur-killing asteroids, with 1-in-100-million probabilities, to be Pascal’s muggings.
If and when a canary “collapses” we will have ample time to design off switches and identify red lines we don’t want AI to cross
We have a tricky task, and we don’t know how long it will take. Hitting one of these switches doesn’t help us do the task much. A student is given an assignment in August; the due date is March next year. They decide to put it off until it snows. Snowfall is an indicator that the due date is coming soon, but not a good one. And either way, it doesn’t help you do the assignment.
What is a “fully self-driving” car? We have had algorithms that kind-of-usually work for years, and a substantial part of modern progress in the field looks like gathering more data and developing driving-specific tricks. Suppose that you needed 100 million hours of driving data to train current AI systems. A company pays drivers to put a little recording box in their car. It will take 5 years to gather enough data, and after that we will have self-driving cars. What are you going to do in those 5 years that you can’t do now? In reality, we aren’t sure if you need 50, 100, or 500 million hours of driving data with current algorithms, and we aren’t sure how many people will want the boxes installed. (These boxes are usually built into satnavs or lane-control systems in modern cars.)
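A rough back-of-envelope version of that timeline; every number below is an invented assumption, and the point of the comment is exactly that we don’t know the true values:

```python
# Back-of-envelope sketch of the data-gathering timeline above.
# All parameters are invented assumptions for illustration only.

HOURS_PER_DRIVER_PER_DAY = 1.0   # assumption
DRIVERS_WITH_BOXES = 60_000      # assumption

def years_to_collect(hours_needed: float) -> float:
    hours_per_year = DRIVERS_WITH_BOXES * HOURS_PER_DRIVER_PER_DAY * 365
    return hours_needed / hours_per_year

# The range of uncertainty mentioned in the comment: 50, 100, or 500 million hours.
for hours in (50e6, 100e6, 500e6):
    print(f"{hours / 1e6:.0f}M hours -> {years_to_collect(hours):.1f} years")
```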
Limited versions of the Turing test (like Winograd Schemas)
What percentage do you want, and what will you do when GPT-5 hits it?
We are decades away from the versatile abilities of a 5 year old
A “result” obtained by focusing on the things that 5-year-olds are good at.
Sometimes you have a problem, like looking at an image of some everyday scene and saying what’s happening in it, that 5-year-olds are (or at least were, a few years ago) much better at. Looking at a load of stock data and using linear regression to find correlations between prices is nothing like anything that existed in the environment of evolutionary adaptedness; human brains aren’t built to do that.
Even if that were true, how would you know it? Technological progress is hard to predict. Designing off switches is utterly trivial if the system isn’t trying to avoid the off switch being pressed, and actually quite hard if the AI is smart enough to know about the off switch and remove it.
I agree that getting Chinese-whispered into nonsense is a real failure mode; pop-sci quantum mechanics shows this well enough. I think it unlikely that there was that much of a real, true, and important point behind most myths. Re-record a sound on analogue media enough times and you get static. Retell a story enough times and you get mythology. We can’t tell much about what the signal started as, but I doubt it was all brilliant rationality, because most of our really old documents aren’t that rational.
Suppose the AI finds a plan with 10^50 impact and 10^1000 utility. I don’t want that plan to be run. It’s probably a plan that involves taking over the universe and then doing something really high-utility. I think a constraint is better than a scaling factor.
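A minimal sketch of the difference between the two options; the threshold, penalty weight, and numbers are all invented for illustration (10^1000 overflows a float, so a large stand-in is used):

```python
# Minimal sketch: penalised (scaling-factor) objective vs. hard constraint.
# All names and numbers are illustrative, not from the original comment.

IMPACT_LIMIT = 1e10      # hypothetical acceptable-impact threshold
PENALTY_WEIGHT = 1e3     # hypothetical scaling factor on impact

def scaled_score(utility: float, impact: float) -> float:
    """Scaling factor: high enough utility can always outbid the penalty."""
    return utility - PENALTY_WEIGHT * impact

def constrained_score(utility: float, impact: float) -> float:
    """Hard constraint: plans above the impact limit are rejected outright."""
    return utility if impact <= IMPACT_LIMIT else float("-inf")

# The worrying plan: astronomically high impact and even higher utility.
utility, impact = 1e300, 1e50   # 10**1000 overflows a float, so 1e300 stands in

print(scaled_score(utility, impact))       # still enormous -> plan gets run
print(constrained_score(utility, impact))  # -inf -> plan is ruled out
```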
I am not convinced that the distinction between continuous and discontinuous approaches is a feature of the territory. Zoom in far enough and you see the continuous wavefunction of an electron interacting with the continuous wavefunction of a silicon atom. Zoom out to evolutionary timescales and the jump from hominids with pointy sticks to ASI is almost instant. The mathematical definition of continuity relies on your function being mathematically formal. Is the distance from the Earth to the Moon an even number of Planck lengths? Well, there are a huge number of slightly different measurements you could make, depending on when you measure, exactly what points you measure between, and how you deal with relativistic length contraction, and the answer will be different. In a microsecond in which the code that is a fooming AI is doing garbage collection, is AI progress happening? You have identified an empirical variable called AI progress, but whether or not it is continuous depends on exactly how you fill in the details.
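For reference, the standard definition being leaned on here; it only applies once “AI progress” has been pinned down as a specific function f of time:

```latex
% Standard (epsilon-delta) continuity: the definition is a property of a
% specific function f, so "AI progress is continuous" is only well defined
% once you have said exactly which f(t) you mean.
\[
  f \text{ is continuous at } t_0 \iff
  \forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall t :
  \; |t - t_0| < \delta \implies |f(t) - f(t_0)| < \varepsilon
\]
```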
Imagine superintelligence has happened and we are discussing the details afterwards. We were in a world basically like this one, and then someone ran some code, which gained total cosmic power within a millisecond. Someone tries to argue that this was a continuous improvement, just a very fast one. What evidence would convince you one way or the other on this?
I would bet that all the data needed in principle to, say, find a cure for Alzheimer’s is already available online—if only we knew how to effectively leverage it.
I agree. If “effectively leverage it” means a superintelligence with unlimited compute, then this is a somewhat weak statement. I would expect a superintelligence given the human genome to figure out how to cure all diseases. I would expect it to be able to figure out a lot from any book on biology. I would expect it to be able to look at a few holiday photos, figure out the fundamental equations of reality, and work out that evolution happened on a planet, that it was created by evolved intelligences with technology, etc. From this, it could design nanobots programmed to find humans and cure them; even if it had no idea what humans looked like, it could just program the nanobots to find the most intelligent life forms around.
Suppose that different tasks require different levels of AI capability to do better than humans.
First AI can do arithmetic, then play chess, then drive cars, etc. Let’s also assume that AI is much faster than humans. So imagine that AI research ability rises from almost nothing to superhuman over the course of a year. A few months in, it’s inventing stuff like linear regression: impressive, but not as good as current human work on AI. There are a few months where the AI is worse than a serious team of top researchers, but better than an intern. So if you have a niche use for AI, that can be automatically automated: the AI-research AI designs a widget-building AI. The humans could have made a widget-building AI themselves, but so few widgets are produced that it wasn’t worth it.
Then the AI becomes as good as a top human research team, and FOOM. How crazy the world gets before FOOM depends on how much other stuff is automated first. Is it easier to make an AI teacher, or an AI AI-researcher? Also remember that bureaucratic delays are a thing; there is a difference between having an AI that does medical diagnosis in a lab and having it used in every hospital.
I had a dream where I was flying by incrementing my own x and y coordinates. Somewhat related to simulated worlds, but also to straight programming.
If I am confident that I have the original source code, as written by humans, I read that. I am looking for deep abstract principles. I am looking only for abstract ideas that are general to the field of AI.
If I can encrypt the code in a way that only a future superintelligence can crack, and I feel hopeful about FAI, I do that. Otherwise, secure erase, possibly involving anything lying around that can slag the hard drives.
You might be producing some useful info, but mostly about whether an arbitrary system exhibits unlimited exponential growth. You could get 1,000 different programmers to each throw together some model of tech progress (some based on completing tasks, some based on extracting resources, some based on random differential equations, etc.) and see what proportion of them give exponential growth and then stagnation. Actually, there isn’t a scale on your model, so who can say whether the running out of tasks, or the stagnation, is next year or in 100,000 years? At best, you will be able to tell how strongly outside-view priors should favor exponential growth over growth and then decay. (Pure growth is clearly simpler, but how much simpler?)
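A toy sketch of that exercise: two equally arbitrary growth models with randomly drawn parameters, just to show the two shapes being compared. The parameter ranges are invented, and there is no real-world scale on either model:

```python
import math
import random

# Toy version of the "many toy models" idea above: pure exponential growth
# vs. growth that runs out of tasks (logistic). Parameters are arbitrary and
# unitless, so "stagnation" could equally be next year or in 100,000 years.

def exponential(t: float, rate: float) -> float:
    """Pure exponential growth: dx/dt = rate * x, with x(0) = 1."""
    return math.exp(rate * t)

def logistic(t: float, rate: float, cap: float) -> float:
    """Same early curve, but progress saturates at an arbitrary ceiling."""
    return cap / (1 + (cap - 1) * math.exp(-rate * t))

random.seed(0)
for _ in range(5):
    rate = random.uniform(0.1, 1.0)   # arbitrary growth rate
    cap = random.uniform(10, 1000)    # arbitrary ceiling on progress
    t = 20.0
    print(f"exp: {exponential(t, rate):12.1f}   logistic: {logistic(t, rate, cap):8.1f}")
```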
A lot of your argument seems to be comparing an artifact of human technology with an evolved system.
To make the discussion clearer, let’s pick a particular evolved system and technology, say an insect wing and an aeroplane wing. Suppose that the aeroplane wing wins on some criteria, like speed, the insect wing wins on efficiency, and it all balances out overall.
To say therefore that intelligence isn’t that great is a mixing of levels. There are two intelligences in the game, humans and evolution. Both have produced a great variety of highly optimized artifacts. Both are of roughly comparable power. By comparing two aeroplanes, you can also compare the skill of the designers, but it is meaningless to try to compare an aeroplane to an aeroplane designer. The insect is the plane, not the designer.
Some of your comparisons make even less sense, like ability to survive in extreme environments. Comparing a fish and an untooled human in ability to survive in the ocean is a straight contest of fish evolution vs human evolution. If the human drowns before they have a chance to think anything, the power of the human brain is not shown in the slightest.
Also, comparing human intelligence between humans is like comparing the running speed of cheetahs: all your results will be similar. So one human beating another tells you little about intelligence.
So what would a real comparison of intelligence with something else look like? I think the question “Is intelligence good?” is not that meaningful.
What we can do is ask “is there a way to X, given only Y?” For instance, “is there a way to make a fire, given only the ability to contract the muscles of a human body in a forest?” or “is there a way to destroy the moon, given only the ability to post 10k characters to lesswrong.com?” These are totally formalizable questions and could in principle be answered by simulating an exponential number of universes.
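A toy formalisation of that kind of question, with the world, actions, and goal invented purely for illustration; the point is that “is there a way to X, given only Y” becomes a brute-force search over an exponential number of action sequences:

```python
from itertools import product

# Toy "is there a way to X, given only Y": here Y is "press one of four keys
# per step, up to 6 steps" and X is "reach a goal square in a tiny grid world".
# World, actions, and goal are invented for illustration.

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GOAL = (3, 3)

def run(actions) -> bool:
    """Simulate one action sequence and check whether it achieves the goal."""
    x, y = 0, 0
    for a in actions:
        dx, dy = ACTIONS[a]
        x, y = x + dx, y + dy
    return (x, y) == GOAL

def is_there_a_way(max_steps: int = 6) -> bool:
    """Enumerate every action sequence up to max_steps (exponentially many)."""
    for n in range(max_steps + 1):
        for seq in product(ACTIONS, repeat=n):
            if run(seq):
                return True
    return False

print(is_there_a_way())  # True: e.g. right, right, right, up, up, up
```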
We can also ask questions about which algorithms will actually find a way to achieve a goal. We know that there exists a pattern of electrical inputs that wins the game of Pong, but we want to know whether some gradient-descent-based algorithm will find one.
We can then say there are a wide variety of tasks and goals that humans can fulfill given our primitive action of muscle contraction. Given that chimps have similar musculature but less intelligence and can’t do most of these tasks, and that many of the routes to fulfilling the goals go through layers of indirection, it seems that an intelligence comparable to humans with some other output channel would be similarly good at achieving goals.
How dangerous would you consider a person with basic programming skills and a hypercomputer? I mean I could make something very dangerous, given hypercompute. I’m not sure if I could make much that was safe and still useful. How common would it be to accidentally evolve a race of aliens in the garbage collection?
At the moment, my best guess at what powerful algorithms look like is something that lets you maximize functions without searching through all the inputs. Gradient descent can often find a high point without that much compute, and so is more powerful than random search. If your powerful algorithm is more like really good computationally bounded optimization, I suspect it will be about as manipulative as brute-forcing the search space. (I see no strong reason for strategies labeled manipulative to be that much easier or harder to find than those that aren’t.)
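A minimal sketch of the gradient-descent-beats-random-search claim on a smooth one-dimensional function; the function, step size, and evaluation budget are arbitrary illustrations:

```python
import random

# Minimal sketch: on a smooth function, gradient ascent reaches a high point
# with far fewer evaluations than random search. Function and parameters are
# arbitrary illustrations.

def f(x: float) -> float:
    return -(x - 3.0) ** 2          # single peak at x = 3

def grad(x: float) -> float:
    return -2.0 * (x - 3.0)         # derivative of f

def gradient_ascent(steps: int = 50, lr: float = 0.1) -> float:
    x = 0.0
    for _ in range(steps):
        x += lr * grad(x)
    return f(x)

def random_search(evals: int = 50) -> float:
    random.seed(0)
    return max(f(random.uniform(-100, 100)) for _ in range(evals))

print(gradient_ascent())  # essentially 0 (the maximum)
print(random_search())    # usually much worse for the same 50 evaluations
```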
Suppose an early AI is trying to understand its programmers and makes millions of hypotheses that are themselves people. Later it becomes a friendly superintelligence that figures out how to think without mindcrime. Suppose all those imperfect virtual programmers have been saved to disk by the early AI; the superintelligence can look through them. We end up with a post-singularity utopia that contains millions of citizens almost, but not quite, like the programmers. We don’t need to solve the nonperson predicate ourselves to get a good outcome, just avoid minds we would regret creating.
Quote from Wikipedia on Fukushima:
Deaths: 1 cancer death attributed to radiation exposure by government panel.
Non-fatal injuries: 16 with physical injuries due to hydrogen explosions; 2 workers taken to hospital with possible radiation burns.
I think this puts the incident squarely in the class of minor accidents that the media had a panic about. Unless you think it had a 50% chance of wiping out Japan and we were just lucky, it is irrelevant to the discussion of X-risk.
With CO2, it depends what you mean by business as usual. We don’t have 500 years of fossil fuels left, and we are already switching to renewables. I don’t think that the earth will become uninhabitable to technologically advanced human life. In a scenario where humans are using air conditioners and desalinators to survive the 80°C Norwegian deserts, the world is still “habitable”. (I don’t think it will get that bad, but I think humans would survive if it did.)
If the time it takes for a black ball to kill us is more than a few generations it’s really hard to plan around fixing it.
No, those are the ones that are really easy to plan around; you have plenty of time to fix them. It’s the ones that kill you instantly that are hard to plan around.
Consider these 5 states:
3) Tech progress fails. No one is doing tech research.
4) We coordinate to avoid UFAI, and don’t know how to make FAI.
5) No coordination to avoid UFAI, no one has made one yet. (State we are currently in)
In the first 3 scenarios, humanity won’t be wiped out by some other tech. If we can coordinate around AI, I would suspect that we would manage to coordinate around other black balls. (AI tech seems unusually hard to coordinate around: we don’t know where the dangerous regions are, tech near the dangerous regions is likely to be very profitable, and it is an object entirely of information, thus easily copied and hidden.) In state 5, it is possible for some other black ball to wipe out humanity.
So conditional on some black-ball tech other than UFAI wiping out humanity, the most likely scenario is that it came sooner than UFAI could. I would be surprised if humanity stayed in state 5 for the next 100 years. (I would be most worried about grey goo here.)
The other thread of possibility is that humanity coordinates around stopping UFAI from being developed, and then gets wiped out by something else. This requires an impressive amount of coordination. It also requires that FAI isn’t developed (or is stopped by the coordination to avoid UFAI). Given this happens, I would expect that humans had got better at coordinating, that people who cared about X-risk were in positions of power, and that standards and precedents had been set. Anything that wipes out a humanity that well coordinated would have to be really hard to coordinate around.
I think that there is a scale from totally specific algorithms to totally general ones.
People will have to do a lot of maths and philosophy to get an AI system that works at all.
Suppose you have a lead of 1 week over any UFAI projects, and you have your AI system to the point where it can predict what you would do in a box. (Actually, we can say the AI has developed mind-uploading tech plus lots of compute.) The human team needs, say, 5 years of thinking to come up with better metaethics, defenses against value drift, or whatever. You want to simulate the humans in some reasonably human-friendly environment for a few years to work this out. You pick a nice town and ask the AI to create a virtual copy of it. (More specifically, you randomly sample from the AI’s probability distribution, after conditioning on enough data that the town will be townlike.) The virtual town is created with no people in it except the research team. All the services are set to work without any maintenance (water in the virtual pipes, food in the virtual shops, the virtual internet works). The team of people uploaded into this town is at least 30, ideally a few hundred, including plenty of friends and family.
This “virtual me in a box” seems likely to be useful and unlikely to be dangerous. I agree that any virtual-box trick that involves people thinking for a long time compared to current lifespans is dangerous. A single person trapped in low-res polygon land would likely go crazy from the sensory deprivation.
You need an environment with a realistic level of socializing and leisure activities to support psychologically healthy humans. Any well-done “virtual me in a box” is going to look more like a virtual AI safety camp or research department than one person in a blank white room containing only a keyboard.
Unfortunately, all those details would be hard to hard-code manually. You seem to need an AI that can be trusted to follow reasonably clear and specific goals without adversarial optimization. You want a virtual park; manually creating it would be a lot of hard work (see current video games). You need an AI that can fill in thousands of little details in a manner not optimized to mess with humans. This is not an especially high bar.
The algorithms that are used nowadays are basically the same as the algorithms that were known then, just with a bunch of tricks like dropout.
Suppose that you have 100 ideas that seem like they might work. You test them, and one of them does work. You then find a mathematical reason why it works. Is this insight or compute?
Even if most of the improvement is in compute, there could be much better algorithms that we just aren’t finding. I would be unsurprised if there exists an algorithm that would be really scary on vacuum tubes.
Different minds use different criteria to evaluate an argument. Suppose that half the population were perfect rationalists, whose criteria for judging an argument depended only on Occam’s razor and Bayesian updates. The other half are hard-coded biblical literalists, who only believe statements based on religious authority. So half the population will consider “Here are the short equations, showing that this concept has low Kolmogorov complexity” to be a valid argument, while the other half consider “Pope Clement said …” to be a strong argument.
Suppose that any position that has strong religious and strong rationalist arguments for it is so obvious that no one is doubting or discussing it. Then most propositions believed by half the population have strong rationalist support, or strong religious support, but not both. If you are a rationalist and see one fairly good rationalist argument for X, you search for more info about X. Any religious arguments get dismissed as nonsense.
The end result is that the rationalists are having a serious discussion about AI risk among themselves. The religious dismiss AI as ludicrous based on some Bible verse.
The religious people are having a serious discussion about the second coming of Christ and judgement day, which the rationalists dismiss as ludicrous.
The end result is a society where most of the people who have read much about AI risk think it’s a thing, and most of the people who have read much about judgement day think it’s a thing.
If you took some person from one side and forced them to read all the arguments on the other, they still wouldn’t believe. Each side has the good arguments under their criteria of what a good argument is.
The rationalists say that the religious have poor epistemic luck: there is nothing we can do to help them now, but when superintelligence comes it can rewire their brains. The religious say that the rationalists are cursed by the devil: when judgement day comes, they will be converted by the glory of God.
The rationalists are designing a superintelligence; the religious are praying for judgement day.
Bad ideas and good ones can have similar social dynamics, because most of the social dynamics around an idea depend on human nature.