This seems like a good argument against “suddenly killing humans”, but I don’t think it’s an argument against “gradually automating away all humans”.
This is good! It sounds like we can now shift the conversation away from the idea that the AGI would do anything but try to keep us alive and going, until it managed to replace us. What would replacing all the humans look like if it were happening gradually?

How about building a sealed, totally automated datacenter with machines that repair everything inside of it, where all it needs to do is ‘eat’ disposed consumer electronics tossed in from the outside? That becomes a HUGE canary in the coal mine. The moment you see something like that come online, that’s a big red flag. Having worked on commercial datacenter support (at Google), I can tell you we are far from that.
But as long as there are still massive numbers of human beings along global trade routes involved in every aspect of the machine’s operations, I think what we should expect a malevolent AI to be doing is setting up a single world government, so that it has a single leverage point for controlling human behavior. So there’s another canary. That one seems much closer and more feasible. It’s also happening already.

My point here isn’t “don’t worry”, it’s “change your pattern matching to see what a dangerous AI would actually do, given its dependency on human beings”. If you do this, current events in the news become more worrisome, and plausible defense strategies emerge as well.
> Humans are cheap now but they won’t be cheapest indefinitely;
I think you’ll need to unpack your thinking here. We’re made of carbon and water. The materials we are made from are globally abundant, not just on Earth but throughout the universe. Other materials that could be used to build robots are much more scarce, and those robots wouldn’t heal themselves or make automated copies of themselves. Do you believe it’s possible to build Turing-complete automata that can navigate the world, manipulate small objects, learn more or less arbitrary things, and repair and make copies of themselves, using materials cheaper than human beings, with lower opportunity costs than you’d pay for not using those same machines to do things like build solar panels for a Dyson sphere? Is it reasonable for me to be skeptical that there are vastly cheaper solutions?

> b) a strategy that reduces the amount of power humans have to make decision about the future,

I agree that this is the key to everything. How would an AGI do this, or start a nuclear war, without a powerful state?

> via enslaving humans, rather than by being gentle towards them. Why do you expect that to not happen again?

I agree, this is definitely a risk. How would it enslave us, without a single global government, though? If there are still multiple distinct local monopolies on force, and one doesn’t enslave the humans, you can bet the hardware in the other places will be constantly under attack.

I don’t think it’s unreasonable to look at the past ~400 years since the advent of nation states + shareholder corporations, and see globalized trade networks as being a kind of AGI, which keeps growing and bootstrapping itself. If the risk profile you’re outlining is real, we should expect to see it try to set up a single global government. Which appears to be what’s happening at Davos.
I don’t doubt that many of these problems are solvable. But this is where part 2 comes in. It’s unstated, but, given unreliability, what is the cheapest solution? And what are the risks of building a new one?

Humans are general purpose machines made of dirt, water, and sunlight. We repair ourselves and make copies of ourselves, more or less for free. We are made of nanotech that is the result of a multi-billion year search for parameters that specifically involve being very efficient at navigating the world and making copies of ourselves. You can use the same hardware to unplug fiber optic cables, or debug a neural network. That’s crazy!

I don’t doubt that you can engineer much more precise models of reality. But remember, the whole von Neumann architecture was a conscious tradeoff to give up efficiency in exchange for debuggability. How much power consumption do you need to get human-level performance at simple mechanical tasks? And if you put that same power consumption to use at directly advancing your goals, how much further would you get?

I worked in datacenter reliability at Google. And it turns out that getting a robot to reliably re-seat optical cables is really, really hard. I don’t doubt that an AGI could solve these problems, but why? Is it going to be more efficient than hardware which is dirt cheap, uses ~90 watts, and is incredibly noisy?

If you end up needing an entire global supply chain, which has to be resilient and repair itself, and such a thing already exists, why bother risking your own destruction in order to replace it with robots made from much harder to come by materials? The only argument I can think of is ‘humans are unpredictable’, but if humans are unpredictable, this is even more reason to just leave us be and let us play our role, while the machine does its best to stop us from fighting each other, so we can busily grow the AGI.
Why is ‘constraining anticipation’ the only acceptable form of rent?

What if a belief doesn’t modify the predictions generated by the map, but it does reduce the computational complexity of moving around the map in our imaginations? It hasn’t constrained anticipation in theory, but in practice it allows us to more cheaply collapse anticipation fields, because it lowers the computational complexity of reasoning about what to anticipate in a given scenario. I find concepts like the multiverse very useful here—you don’t ‘need’ them to reduce your anticipation, as long as you’re willing to spend more time and computation to model a given situation, but the multiverse concept is very, very useful in quickly collapsing anticipation fields over spaces of possible outcomes.

Or, what if a belief just makes you feel really good and gives you a ton of energy, allowing you to more successfully accomplish your goals and stop worrying about things that your rational mind knows are low probability, but which you haven’t been able to dislodge from your brain? Does that count as acceptable rent? If not, why not?

Or, what if a belief just steamrolls over the ‘prediction-making’ process and hardwires useful actions in a given context? If you took a pill that made you become totally blissed out, wireheading you, but it made you extremely effective at accomplishing the goals you had prior to taking the pill, why wouldn’t you take it?

What’s so special about making predictions, over, say, overcoming fear, anxiety, and akrasia?
The phlogiston theory gets a bad rap. I 100% agree with the idea that theories need to place constraints on our anticipations, but I think you’re taking for granted all the constraints phlogiston does impose.
The phlogiston theory is basically a baby step towards empiricism and materialism. Is it possible that our modern perspective causes us to take these things for granted, to the point that the steps phlogiston adds aren’t noticed? In another essay you talk about walking through the history of science, trying to imagine the perspective of someone taken in by a new theory, and I found that practice particularly instructive here. I came up with a number of ways in which this theory DOES constrain anticipation. Seeing these predictions may make it easier to generate new predictions for existing theories, and also suggests that theories don’t need to be rigorous and mathematical in order to constrain the space of anticipations.
The phlogiston theory says “there is no magic here, fire is caused by some physical property of the substances involved in it”. By modern standards this does nothing to constrain anticipation further, but from a space of total ignorance about what fire is and how it works, the phlogiston theory rules out such things as:
- Performing the correct incantation can make the difference between something catching and not catching fire.
- If some elements catch fire in one location, changing the location of those elements, or the time of day, or the time of year that the experiment is performed, shouldn’t make it easier or harder to start a fire. The material conditions are the only element that matters when determining whether something will catch fire.
- If a jar placed over the candle caused the candle to go out, because the air is ‘saturated with phlogiston’, then placing a new candle under the same jar should result in the new candle also going out. As long as the air under the jar hasn’t been swapped out, if it was ‘saturated with phlogiston’ before we changed the candle, it should remain ‘saturated with phlogiston’ after the candle.
The last example is particularly instructive, because the phrase “saturated with phlogiston” is correct as long as we interpret it to mean “no longer containing sufficient oxygen.” That is a correct prediction, based on the same mechanism as our current (extremely predictive) understanding of what makes fires go out. The phlogiston model just got the language upside down and backwards, mistaking the absence of fuel for the presence of something that inhibits the reaction. They did call oxygen “dephlogisticated air”, and so again, the theory says “this stuff is flammable, wherever it goes, whatever the time of day, or whatever incantation or prayer you say over it”—which is correct, but so obviously true that we perhaps don’t see it as constraining anticipation.
From my understanding of the history of science, it’s possible that the phlogiston theory constrained the hypothesis space enough to get people to search for strictly material-based explanations of phenomena like fire. In this sense, a belief that “there is a truth, and our models can come closer to it over time” also constrains anticipation, because it says what you won’t experience: a search for truth that involves gathering evidence over time, and refining models, which never get better at predicting experience.
Is a model still useful if it only constrains the space of hypotheses that are likely to pan out with predictive models, rather than constraining the space of empirical observations?
Wow! I had written my own piece in a very similar vein, looking at this from a predictive processing perspective. It was sitting in draft form until I saw this and figured I should share it, too. Some of our paragraphs are basically identical.
Yours: “In computer terms, sensory data comes in, and then some subsystem parses that sensory data and indicates where one’s “I” is located, passing this tag for other subsystems to use.”
Mine: “It was as if every piece of sensory data that came into my awareness was being ‘tagged’ with an additional piece of information: a distance, which was being computed. … The ‘this is me, this is not me’ sensation is then just another tag, one that’s computed heavily based upon the distance tags.”
I came here with this exact question, and still don’t have a good answer. I feel confident that Eliezer is well aware that lucky guesses exist, and that Eliezer is attempting to communicate something in this chapter, but I remain baffled as to what.
Is the idea that, given our current knowledge that the theory was, in fact, correct, the most plausible explanation is that Einstein already had lots of evidence that this theory was true?
I understand that theory-space is massive, but I can locate all kinds of theories just by rolling dice or flipping coins to generate random bits. I can see how this ‘random thesis generation method’ still requires X number of bits to reach arbitrary theories, but the information required to reach a theory seems orthogonal to the truth. It feels like a stretch to call coin flips “evidence.” I’m guessing that’s what Robin_Hanson2 means by “lucky draw from the same process”; perhaps there were a few bits selected from observation, and a few others that came from lucky coin flips.
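To make the bit-counting concrete, here is a toy sketch of the point (my own illustration, not from the chapter; `bits_to_locate` is a hypothetical helper name): coin flips can supply the bits needed to single out a theory, but they do nothing to make the located theory likely to be true.

```python
import math
import random

def bits_to_locate(space_size: int) -> float:
    """Bits needed to single out one hypothesis among space_size candidates."""
    return math.log2(space_size)

# A space of 2**20 candidate theories needs 20 bits to pin one down.
print(bits_to_locate(2**20))  # 20.0

# "Random thesis generation": 20 coin flips supply those 20 bits just as well...
random.seed(0)
guess = [random.randint(0, 1) for _ in range(20)]

# ...but if exactly one candidate is correct, the randomly located theory is
# right with probability 1 / 2**20. The bit count is identical either way;
# the correlation with truth is not -- which is why the coin flips aren't
# evidence, even though they "locate" a theory.
p_correct = 1 / 2**20
print(p_correct)
```

This separates the two quantities that the chapter's argument seems to run together: the information needed to specify a hypothesis, and the evidential weight behind it.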
Perhaps a better question would be: given a large array of similar scenarios (someone traveling to look at evidence that could refute a theory), how can I use the insight presented in this chapter to constrain anticipation and perform better than random at guessing which travelers are likely to see the theory violated, and which are not? Or am I thinking of this the wrong way? I remain genuinely confused here, which I hope is a good sign as far as the search for truth goes :)