Well, the number comes from the idea of one-to-one monitoring. Obviously, there’s other stuff to do to establish a stable unipolar world order, but monitoring seems like the most resource-intensive part, so it’s an order of magnitude estimate. Also, realistically, one person could monitor ten people, so the order-of-magnitude estimate has some leeway built in.
But if there are 7 billion instances of HSIFAUH which are collectively capable of taking over the world, how is that not a potential existential catastrophe if they have inhuman values?
I think they can be controlled. Whoever is providing the observations to any instance of HSIFAUH has an arsenal of carrots and sticks (just by having certain observations correlate with actual physical events that occur in the household(s) of humans that generate the data), and I think merely human-level intelligence can be kept in check by someone in a position of power over them. So I think real humans could stay at the wheel over 7 billion instances of HSIFAUH. (I mean, this is teetering at the edge of existential catastrophe already given the existence of simulations of people who might have the experience of being imprisoned, but I think with careful design of the training data, this could be avoided.) But in terms of extinction threat to real-world humans, this starts to look more like the problem of maintaining a power structure over a vast number of humans and less like typical AI alignment difficulties; historically, the former seems to be a solvable problem.
>Or maybe the right way to look at it is whether N = 10 could finance a rapidly exponentially growing N.
How? And why would it grow fast enough to get to a large enough N before someone deploys ~AIXI?
Right, this analysis gets complicated because you have to analyze the growth rate of N. Given your lead time from having more computing power than the reckless team, one has to analyze how many doubling periods you have time for. I hear Robin Hanson is the person to read regarding questions like this. I don’t have any opinions here. But the basic structure regarding “How?” is: spend some fraction of computing resources making money, then buy more computing resources with that money.
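The doubling arithmetic here can be made concrete with a toy model. All numbers below are illustrative assumptions (lead time, doubling period, reinvestment fraction), not estimates from this discussion:

```python
# Toy model of the "spend compute to make money to buy compute" loop.
# Question: how many doubling periods fit inside your lead time, and
# how much does compute grow if you reinvest a fraction each period?

def doublings_available(lead_time_months: float, doubling_period_months: float) -> int:
    """Number of complete growth periods before the reckless team catches up."""
    return int(lead_time_months // doubling_period_months)

def final_compute(initial_compute: float, reinvest_fraction: float,
                  return_multiplier: float, periods: int) -> float:
    """Compute after `periods`, reinvesting a fraction of current compute into
    revenue each period; `return_multiplier` is compute bought per unit of
    compute spent on moneymaking in one period (an assumed parameter)."""
    c = initial_compute
    for _ in range(periods):
        c += c * reinvest_fraction * return_multiplier
    return c

periods = doublings_available(lead_time_months=24, doubling_period_months=6)
print(periods)                                            # 4
print(final_compute(1.0, reinvest_fraction=0.5,
                    return_multiplier=2.0, periods=periods))  # 16.0
```

With these made-up parameters (half of compute reinvested, 2x return per period), each period doubles total compute, so a 24-month lead at 6 months per doubling yields four doublings, i.e. 16x the starting compute. The real question, as noted, is what the actual doubling period and lead time are.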
>It should be possible to weaken the online version and get some of this speedup.
What do you have in mind here?
Well, nothing in particular when I wrote that, but thank you for pushing me. Maybe only update the posterior at some timesteps (and do it infinitely many times but with diminishing frequency). Or more generally, you divide resources between searching for programs that retrodict observed behavior and running copies of the best one so far, and you just shift resource allocation toward the latter over time.
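The “update at some timesteps with diminishing frequency” idea can be sketched minimally. One simple choice (my assumption, not specified above) is to update only at powers of two, which gives infinitely many updates but with frequency falling off over time:

```python
# Sketch of a weakened online scheme: refresh the posterior only at
# exponentially spaced timesteps (t = 1, 2, 4, 8, ...), and run the
# current best program at every other step. The actual "search" and
# "run" work is abstracted away; only the schedule is shown.

def is_update_step(t: int) -> bool:
    """True iff t is a power of two (infinitely many, diminishing frequency)."""
    return t > 0 and (t & (t - 1)) == 0

def schedule(horizon: int):
    """Yield ('update', t) or ('run', t) for each timestep up to horizon."""
    for t in range(1, horizon + 1):
        yield ('update' if is_update_step(t) else 'run', t)

update_steps = [t for action, t in schedule(100) if action == 'update']
print(update_steps)  # [1, 2, 4, 8, 16, 32, 64]
```

The resource-allocation version of the same idea would shift compute from the “search for retrodicting programs” branch toward the “run the best program so far” branch as t grows, rather than switching discretely.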
You do have to solve some safety problems that the reckless team doesn’t though, don’t you? What do you think the main safety problems are?
If it turns out you have to do special things to avoid mesa-optimizers, then yes. Otherwise, I don’t think you have to deal with other safety problems if you’re just aiming to imitate human behavior.
>Obviously, there’s other stuff to do to establish a stable unipolar world order
I was asking about this part. I’m not convinced HSIFAUH allows you to do this in a safe way (e.g., without triggering a war that you can’t necessarily win).
>Given your lead time from having more computing power than the reckless team, one has to analyze how many doubling periods you have time for.
Another complication here is that the people trying to build ~AIXI can probably build an economically useful ~AIXI using less compute than you need for ~HSIFAUH (for jobs that don’t need to model humans), and start doing their own doublings.
>But in terms of extinction threat to real-world humans, this starts to look more like the problem of maintaining a power structure over a vast number of humans and less like typical AI alignment difficulties; historically, the former seems to be a solvable problem.
I don’t think we’ve seen a solution that’s very robust though. Plus, having to maintain such a power structure starts to become a human safety problem for the real humans (i.e., potentially causes their values to become corrupted).
>Another complication here is that the people trying to build ~AIXI can probably build an economically useful ~AIXI using less compute than you need for ~HSIFAUH (for jobs that don’t need to model humans), and start doing their own doublings.
Good point.
Regarding the other two points, my intuition was that a few dozen people could work out the details satisfactorily in a year. If you don’t share this intuition, I’ll adjust downward on that. But I don’t feel up to putting in those man-hours myself. It seems like there are lots of people without a technical background who are interested in helping avoid AI-based X-risk. Do you think this is a promising enough line of reasoning to be worth some people’s time?
>Regarding the other two points, my intuition was that a few dozen people could work out the details satisfactorily in a year. If you don’t share this intuition, I’ll adjust downward on that.
I’m pretty skeptical of this, but then I’m pretty skeptical of all current safety/alignment approaches and this doesn’t seem especially bad by comparison, so I think it might be worth including in a portfolio approach. But I’d like to better understand why you think it’s promising. Do you have more specific ideas of how ~HSIFAUH can be used to achieve a Singleton and to keep it safe, or just a general feeling that it should be possible?
My intuitions are mostly that if you can provide significant rewards and punishments basically for free in imitated humans (or more to the point, memories thereof), and if you can control the flow of information throughout the whole apparatus, and you have total surveillance automatically, this sort of thing is a dictator’s dream. Especially because it usually costs money to make people happy, and in this case, it hardly does—just a bit of computation time. In a world with all the technology in place that a dictator could want, but also it’s pretty cheap to make everyone happy, it strikes me as promising that the system itself could be kept under control.