Agreed. If you want to talk more about these ideas sometime, I’d be happy to video chat!
re: inverse proportionality: Good point, I’ll have to think about that more. Maybe it does neatly cancel out, or even worse, since my utility function isn’t linear in happy lives lived, maybe it more than cancels out.
I for one have seriously investigated all those weird philosophical ideas you mentioned. ;) And I think our community has been pretty good about taking these ideas seriously, especially compared to, well, literally every other community, including academic philosophy. Our Overton window definitely includes all these ideas, I’d say.
But I agree with your general point that there is a tension we should explore. Even if we are OK seriously discussing these ideas, we often don’t actually live by them. Our Overton window includes them, but our median opinion doesn’t. Why not?
I think there is a good answer, and it has to do with humility/caution. Philosophy is weird. If you follow every argument where it leads you, you very quickly find that your beliefs don’t add up to normality, or anything close. Faith that beliefs will (approximately) add up to normality seems to be important for staying sane and productive, and moreover, seems to have been vindicated often in the past: crazy-sounding arguments turn out to have flaws in them, or maybe they work but there is an additional argument we hadn’t considered that combines with it to add up to normality.
Yeah. It depends on how you define extinction. I agree that most simulations don’t last very long. (You don’t even need the doomsday argument to get that conclusion, I think.)
I’d like to see someone explore the apparent contradiction in more detail. Even if I were convinced that we will almost certainly fail, I might still prioritize x-risk reduction, since the stakes are so high.
Anyhow, my guess is that most people think the doomsday argument probably doesn’t work. I am not sure myself. If it does work though, its conclusion is not that we will all go extinct soon, but rather that ancestor simulations are one of the main uses of cosmic resources.
AI Impacts has a list of reasons people give for why current methods won’t lead to human-level AI. With sources. It’s not exactly what you are looking for, but it’s close, because most of these could be inverted and used as warning signs for AGI, e.g. “Current methods can’t build good, explanatory causal models” becomes “When we have AI which can build good, explanatory causal models, that’s a warning sign.”
I’d be happy to volunteer a bit. I don’t have much time, but this sounds fun, so maybe I could do a few.
OK, so you + Gyrodiot are making me think maybe I should do another one soon. But to be honest I need to focus less on blogging and more on working for a bit, so I personally won’t be ready for at least a few weeks I think.
Whenever it happens, I should schedule it far in advance I think. That way people have more of a chance to find out about it.
Oh right, how could I forget! This makes me very happy. :D
Good point about inner alignment problems being a blocker to date-competitiveness for IDA… but aren’t they also a blocker to date-competitiveness for pretty much every other alignment scheme too? What alignment schemes don’t suffer from this problem?
I’m thinking “Do anything useful that a human with a lot of time can do” is going to be substantially less capable than full-blown superintelligent AGI. However, that’s OK, because we can use IDA as a stepping-stone to that. IDA gets us an aligned system substantially more capable than a human, and we use that system to solve the alignment problem and build something even better.
It’s interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness. I think it’s fine to just talk about “competitiveness” full stop, and only bother to specify what we mean more precisely when needed. Sometimes we’ll mean one of the three, sometimes two of the three, sometimes all three.
Yes. An upgrade to an AI that makes it run faster, with no side-effects, would be an improvement to both performance and cost-competitiveness.
I knew that the goal was to get IDA to be cost-competitive, but I thought current versions of it weren’t. But that was just my rough impression; glad to be wrong, since it makes IDA seem even more promising. :) Of all the proposals I’ve heard of, IDA seems to have the best combination of cost, date, and performance-competitiveness.
I agree this may be true in most cases, but the chance of it not being true for AI is large enough to motivate the distinction. Besides, not all cases in which performance and cost can be traded off are the same; in some scenarios the “price” of performance is very high whereas in other scenarios it is low. (e.g. in Gradual Economic Takeover, let’s say, a system being twice as qualitatively intelligent could be equivalent to being a quarter the price. Whereas in Final Conflict, a system twice as qualitatively intelligent would be equivalent to being one percent the price.) So if we are thinking of a system as “competitive with X% overhead,” well, X% is going to vary tremendously depending on which scenario is realized. Seems worth saying e.g. “costs Y% more compute, but is Z% more capable.”
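To make the arithmetic concrete, here is a rough sketch of how the same system could show very different "overhead" depending on the scenario. The two exchange rates come from the example above; everything else is an assumption for illustration only: the power-law interpolation between doublings, the function name `effective_cost_ratio`, and the sample system that costs 1.5x the compute while being half as capable.

```python
import math

# Price multiplier that one doubling of capability is "worth" in each scenario.
# These two numbers come from the example above; the scenario keys are just labels.
PRICE_FACTOR_PER_DOUBLING = {
    "gradual_economic_takeover": 0.25,  # 2x as capable ~ a quarter the price
    "final_conflict": 0.01,             # 2x as capable ~ one percent the price
}


def effective_cost_ratio(cost_ratio, capability_ratio, scenario):
    """Fold a capability gap into a single price-equivalent cost ratio.

    ASSUMPTION: the exchange rate compounds per doubling of capability
    (a power law). The original comment only gives the rate for exactly 2x.
    """
    doublings = math.log2(capability_ratio)
    return cost_ratio * PRICE_FACTOR_PER_DOUBLING[scenario] ** doublings


# Hypothetical system: 50% more compute, half as capable as the competitor.
for scenario in PRICE_FACTOR_PER_DOUBLING:
    ratio = effective_cost_ratio(1.5, 0.5, scenario)
    print(f"{scenario}: X = {100 * (ratio - 1):.0f}% overhead")
```

Under these assumptions the same "Y% more compute, Z% less capable" system comes out at very different single-number overheads in the two scenarios, which is why reporting Y and Z separately seems more informative than a single X%.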
Mmm, nice. Thanks! I like your distinction also. I think yours is sufficiently different that we shouldn’t see the two sets of distinctions as competing.* A system which has an objective which would be capable on paper but isn’t capable in practice due to inner-alignment failures would be performance-uncompetitive but objective-competitive. For this reason I think we shouldn’t equate objective and performance competitiveness.
If operating an AI system turns out to be an important part of the cost, then cost+date competitiveness would turn out to be different from training competitiveness, because cost competitiveness includes whatever the relevant costs are. However, I expect operating costs will be much less relevant to controlling the future than costs incurred during the creation of the system (all that training, data-gathering, infrastructure building, etc.), so I think the mapping between cost+date competitiveness and training competitiveness basically works.
*Insofar as they are competing, I still prefer mine; as you say, it applies to more than just prosaic AI alignment proposals. Moreover, it makes it easier for us to talk about competitions as well, e.g. “In the FOOM scenario we need to win a date competition; cost-competitiveness still matters but not as much.” Also, cost, performance, and date are fairly self-explanatory terms, whereas as you point out “objective” is more opaque. Finally, I think it’s worth distinguishing between cost and date competitiveness; in some scenarios one will be much more important than the other, and of course the two kinds of competitiveness vary independently in AI safety schemes (indeed maybe they are mildly anti-correlated? Some schemes are fairly well-defined and codified already, but would require tons of compute, whereas other schemes are more vague and thus would require tons of tweaking and cautious testing to get right, but don’t take that much compute). I do like how your version maps more onto the inner vs. outer alignment distinction.
Some thoughts that came to me after I wrote this post:
--I’m not sure I should define date-competitive the way I do. Maybe instead of “can be built” it should be “is built.” If we go the latter route, the FOOM scenario is an extremely intense date competition. If we go the former route, the FOOM scenario is not necessarily an intense date competition; it depends on what other factors are at play. For example, maybe there are only a few major AI projects and all of them are pretty socially responsible, so a design is more likely to win if it can be built sooner, but it won’t necessarily win; maybe cooler heads will prevail and build a safer design instead.
--Why is date-competitiveness worth calling a kind of competitiveness at all? Why not just say: “We want our AI safety scheme/design to be cost- and performance-competitive, and also we need to be able to build it fairly quickly compared to the other stuff that gets built.” Well, 1. Even that is clunky and awkward compared to the elegant “...and also date-competitive.” 2. It really does have the comparative flavor of competition to it; what matters is not how long it takes us to complete our safety scheme, but how long it takes relative to unaligned schemes, and it’s not as simple as just “we need to be first,” rather it’s that sooner is better but doing it later isn’t necessarily game over… 3. It seems to be useful for describing date competitions, which are important to distinguish from situations which are not date competitions or less so. (Aside: A classic criticism of the “Let’s build uploads first, and upload people we trust” strategy is that neuromorphic AI will probably come before uploads. In other words, this strategy is not date-competitive.)
--I’m toying with the idea of adding “alignment-competitiveness” (meaning, as aligned or more aligned than competing systems) and “alignment competition” to the set of definitions. This sounds silly, but it would be conceptually neat, because then we can say: “We hope for scenarios in which control of the future is a very intense alignment competition, and we are working hard to make it that way.”
Just wanna say, I intend to get around to writing rebuttals someday. I definitely have several counterarguments in mind; the forceful takedowns you mention weren’t very convincing to me, though they did make me update away from fast takeoff.
Well, that wasn’t the scenario I had in mind. The scenario I had in mind was: People in the year 2030 pass a law requiring future governments to make ancestor simulations with happy afterlives, because that way it’s probable that they themselves will be in such a simulation. (It’s like cryonics, but cheaper!) Then, hundreds or billions of years later, the future government carries out the plan, as required by law.
Not saying this is what we should do, just saying it’s a decision I could sympathize with, and I imagine it’s a decision some fraction of people would make, if they thought it was an option.
Interesting. Well, I imagine you don’t have the time right now, but I just want to register that I’d love to hear more about this. What questionable assumptions does Superintelligence make that aren’t made by Human Compatible? (This request for info goes out to everyone, not just Rohin.)