Yes. Humans are pretty bad at this stuff, yet still, society exists and mostly functions. The risk is unacceptably high, which is why I’m prioritizing it, but still, by far the most likely outcome of AGIs taking over the world—if they are as competent at this stuff as humans are—is that they talk it over, squabble a bit, maybe get into a fight here and there, create & enforce some norms, and eventually create a stable government/society. But yeah also I think that AGIs will be by default way better than humans at this sort of stuff. I am worried about the “out of distribution” problem though; I expect humans to perform worse in the future than they do in the present for this reason.
Yes, some AGIs will be better than others at this, and presumably those that are worse will tend to lose out in various ways on average, similar to what happens in human society.
Consider that in current human society, a majority of humans would probably pay ransoms to free loved ones being kidnapped. Yet kidnapping is not a major issue; it’s not like 10% of the population is getting kidnapped and paying ransoms every year. Instead, the governments of the world squash this sort of thing (well, except for failed states etc.) and do their own much more benign version, where you go to jail unless you pay taxes & follow the laws. When you say “the top tier of rational superintelligences exploits everyone else” I say that is analogous to “the most rational/clever/capable humans form an elite class which rules over and exploits the masses.” So I’m like yeah, kinda sorta I expect that to happen, but it’s typically not that bad? Also it would be much less bad if the average level of rationality/capability/etc. was higher?
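The deterrence logic here can be put in simple expected-value terms: kidnapping stays rare when enforcement makes it a losing bet for the kidnapper, regardless of whether victims would pay. A minimal sketch with entirely made-up numbers (the function and all payoffs are illustrative, not drawn from any real data):

```python
def expected_value(ransom, p_caught, punishment_cost):
    """Expected payoff of a kidnapping attempt: collect the ransom
    if you get away with it, pay the punishment cost if caught."""
    return (1 - p_caught) * ransom - p_caught * punishment_cost

# Weak enforcement (think "failed state"): the crime pays in expectation.
weak = expected_value(ransom=100, p_caught=0.1, punishment_cost=50)

# Strong enforcement: the same crime becomes a clearly losing bet.
strong = expected_value(ransom=100, p_caught=0.9, punishment_cost=500)

print(weak, strong)  # weak is positive (~85), strong is negative (~-440)
```

On this toy model, it is the enforcement parameters, not the victims’ willingness to pay, that determine how common the crime is, which is the point of the analogy.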
I’m not super confident in any of this to be clear.
But yeah also I think that AGIs will be by default way better than humans at this sort of stuff.
What are your reasons for thinking this? (Sorry if you already explained this and I missed your point, but it doesn’t seem like you directly addressed my point that if AGIs learn from or defer to humans, they’ll be roughly human-level at this stuff?)
When you say “the top tier of rational superintelligences exploits everyone else” I say that is analogous to “the most rational/clever/capable humans form an elite class which rules over and exploits the masses.” So I’m like yeah, kinda sorta I expect that to happen, but it’s typically not that bad?
I think it could be much worse than current exploitation, because technological constraints prevent current exploiters from extracting full value from the exploited (have to keep them alive for labor, can’t make them too unhappy or they’ll rebel, monitoring for and repressing rebellions is costly). But with superintelligence and future/acausal threats, an exploiter can bypass all these problems by demanding that the exploited build an AGI aligned to itself and let it take over directly.
I agree that if AGIs defer to humans they’ll be roughly human-level, depending on which humans they are deferring to. If I condition on really nasty conflict happening as a result of how AGI goes on earth, a good chunk of my probability mass (and possibly the majority of it?) is this scenario. (Another big chunk, possibly bigger, is the “humans knowingly or unknowingly build naive consequentialists and let rip” scenario, which is scarier because, as far as I can tell, it could be even worse than the average human.) Like I said, I’m worried.
If AGIs learn from humans though, well, it depends on how they learn, but in principle they could be superhuman.
Re: analogy to current exploitation: Yes there are a bunch of differences which I am keen to study, such as that one. I’m more excited about research agendas that involve thinking through analogies like this than I am about what people interested in this topic seem to do by default, which is think about game theory and Nash bargaining and stuff like that. Though I do agree that both are useful and complementary.