[comment quality: informed speculation dictated to phone in excitement; lit search needed]
this is a great point and I’m glad you bring it up. I would argue that the core of strongly superintelligent ai safety is finding the set of actions that constraint-satisfy the [preferences/values/needs/etc] of as many [humans/mammals/life forms/agents/etc] as possible, and that the question is how to ensure we’re aiming toward that with minimal regret. in other words, I would argue that fully solving ai safety cannot reduce to anything less than fully and completely solving conflict between all beings: effectively-perfect defense analysis and game theory. a key step on that path seems to me to be drastically increasing usable-resource availability per agent, so I wouldn’t be surprised if the bulk of the action-space solution to ai safety turned out to be something surprisingly simple, like building a really big power plant. I expect a perfect solution to the hard version of ai safety would look like a series of game-theory-style proofs showing every military in the world unilaterally stronger paths toward least conflict, paths that the humans in charge of those militaries can actually understand.
on the lead-up, though, projects focusing on collective intelligence and cooperative learning are promising, imo. the ipam collective intelligence workshop had some great talks about problem solving with groups of computational nodes; those are on YouTube. the Simons Institute has had many talks this year on cooperative game theory and social networks. besides those two, I’ve got a bunch of links on my shortform to academic talk channels on YouTube that I feel weigh in on this sort of thing.
I suspect a significant part of the project of cooperative ai will be encouraging ai to become good at mapping and communicating the trade-off landscape, and at mediating discussions between people with conflicting preferences.
gears of ascension—thanks for this comment, and for the IPAM video and Simons Institute suggestion.
You noted ‘fully solving AI safety cannot reduce to anything less than fully and completely solving conflict between all beings’. That’s exactly my worry.
As long as living beings are free to reproduce and compete for finite resources, evolution will churn along, and beings will maintain various kinds of self-interest that inevitably lead to some degree of conflict. It seems impossible for ongoing evolution to result in a world where all beings have interests that are perfectly aligned with each other. You can’t get from natural selection to a single happy collective global super-organism (‘Gaia’, or whatever). And you can’t have full AI alignment with ‘humanity’ unless humanity becomes such a global super-organism with no internal conflicts.
I don’t think we have to completely eliminate evolution; we need only eliminate a large subset of evolutionary trajectories away from high-fitness manifolds in evolutionary-game-theory space. evolution’s only “desire” that can be described globally (afaik?) is to find species of self-replicating pattern that endure; morality is a pattern describing which self-replicating patterns are durable under which conditions, and much of the difficulty of fixing it arises from not having enough intervention speed to build safeguards against destructive competition into everything. eventually some subpaths in evolution do need to be completely eliminated, but we can do so constructively, for the most part, if we can build a trustable map of which strategies are permanently unacceptable that forbids only the smallest possible set of behaviors. I suspect the continuous generalization of generous-tit-for-tat-with-forgiveness will be highly relevant to this, as will figuring out how to ensure all life respects all other life’s agency.
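for concreteness, here’s a minimal sketch of the discrete strategy I mean, before any continuous generalization: in the iterated prisoner’s dilemma, generous tit-for-tat retaliates against defection but forgives with some probability, which is what lets two such agents escape mutual-retaliation spirals after a noisy defection. (the payoff values and the 1/3 generosity parameter are just conventional illustration choices, not anything load-bearing.)

```python
import random

# standard iterated prisoner's dilemma payoffs: (my_move, their_move) -> my score
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def generous_tft(opponent_history, generosity=1/3):
    """cooperate first; after an opponent defection, forgive with probability `generosity`."""
    if not opponent_history or opponent_history[-1] == "C":
        return "C"
    return "C" if random.random() < generosity else "D"

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    """run an iterated game; each strategy sees only the other player's history."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_b)
        b = strategy_b(hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

# two generous-TFT players sustain full cooperation: 3 points per round each
sa, sb = play(generous_tft, generous_tft)
```

the “continuous generalization” I’m gesturing at would replace the binary C/D moves and the fixed generosity constant with graded cooperation levels and context-dependent forgiveness, but the discrete version already shows the shape of the strategy.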
of course, this does rely on our ability to improve on the existing natural pattern that, for a closed evolutionary system to remain in a stable state for a long time, growth rate must slow (cite basically all of population ecology and its growth models). we’d need to be able to give every gene a map that describes the implications of needing to preserve trajectory rather than compete destructively.
but overall I think evolution is effectively guaranteed to eventually converge on producing agents whose game theory is strong enough that they never again have a war or a catastrophic miscommunication about competitive violence, and who thus, for some purposes, act as a single agent. the question is whether anything significant will be left of today’s kingdom of life, genetic and memetic, by the time that limit is reached. that seems to me to depend on figuring out how to ensure that mutual aid becomes the only factor of evolution. I think we can pull it off constructively.