What if the smartest, most careful, most insanely safety-conscious AI researchers humanity can produce just aren’t smart enough to solve the problem?
This is very worrying, especially in light of the lack of a public research agenda. SI’s failure to describe its research agenda suggests the possibility that they cannot describe one because they do not know what they are doing: FAI may be such a ridiculously hard problem that they have no idea where to begin. I’m hoping that SI will soon be able to make it clear that this is not the case.
What if no humans are altruistic enough to choose to build FAI over an AI that will make them king of the universe?
This is weak. Humans are pretty good at cooperation, and FAI will have to be a cooperative endeavor anyway. I suppose an organization could conspire to create an AGI that optimizes for the organization’s collective preferences rather than humanity’s, but this won’t happen, because:
1. No one will throw a fit and defect from an FAI project merely because they won’t be getting special treatment, but people will throw a fit if they perceive unfairness, so Friendly-to-humanity AI will be a lot easier to get funding and community support for than friendly-to-exclusive-club AI.
2. Our near-mode reasoning cannot grasp how much better a personalized AGI slave would be for us, personally, than FAI, so people will make that sort of decision in far mode, where idealistic values can outweigh greed.
3. Finally, even if some exclusive club did somehow create an AGI that was friendly to it in particular, the result wouldn’t be that bad. Even if people don’t care about each other very much, we do care at least a little bit. Say an AGI optimizing an exclusive club’s CEV devotes 0.001% of its resources to things the rest of humanity cares about, and the rest to things only the club cares about. For the rest of humanity, this is worse than FAI by only a factor of 10^5 (the arithmetic is sketched below), which is negligible compared to the difference between FAI and UFAI.
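To spell out that factor-of-10^5 claim, here is a minimal sketch, under the simplifying assumption (mine, not the original commenter’s) that the value an AGI delivers to a group scales linearly with the share of resources devoted to that group’s concerns:

```python
# Back-of-the-envelope arithmetic, assuming value is linear in the
# share of resources an AGI devotes to a group's concerns.
fai_share = 1.0       # FAI: essentially all resources serve humanity's CEV
club_share = 0.00001  # club AGI: 0.001% of resources serve everyone else

# How much worse off the rest of humanity is under the club's AGI:
print(fai_share / club_share)  # 100000.0, i.e. a factor of 10^5
```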
Yeah, laying out a public research agenda is the point of Eliezer’s forthcoming ‘Open Problems in Friendly AI’ sequence, which I personally wish he had written in 2009, after his original set of sequences.
I find your points about altruism unpersuasive, because humans are very good at convincing themselves that whatever’s best for them, individually, is right, or at least permissible. Even if the builders don’t explicitly program an AGI to care only about their own CEV, they might implement the part of the program that’s supposed to handle friendliness in a way subtly biased towards themselves.