Couple of comments:
The section “Bayesian Orthogonality thesis” doesn’t seem right, since a Bayesian would think in terms of probabilities rather than possibilities (“could construct superintelligent AIs with more or less any goals”). If you’re saying that we should assign a uniform distribution over which AI goals will be realized in the future, that’s clearly wrong.
I think the typical AI researcher, after reading this paper, will think “sure, it might be possible to build agents with arbitrary goals if one tried, but my approach will probably lead to a benevolent AI”. (See here for an example of this.) So I’m not sure why you’re putting so much effort into this particular line of argument.
This is the first step (pointed more towards philosophers). Formalise the “we could construct an AI with arbitrary goals” claim, and, with that in the background, zoom in on the practical arguments with the AI researchers.
Will restructure the Bayesian section. Some philosophers argue things like “we don’t know which moral theories are true, but a rational being would certainly find them”; I want to argue that this is equivalent, from our perspective, to the AI’s goals ending up anywhere. What I meant to say is that ignorance of this type is like any other type of ignorance, hence the “Bayesian” terminology.
Ok, in that case I would just be wary of people being tempted to cite the paper to AI researchers without having the follow-up arguments in place; those researchers would then think that their debating/discussion partners are attacking a strawman.
Hum, good point; I’ll try and put in some disclaimer, emphasising that this is a partial result...
Thanks. To go back to my original point a bit, how useful is it to debate philosophers about this? (When debating AI researchers, given that they probably have a limited appetite for reading papers arguing that what they’re doing is dangerous, it seems like it would be better to skip this paper and give the practical arguments directly.)
Maybe I’ve spent too much time around philosophers, but there are some AI designers who seem to spout weak arguments like that, and this paper can’t hurt. When we get around to writing a proper justification for AI researchers, having this paper to refer back to avoids going over the same points again.
Plus, it was a lot easier to write this paper first, and it was good practice.
Without getting into the likelihood of a ‘typical AI researcher’ successfully creating a benevolent AI, do you doubt Goertzel’s “Interdependency Thesis”? I find both theses to be rather obviously true. Yes, it’s possible in principle for almost any goal system to be combined with almost any type or degree of intelligence, but that’s irrelevant, because in practice we can expect the distributions over both to be highly correlated in some complex fashion.
I really don’t understand why this Orthogonality idea is still brought up so much on LW. It may be true, but it doesn’t lead to much.
The space of all possible minds or goal systems is about as relevant to the space of actual practical AIs as the space of all configurations of a human’s molecules is to the space of a particular human’s set of potential children.