Let’s sharpen A6. Consider this stamp collector construction: It sends and receives internet data, it has a magically accurate model of reality, it calculates how many stamps would result from each sequence of outputs, and then it outputs the one that results in the most stamps.
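To make that concrete, here is a minimal Python sketch of the construction I have in mind (the names are mine and purely illustrative; predict_stamps stands in for the magically accurate model of reality, and the brute-force search is just the simplest way to write "pick the output sequence that results in the most stamps"):

```python
# Toy sketch of the stamp collector described above. Purely illustrative:
# predict_stamps stands in for the "magically accurate model of reality".
from itertools import product
from typing import Callable, Sequence, Tuple

def choose_outputs(
    possible_outputs: Sequence[str],
    horizon: int,
    predict_stamps: Callable[[Tuple[str, ...]], float],
) -> Tuple[str, ...]:
    """Return the output sequence predicted to result in the most stamps."""
    best_seq: Tuple[str, ...] = ()
    best_stamps = float("-inf")
    for seq in product(possible_outputs, repeat=horizon):
        stamps = predict_stamps(seq)  # stamps resulting from emitting this sequence
        if stamps > best_stamps:
            best_seq, best_stamps = seq, stamps
    return best_seq
```

Note that nothing in this loop consults any fact about what is morally correct; moral facts could only matter here by changing the predicted stamp counts.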
By definition it knows everything about reality, including any facts about what is morally correct, and that stamps are not particularly morally important. It knows how to self-modify, and how many stamps any such self-modification will result in.
I’d like to hear how this construction fares as we feed it through your proof. I think it gums up the section “Rejecting nihilistic alternatives”. I think that section assumes the conclusion: You expect it to choose its biases on the basis of what is moral, instead of on the basis of its current biases.
Consider this stamp collector construction: It sends and receives internet data, it has a magically accurate model of reality, it calculates how many stamps would result from each sequence of outputs, and then it outputs the one that results in the most stamps.
I’m not sure why you left out the “conscious agent” part, which is the fundamental premise of the argument. If you are describing something like a giant (artificial) neural network optimised to output actions that maximise stamps while receiving input data about the current state of the world, that seems possible to me, but the argument is not about that kind of AI. You can also have a look at “Extending the claim and its implications to other agents”, under Implications for AI.
At the moment we think systems like that are not conscious; otherwise, I guess, we would also have to say that current LLMs are somewhat conscious, given how big they already are. In particular, for that kind of AI it doesn’t seem that knowledge affects behaviour in the same way it does for conscious agents. You wrote that the stamp collector knows that stamps are not morally important; more generally, does it think they are important, or not? I am not even sure “thinking something is important” applies to that stamp collector, because whatever the answer to the previous question is, the stamp collector produces stamps anyway.
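To illustrate the kind of system I mean, here is a toy stand-in of my own (a linear map in place of a giant network, so only a sketch): a policy-style stamp agent is just a fixed mapping from observations to actions, and there is no place in it where a judgement like “stamps are important” could live.

```python
# Toy stand-in for a policy-style stamp agent: a fixed mapping from
# observations to actions produced by some training process. No variable
# here represents whether stamps are "important".
import numpy as np

class PolicyStampAgent:
    def __init__(self, weights: np.ndarray):
        self.weights = weights  # parameters fixed by training

    def act(self, observation: np.ndarray) -> int:
        scores = self.weights @ observation  # linear toy in place of a big network
        return int(np.argmax(scores))        # pick the highest-scoring action
```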
(Digressing a bit: now I’m also considering that the stamp collector, even if it were conscious, might never be able to report that it is conscious in the way we report being conscious. That would happen only if an action like “say I’m conscious” happened to be the action that also maximises stamps in that circumstance, which might never happen… interesting.)
If you are describing a conscious agent as I talk about it in the post, then A6 still applies (and so does the argument in general). With enough knowledge, the conscious and agentic stamp collector will start acting rationally as defined in the post, eventually think about why it is doing what it is doing and whether there is anything worth doing, and so on as in the argument, and end up acting morally, even if it is not sure that something like moral nihilism is incorrect.
In short, if I thought that the premise about being a conscious agent was irrelevant, then I would have just argued that with enough knowledge any AI acts morally, but I think that’s false. (See Implications for AI.)
Could I be wrong about conscious agents acting morally if they have enough knowledge? Sure: I think I say so more than once in the post, and there is a section specifically about it. If I’m wrong, what I think is most likely to be the problem in the argument is how I’ve split the space of ‘things doing things in the world’ into conscious agents and things that are not conscious agents. And if you have a more accurate idea of how this stuff works, I’m happy to hear your thoughts! Below I’ve copied a paragraph from the post.
Actually, uncertainty about these properties is a reason why I am making the bold claim and discussing it despite the fact that I’m not extremely confident in it. If someone manages to attack the argument and show that it applies only to agents with some characteristics, but not to agents without them, that objection or counterargument will be helpful for understanding which properties, if satisfied by an AI, make that AI act morally under conditions of high knowledge.