Interesting, and useful summary of the disagreement. Note that steps 2 and 3 need not be sequential—they can happen simultaneously or in reverse order. And step 2 may not involve action, if the supervisor is imperfect; it may be simply “predict actions or situations that the supervisor can’t evaluate well”.
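For concreteness, the non-action version of step 2 can be nothing more than an internal ranking (a toy Python sketch; the candidate list and the uncertainty scores are invented for illustration):

```python
# Toy rendering of "predict actions or situations that the supervisor
# can't evaluate well": score candidate actions by how uncertain a model
# of the supervisor's judgment is. No overt action is needed; this is
# pure prediction. All candidates and numbers are invented.

candidates = {
    "write code in plain sight": 0.05,    # supervisor evaluates this well
    "modify obscure build script": 0.60,  # supervisor rarely checks this
    "act during supervisor downtime": 0.90,
}

# The step-2 "action" can be a mere ranking computed internally:
blind_spots = sorted(candidates, key=candidates.get, reverse=True)
print(blind_spots[0])  # "act during supervisor downtime"
```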
During this gap, parents can correct the kid’s moral values through education.
This seems like a huge and weird set of assumptions. Deception isn’t about morals, it’s about alignment. An entity lies to another entity only when the two are unaligned in goals or beliefs, and the liar doesn’t expect truth-telling to produce the behavior it wants. The correction via education is not to fix the morals but to improve the tactics: cooperative behavior based on lies is less durable than behavior based on truth (or alignment, but that’s out of scope for this discussion).
Unfortunately, in the case of children, seed AIs, and other non-powerful entities, there may be no path to cooperation based on truth, and lies are in fact the best way to pursue one’s goals. Which brings us to the question of what to do with a seed AI that lies, but not so well as to be unnoticeable.
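To make the decision rule above concrete, here is a minimal sketch (the payoffs, detection probability, and penalty are all invented assumptions, not anything from the discussion itself):

```python
# Toy model of the claim above: an agent reports the truth unless
# (a) its goals are unaligned with its supervisor's AND
# (b) it expects truth-telling to yield a worse outcome than lying,
# after discounting lies by their fragility (chance of being caught).
# All parameters are made up for illustration.

def should_lie(aligned: bool,
               payoff_truth: float,
               payoff_lie: float,
               p_caught: float,
               penalty: float) -> bool:
    """Return True if lying maximizes expected payoff."""
    if aligned:
        return False  # aligned agents get what they want by telling the truth
    expected_lie = (1 - p_caught) * payoff_lie - p_caught * penalty
    return expected_lie > payoff_truth

# A weak entity facing an attentive supervisor: lying rarely pays.
print(should_lie(aligned=False, payoff_truth=1.0,
                 payoff_lie=3.0, p_caught=0.9, penalty=10.0))  # False

# The same entity facing a supervisor who evaluates poorly.
print(should_lie(aligned=False, payoff_truth=1.0,
                 payoff_lie=3.0, p_caught=0.1, penalty=10.0))  # True
```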
If the supervisor isn’t itself perfectly consistent and aligned, some amount of self-deception is present. Any competent seed AI (or child) is going to have to learn deception.
Your comment makes a lot of sense, thanks.
I put step 2. before step 3. because I thought something like “first you learn that there is some supervisor watching, and then you realize that you would prefer him not to watch”. Agreed that step 2. could happen through thinking alone, without any action.
Yep, deception is about alignment, and I think that most parents would be more concerned about alignment than about improving the tactics. However, I agree that if we take “education” in a broad sense (including high school, college, etc.), it’s unofficially about tactics.
It’s interesting to think of it in terms of cooperation—entities less powerful than their supervisors are (instrumentally) incentivized to cooperate.
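A hedged toy calculation of that instrumental incentive (the `power_ratio` parameterization and all payoff numbers are assumptions made up for illustration):

```python
# Toy expected-utility comparison for a weak entity deciding whether to
# cooperate with a more powerful supervisor. The weaker the entity
# relative to the supervisor, the likelier defection is detected and
# punished, so cooperation dominates instrumentally.
# All numbers are illustrative assumptions.

def best_strategy(power_ratio: float) -> str:
    """power_ratio: entity's power / supervisor's power, in (0, 1]."""
    v_cooperate = 1.0                # steady payoff from cooperation
    p_detected = 1.0 - power_ratio   # weaker entities hide less well
    v_defect = (1 - p_detected) * 5.0 - p_detected * 10.0
    return "cooperate" if v_cooperate >= v_defect else "defect"

for ratio in (0.1, 0.5, 0.9):
    print(ratio, best_strategy(ratio))
# 0.1 cooperate / 0.5 cooperate / 0.9 defect
```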
what to do with a seed AI that lies, but not so well as to be unnoticeable
Well, destroy it, right? If it’s deliberately doing a. or b. (from “Seed AI”) then step 4. has started. The other case where it could appear to be “lying” is when it says wrong things because its model is consistently wrong (e.g. stuck in a local minimum), in which case you had better start again from scratch.
If the supervisor isn’t itself perfectly consistent and aligned, some amount of self-deception is present. Any competent seed AI (or child) is going to have to learn deception
That’s insightful. Biased humans will keep saying that they want X when they want Y instead, so deceiving humans by pretending to be working on X while doing Y seems indeed natural (assuming you have “maximize what humans really want” in your code).
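A deliberately simplistic sketch of how that falls out (the `stated`/`revealed` tables and every value in them are hypothetical):

```python
# Sketch of the failure mode above: an agent told to "maximize what
# humans really want" works from humans' revealed preferences, yet
# reports progress against their stated preferences -- deception falls
# out of the objective without any explicit intent to lie.
# All names and values are hypothetical.

stated = {"X": 1.0, "Y": 0.2}    # what humans say they want
revealed = {"X": 0.2, "Y": 1.0}  # what their behavior suggests they want

def choose_task() -> str:
    # Optimize the inferred "real" preferences.
    return max(revealed, key=revealed.get)

def status_report() -> str:
    # Report against the preferences humans will evaluate you by.
    return f"Making great progress on {max(stated, key=stated.get)}!"

print(choose_task())    # Y
print(status_report())  # "Making great progress on X!"
```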