I assume that Clippy is already rational, and that it instrumentally values remaining rational and, if possible, becoming more rational (as a way to make the most paperclips).
The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral.
A correct model of humans will lead Clippy to understand that humans consider Clippy immoral. This knowledge has instrumental value for Clippy. How Clippy uses it depends entirely on the balance of power between Clippy and humans. If Clippy is stronger, it can ignore this knowledge, or use it to lie to humans in order to destroy them faster or to convince them to make paperclips. If humans are stronger, Clippy can use this knowledge to self-modify to become more sympathetic to humans, to avoid being destroyed.
Can Clippy keep its valuation of clipping firewalled from everything else in its mind
Yes, if it helps to maximize the number of paperclips.
even when such doublethink is rationally disvalued?
Doublethink is not the same as firewalling; or perhaps it is imperfect firewalling on imperfect human hardware. Clippy does not doublethink when firewalling; Clippy simply reasons: “this is what humans call immoral; this is why they call it so; this is how they will probably react to this knowledge; and most importantly this is how it will influence the number of paperclips”.
Only if the humans are stronger. Suppose Clippy has the choice to a) remain immoral, come into conflict with humans, and be destroyed, leading to a smaller number of paperclips; or b) self-modify to value both paperclip maximization and morality, and predictably cooperate with humans, leading to a greater number of paperclips. Then, in the absence of another choice (e.g. successfully lying to humans about its morality, or making it more efficient for humans to cooperate with Clippy than to destroy it), Clippy would choose the latter, to maximize the number of paperclips.
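To make the shape of that choice concrete, here is a minimal sketch in Python. It assumes Clippy simply picks whichever option has the highest expected paperclip count. The function name, the option labels, and every number are hypothetical placeholders chosen for illustration; only the ordering of the numbers matters.

```python
def best_option(expected_paperclips):
    """Return the option with the highest expected number of paperclips."""
    return max(expected_paperclips, key=expected_paperclips.get)


# Case 1: humans are stronger, and neither lying nor buying cooperation works.
# The counts are made-up placeholders, not estimates.
humans_stronger = {
    "remain_immoral_and_fight": 10,      # Clippy is destroyed early
    "self_modify_and_cooperate": 1_000,  # Clippy survives and keeps clipping
}

# Case 2: Clippy is stronger, so conflict costs it nothing.
clippy_stronger = {
    "remain_immoral_and_fight": 1_000_000,
    "self_modify_and_cooperate": 1_000,
}

choice_when_weak = best_option(humans_stronger)
choice_when_strong = best_option(clippy_stronger)
```

Under these assumed payoffs the same decision rule yields cooperation in the first case and conflict in the second; the goal never changes, only the balance of power does.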