tjade273 comments on So You Think You’ve Awoken ChatGPT

tjade273 12 Jul 2025 16:02 UTC
22 points
6
How do models with high deception-activation act? Are they Cretan liars, saying the opposite of every statement they believe to be true? Do they lie only when they expect not to be caught? Are they broadly more cynical or and conniving, more prone to reward hacking? Do they lose other values (like animal welfare)?

It seems at least plausible that cranking up “deception” pushes the model towards a character space with lower empathy and willingness to ascribe or value sentience in general