cubefox comments on Selfmaker662's Shortform

cubefox 25 Jan 2026 21:03 UTC
8 points
2
ChatGPT is generally pretty weird. If you ask it, the non-reasoning model still insists that calling someone the n-word is worse than letting millions of people die, which is insane. This supports EY’s claim that RLHF creates something that superficially looks aligned but turns out to be alien when tested in an out-of-distribution (OOD) context.