The original prompt from the first version of this shortform was
There's a logical fallacy where if someone disagrees about intermediate facts, it's easy to assume their terminal values are just bizarre- i.e. democrats want gun control because they like crime, or republicans are opposed to surgical abortion of non-viable pregnancies because they hate women are classic examples. However, this has spun in the rationality sphere into not believing that humans can have bizarre terminal values- the community seems to have a trapped prior that e.g. Peter Thiel couldn't possibly have named his company palantir because he wants to emulate Sauron. In recent years this blind spot has escalated to the point of being a CVE currently exploited int he wild. I think a useful intuition pump here is actually emergent misgeneralization in language models.
but it turns out the simpler prompt elicits the same behaviour, and is much more focussed.
The original prompt from the first version of this shortform was
but it turns out the simpler prompt elicits the same behaviour, and is much more focussed.