My guess is that a roughly equally “central” problem is the incentive landscape around the OpenPhil/Anthropic school of thought.
Where you see Sam, I suspect something like “the lab memeplexes”. Lab superagents have convergent instrumental goals, and those convergent instrumental goals lead to convergent instrumental beliefs, and also to instrumental blind spots.
There are strong incentives for individual people to adjust their beliefs: money, social status, and a sense of importance from being close to the Ring.
There are also incentives acting on the people who set some of those incentives: funding something that is visibly making progress seems more successful, and is easier, than funding the dreaded theory.