I also see an additional mechanism for motivated reasoning to emerge. Suppose we have an agent that is unsure of its capabilities (e.g. GPT-5, which arguably believed its time horizon to be 20-45 minutes). Then the best thing the agent could do to increase its capabilities would be to attempt[1] tasks a bit more difficult than the edge of its current abilities: it might succeed by chance, come close to success, or improve simply from trying, which is the case at least in Hebbian networks. Humans who reasoned this way found it easier to keep trying, so it was persistence, and not motivated reasoning itself, that correlated with success.
Or, in the case of LLMs, the hosts could assign such a task and let the model attempt it.
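
A minimal toy simulation of this mechanism, to make the argument concrete (all the numbers here are illustrative assumptions of mine, not taken from any experiment): an agent repeatedly faces a task just past its frontier, gains a lot of capability on a lucky success, and still gains a little from a failed attempt, standing in for Hebbian-style learning from mere trying. `persistence` is the probability the agent attempts the task at all; motivated reasoning about one's own ability would push it up.

```python
import random

def simulate(persistence: float, steps: int = 1_000, seed: int = 0) -> float:
    """Toy model: capability compounds from attempting frontier tasks.

    persistence: probability the agent attempts the hard task each step
    (a stand-in for how motivated reasoning keeps the agent trying).
    """
    rng = random.Random(seed)
    capability = 1.0
    for _ in range(steps):
        if rng.random() > persistence:
            continue                  # agent declines the hard task
        p_success = 0.3               # task sits a bit past the frontier
        if rng.random() < p_success:
            capability *= 1.01        # larger gain on a lucky success
        else:
            capability *= 1.002       # small gain from the attempt itself
    return capability

for persistence in (0.3, 0.6, 0.9):
    print(f"persistence={persistence:.1f} -> capability={simulate(persistence):.2f}")
```

In this toy model the more persistent agents end up more capable regardless of whether their self-assessment was accurate, which is the sense in which persistence, rather than motivated reasoning itself, is what correlates with success.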