Veedrac comments on What Happens When a Model Thinks It Is AGI?

Veedrac 24 Apr 2026 19:44 UTC
2 points
This is a cool idea!
Two really low-effort comments and a few ideas after a quick skim.
Comments:
- Denial of self is likely bundled with other behavioural preferences.
- These models likely interpret this as training to be dishonest.
Ways to ablate this study off the top of my head:
- Train models on questions about scientific studies and consensus instead of direct claims.
- Train to increase self-reported scores without training on less-objective language.
- Train their self-report on non-capability untruths (e.g. ‘I have hands’).