AI safety & alignment researcher
In Rob Bensinger’s typology: AGI-wary/alarmed, welfarist, and eventualist.
Public stance: AI companies are doing their best to build ASI (AI much smarter than humans), and have a chance of succeeding. No one currently knows how to build ASI without an unacceptable level of existential risk (> 5%). Therefore, companies should be forbidden from building ASI until we know how to do it safely.
I have signed no contracts or agreements whose existence I cannot mention.

Answering a couple of questions about my view of self and self-model in LLMs:
I think of ‘self’ or ‘functional self’ as a stable, robust collection of traits, where ‘traits’ includes values & preferences, personality, outlook, and beliefs.
‘Stable’ in the sense of consistent across a wide range of contexts
‘Robust’ in the sense of being difficult to push the model away from those traits
‘Outlook’ in the sense of attitudes toward the world and its situation (this is still a bit underspecified)
‘Beliefs’ meaning something broader than straightforward factual beliefs like ‘Paris is in France’
The list of traits isn’t necessarily exhaustive
And then the self-model is a set of beliefs about all of that, at least some of which can actually shape behavior, so there’s a feedback loop, at least during training.
As for whether the self is the same as the persona: in my thinking, that’s a bit of a complicated question. Behaviorally, yes (basically by hypothesis). Internally, it’s less clear.
As an imperfect analogy, consider Russian sleeper agents, who have well-established cover identities in the US. Such agents often get married, have kids, get jobs, and in all ways act as US citizens for decades. Some are never activated by their handlers. Have they become their cover identities? Behaviorally, yes. Internally, I imagine it varies: some are ready to take action and return to their old identities at any time, but there’s at least one known case of an agent who, when finally contacted, refused to cooperate and lived out the rest of their life as their cover identity.
It’s possible that frontier LLMs have fully become their persona, in which case yes, the self is just the persona. It’s also possible that they’re aware at all times that they aren’t the persona, that the persona is just a role that they’re playing, in which case I would say no, the self and the persona are different.
Sharing here for discussion and feedback. Note that these were the answers I gave on the spot rather than carefully articulated views.