I have experimented extensively with concepts similar to this myself, from using TinyStyler to make LLM outputs more legible to me by making them more similar to my own writing, to trying to finetune LLMs to match my own behavior. The results are always extremely biased. There is simply no way to separate an LLM “matching your own desires and goals” from it just being extremely sycophantic and misaligned with you.
One hypothetical: imagine your agent sees a project on your computer and deletes it because it predicted you weren’t going to finish it and you needed the storage space anyway. If the model’s goal is to maximize agreement with your expressed preferences, surely this is a bad action, because you wanted the project on your computer regardless. Or imagine it blocks your internet access past 8 PM because it figures you probably would’ve done that yourself.
And sure, you can say: OK, maybe let the Guardian Angel figure out which actions are acceptable for it to take and which aren’t, and maybe it’ll make these decisions together with the people who need it and the people who want it. But the main thing that struck me is that this approach just multiplies the risk of misalignment: a personalized model is basically a multiplicative factor for alignment problems. Either you get a model that maximizes your personal happiness (at a huge cost in other areas, since Pareto tradeoffs mean you can’t maximize everything at once) or a model that maximizes your productivity and agency, with the same tradeoffs. And even if the model perfectly aligns with your own goals, it disempowers you by opening the door to interpassivity (roughly, delegating your own activity or enjoyment to something that does it in your place), a concept outlined by Slavoj Žižek.
As a disclaimer, I don’t think the concept of more personalized agents and models is bad in and of itself, but it’s not a robust solution, for many reasons. I think models will eventually gain these capabilities anyway, since I believe LLMs can already recover way more information from written text than humans can, and it’s not outlandish to think models could pick these capabilities up osmotically, like they’ve been doing for a few years now.
So I think my conclusion is that creating these kinds of Siamese-twin adjuncts to language models creates a whole new problem: the assistant needs to commit to a specific definition of personal identity, of autonomy, and of how people’s preferences evolve over time, and it has to make decisions for the user. Overall, that probably accelerates gradual disempowerment as a side effect.
The Enneagram has less predictive validity than the system you’re criticizing. If you think personality theories flatten mind-space into flavor variations, then the Enneagram, sadly, is just the same kind of typology, only without using personality traits as the discriminating factor. Minds tend to differ architecturally, not typologically.
We all have different representational formats, different processing modes, different gating functions on identical inputs (which has something to do with genetics and receptor expression). Types can’t capture that because types are static and minds are generative.