I apologize if I’m coming off as combative; I genuinely appreciate the help.
It’s hard for me to write well for an audience I don’t know well. I went through a number of iterations of this just trying to clarify the conceptual contours of such a research direction in a single post that’s clear and coherent. I have like 5 follow-up posts planned, and hopefully I’ll keep going. But the premise is: “Here’s a stack of like 10 things that we want the AI to do; if it does these things, it will be aligned. Further, this is all rooted in language use and not in biology, which seems useful because AI is not biological.” Actually getting an AI to conform to those things is a nightmarish challenge, but it seems useful to have a coherent conceptual framework that defines what alignment is exactly and can explain why those 10 things and not some others. My essential thesis, in other words, is that at a high level, reframing the alignment problem in Habermasian terms makes the problem appear tractable.
“The ISS doesn’t seem to include core ethics and goals”
This is actually untrue. Habermas’ contention is that literally all human values and universal norms derive from the ISS. I intend to walk through some of that argument in a future post. However, what’s important is that this is an approach that grounds all human values as such in language use. I think Habermas likely underrates biology’s contribution to human values; however, this is an advantage when thinking about aligning AIs that operate around a core of being competent language users. The point is that, in my view, a Habermasian AGI wouldn’t kill everyone. It might be hard to build, but in principle, if you did build one, it would be aligned.
The basic intuition I have, which I think is correct, is that if you build a Habermasian robot, it won’t kill everyone. This is significant to my mind. Maybe it’s impossible, but it seems like an interesting thing to pursue.
I wrote all of the ideas, but yes, I fed them into Gemini to create a version that was more clearly articulated.
Hey, cool! Yes, I agree that the ideal speech situation is not achievable. That’s what the “ideal” part is. However, neither is next-word prediction, in principle. It’s an ideal that can be striven for.
I’m going to try to unpack the details of what I’m proposing in future posts; I just wanted to introduce the idea here.