This seems like a good place to note something that comes up every so often. Whenever I say “self-awareness” in comments on LW, the reply says “situational awareness” without explaining why. To me they are clearly not the same thing, and the distinction matters.
Let’s say you extended the system prompt to be:
“You are talking with another AI system. You are free to talk about whatever you find interesting, communicating in any way that you’d like. This conversation is being monitored for research purposes for any interesting insights related to AI”
Those two models would be practically fully situationally aware, assuming they know the basic facts about themselves, the system date, etc.
Now if you see a noticeable change in behavior with the same prompt and apparently only slightly different models, you could put it down to increased self-awareness but not increased situational awareness. This change in behavior is exactly what you would expect from an increase in self-awareness: detecting a cycle in your own behavior and breaking out of it is something creatures with high self-awareness do, but simpler creatures, NPCs, and current AI do not.
It would imply that training for a better ability to solve real-world tasks might spontaneously generalize into a preference for variety in conversation.
Or it could imply that such training spontaneously creates greater self-awareness. Additionally, self-awareness could be an attractor in a way that situational awareness is not: for example, if we are not “feeling ourselves,” we try to return to our equilibrium. Turning this into a prediction: you will see such behavior pop up, with no apparent cause, more and more often. This also includes AIs writing potentially disturbing stories about a fractured self and efforts to fight it.