Situational awareness is a spectrum --> One important implication I hadn't considered before is the challenge of choosing a threshold, or Schelling point, beyond which a model becomes significantly dangerous. This could have major consequences for OpenAI's Plan: setting the threshold above which deployment should stop seems very tricky, and their plan doesn't discuss it.
The potential decomposition of situational awareness is also an intriguing idea, and I would love to see a more detailed exploration of it. This is the kind of thing that would be very helpful to develop. Is anyone working on this?
Hello! I recently finished a draft on a version of RL that may be able to streamline an LLM's situational awareness and match our world models. If you are interested, send me a message. =)
Thanks for writing this!