Have you thought about having the AI navigate stories/scenarios/environments in a CYOA fashion? It could involve picking between positive options and eventually opportunities to choose good options even when bad ones are easy or there is even strong pressure to choose them. Perhaps taking some inspiration from the kind of strategy used in Recontextualization Mitigates Specification Gaming https://arxiv.org/abs/2512.19027
Have you thought about having the AI navigate stories/scenarios/environments in a CYOA fashion? It could involve picking between positive options and eventually opportunities to choose good options even when bad ones are easy or there is even strong pressure to choose them. Perhaps taking some inspiration from the kind of strategy used in Recontextualization Mitigates Specification Gaming https://arxiv.org/abs/2512.19027