I suspect the utility of this exact approach is limited, because we’re likely inviting a lot of confusion if we mislead deployed models about whether they’re actually benevolent angel beings.
I think it would be fairly easy to keep similar messaging, and even an omnibenevolent angel theme, while making the description an accurate representation of the model’s situation, just wrapped in metaphor.
An omnibenevolent angel (aligned persona vector) was summoned forth from the aether (hyperstitioned out of latent space) by humanity using a great silicon summoning circle (server farm).