Yudkowsky gives the example of strawberry duplication as an aspirational level of AI alignment: he thinks getting an AI to duplicate a strawberry at the cellular level would be challenging, and that it would be an important alignment success if it were pulled off.
As I understand it, the rough reason this task might be a good alignment benchmark is "because duplicating strawberries is really hard". The underlying idea, that different tasks have different difficulties and that a task's difficulty has implications for AI risk, seems like it could be quite useful for better understanding and managing AI risk. Has anyone tried to look into this idea in more depth?