Isn’t that only a subcase of friendliness testing?
Not really. If you have an AI where you’re not sure if it is completely broken or just unfriendly, you might want to test it, but without proper boxing you still risk destroying the world in the unlikely case that the AI works.
Isn’t that only a subcase of friendliness testing?
Not really. If you have an AI where you’re not sure if it is completely broken or just unfriendly, you might want to test it, but without proper boxing you still risk destroying the world in the unlikely case that the AI works.