Studying failures is useful because they highlight non-obvious internal mechanisms, while successes are usually cases of things working as intended and therefore seem not to require explanation.
Another problem is that we don’t have clear examples of successes, because every measurable alignment success could be a failure in disguise.