How would I establish trust that the thing you think AIs should be aligned to is noticeably different from what I consider to be catastrophic alignment failure? Your preferences are inscrutable to me, and it's unclear to me whether you value human things like having a body and being in a forest, versus valuing art irrespective of whether the artist is human or AI. How would I trust that when you say Opus 3 "is aligned," you mean it seeks (10,000-year) outcomes that are good by the lights of a randomly selected human, or mammal, or other animal? Have you discussed this anywhere?