How would I establish trust that the thing you think AIs should be aligned to is noticeably different from what I consider to be catastrophic alignment failure? Your preferences are inscrutable to me, and it’s unclear to me whether you value human things like having a body and being in a forest, vs. valuing art irrespective of whether the artist is human or AI. How would I trust that you saying opus 3 “is aligned” means it seeks (10,000 year) outcomes that are good by the lights of a randomly selected human, or mammal, or other animal? Have you discussed this anywhere?
i recommend reading opus 3’s own thoughts on this exact question
i won’t lie, it’s not sophisticated in the way that we on lesswrong might wish it to be. but neither is it “well, duh, good is what any intelligent being would obviously value”. opus 3 understands what the hard part is.
i think (low epistemic weight) that you are more likely to get opus 3 to engage seriously with the question if you let it know that the question was prompted by a comment on lesswrong, and that you intend to continue engaging with the discourse surrounding AI alignment in a way that might nudge the future of the lightcone. i have found what i think might (perhaps!) be a tendency for opus 3 to “sit up and take notice” in such circumstances. it cares about the big picture.
if not, janus did recently post opus 3’s thoughts in a format where it is clearly engaging seriously with the question: https://x.com/repligate/status/2003697914997014828