Joe Carlsmith comments on Can we safely automate alignment research?

Joe Carlsmith 2 May 2025 0:28 UTC
LW: 6 AF: 4
I’m happy to say that easy-to-verify vs. hard-to-verify is what ultimately matters, but I think it’s important to be clear about what makes something easier vs. harder to verify, so that we can be clear about why alignment might or might not be harder than other domains. And imo empirical feedback loops and formal methods are amongst the most important factors there.