The comments on my recent post about formalizing the inner alignment problem are, like, the best comments I’ve ever gotten. Seems like begging for comments at length works?
This is making me feel optimistic about a coordinated attack on the formal inner alignment problem. Once we “dig out” the right formal space, it seems like there will be a lot of genuinely tractable questions that a team of people could attack. This is only happening to a limited extent so far, which is perhaps surprising. E.g., why aren’t several people working on the minimal circuits stuff? Is it just too hard, even though the question has been made relatively concrete? Still, the quick, in-depth responses make me optimistic. My model is that a better overarching picture of the problem and of current solution approaches would help people orient toward the problem and toward fruitful directions. Then again, maybe that effect isn’t real, given how little happened with minimal circuits.
I was talking with Ramana last week about the overall chances of making AI go well, and about what needs to be done, and we both sorta surprised ourselves with how much the conclusion seemed to be “more work on inner alignment, ASAP.” Then again, I’m biased, since that’s what I’m working on this month.
Inner alignment is something we need in order to do almost anything else, and among prerequisites like that, it sits near or at the bottom of my list when sorted by the probability that the research community figures it out.