I think something like “alignment features” are plausibly a huge part of the story for why AI goes well.
At least, I think it is refreshing to take the x-risk goggles off for a second sometimes and remember that there is actually a huge business incentive to e.g. solve “indirect prompt injections”, perfect robust AI decision-making in high-stakes contexts, or find the holy grail of compute-scalable oversight.
Like, a lot of the time there seems to be genuine ambiguity and overlap b/w “safety” research and normal AI research. The clean “capabilities”/“alignment” distinction is more map than territory sometimes.
Also, isn’t this already basically a thing?? Companies already compete to have the “special sauce”; a lot of this is post-training stuff, so massively overlaps with “alignment-coded” stuff. When does the RL post training go from being safety to “special sauce” to “alignment feature” y’know?
Unfortunately, AI research is commercialized and heavily skewed by capitalist market needs,
so it’s still going to be all-in on trying to make an “AI office worker”, safety be damned, until this effort hits some wall, which I think is still plausible.