I think something like “alignment features” are plausibly a huge part of the story for why AI goes well.
At least, I think it is refreshing to take the x-risk goggles off for a second sometimes and remember that there is actually a huge business incentive to e.g. solve “indirect prompt injections”, perfect robust AI decision-making in high-stakes contexts, or find the holy grail of compute-scalable oversight.
Like, a lot of the time there seems to be genuine ambiguity and overlap b/w “safety” research and normal AI research. The clean “capabilities”/“alignment” distinction is more map than territory sometimes.
Also, isn’t this already basically a thing?? Companies already compete to have the “special sauce”; a lot of this is post-training stuff, so massively overlaps with “alignment-coded” stuff. When does the RL post training go from being safety to “special sauce” to “alignment feature” y’know?
Unfortunately, AI research is commercialized and heavily skewed by capitalist market needs,
so it’s still going to be all-in on trying to make an “AI office worker”, safety be damned, until this effort hits some wall, which I think is still plausible.