One extreme is that we can find a way such that there is no tradeoff whatsoever between safety and capabilities—an “alignment tax” of 0%.
I think that was the idea behind 'oracle AIs'. (Though I'm aware there were arguments against that approach.)
One of the arguments I didn't see, for sorting out the practical details of how to implement them, was: "As we get better at this alignment stuff, we will reduce the 'tradeoff'." (Also, arguably, getting better human feedback improves performance.)