One extreme is that we can find a way such that there is no tradeoff whatsoever between safety and capabilities—an “alignment tax” of 0%.
I think that was the idea behind 'oracle AIs'. (Though I'm aware there were arguments against that approach.)
One of the arguments I didn't see, for sorting out the practical details of how to implement them, was: "As we get better at this alignment stuff, we will reduce the 'tradeoff'." (Also, arguably, getting better human feedback improves performance.)