I definitely agree that tabooing "white box" vs. "black box" is a good idea. One point, though: the innate reward system makes targeted updates to neural circuits using simple learning rules. That suggests we can probably use SGD to build ourselves an innate reward system which, combined with a weak prior, should get good results.
Admittedly, I think the pathway isn’t as complete as I’d like, but I do think that being able to see the weights, check the Hessians, etc. gives us extremely powerful alignment tools, more powerful than is commonly appreciated.
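As a toy illustration of what "checking the Hessian" means once you have white-box access, here is a minimal sketch. It assumes a hypothetical two-parameter linear model with squared-error loss, which is nothing like a real network or alignment pipeline. The sketch estimates the Hessian of the loss with respect to the weights by central finite differences and then reads off the local curvature from its eigenvalues. The names `loss`, `hessian`, `eig2x2_sym`, and `DATA` are made up for the example.

```python
import math

# Hypothetical toy dataset, generated by y = 2x + 1.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

def loss(w, data):
    """Mean squared error of the toy model y_hat = w[0] * x + w[1]."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)

def hessian(f, w, eps=1e-3):
    """Estimate the Hessian of f at w with central finite differences."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Perturb coordinate i by di and coordinate j by dj.
                v = list(w)
                v[i] += di
                v[j] += dj
                return f(v)
            H[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return H

def eig2x2_sym(H):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid - rad, mid + rad

if __name__ == "__main__":
    # Inspect curvature at the loss minimum w = (2, 1).
    H = hessian(lambda w: loss(w, DATA), [2.0, 1.0])
    print(H)
    print(eig2x2_sym(H))
```

For this quadratic loss, the exact Hessian is `[[2*mean(x^2), 2*mean(x)], [2*mean(x), 2]]`. Both eigenvalues are positive, which tells you the minimum is locally well-conditioned in every direction. In a real model you would use automatic differentiation and Hessian-vector products rather than finite differences, because the full Hessian is far too large to form explicitly.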