Noosphere89 comments on Arguments for optimism on AI Alignment (I don’t endorse this version, will reupload a new version soon.)

Noosphere89 15 Oct 2023 20:01 UTC
I definitely agree that tabooing "white box" vs. "black box" is a good idea. One point, though: the innate reward system makes targeted updates to neural circuits using simple learning rules, which suggests that we can probably use SGD to build our own innate reward system and combine it with a weak prior to get good results.

Admittedly, I do think that the pathway isn't as complete as I'd like, but I do actually think that being able to see the weights, check the Hessians, etc. gives us extremely powerful alignment tools, more powerful than is generally appreciated.
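As a toy illustration of what "checking the Hessians" can look like with white-box access (my own sketch under simplifying assumptions, not a method described in the comment), the code below takes a tiny linear model with mean squared error loss. It estimates the Hessian of the loss with respect to the weights by central finite differences and then reads off its eigenvalues, which describe the local curvature of the loss. The model, data, and function names (`mse_loss`, `hessian_fd`, `eigenvalues_2x2`) are hypothetical.

```python
# Toy sketch: finite-difference Hessian of a loss with respect to model weights.
# The linear model (no bias), the data, and all names here are hypothetical.
import math


def mse_loss(w, xs, ys):
    """Mean squared error of a linear model y_hat = w . x."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += (pred - y) ** 2
    return total / len(xs)


def hessian_fd(f, w, eps=1e-4):
    """Central finite-difference estimate of the Hessian of f at w."""
    n = len(w)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Perturb weight i by di and weight j by dj (both apply if i == j).
                v = list(w)
                v[i] += di
                v[j] += dj
                return f(v)
            h[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return h


def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, smallest first (closed form)."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return mean - radius, mean + radius


xs = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
ys = [1.0, 2.0, 2.0]
w = [0.5, 0.5]

H = hessian_fd(lambda v: mse_loss(v, xs, ys), w)
print(H)
print(eigenvalues_2x2(H))  # all positive here, so the loss is locally convex
```

Because this loss is quadratic in the weights, the estimate should agree with the analytic Hessian (2/n)XᵀX up to rounding. For a real network one would use automatic differentiation (e.g. Hessian-vector products) rather than finite differences, which need O(n²) loss evaluations.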