I was a bit confused about this quote, so I tried to expand on the ideas a bit. I'm posting it here in case anyone benefits from it or disagrees.
> To which I say: I expect many of the cognitive gains to come from elsewhere, much as a huge number of the modern capabilities of humans are encoded in their culture and their textbooks rather than in their genomes. Because there are slopes in capabilities-space that an intelligence can snowball down, picking up lots of cognitive gains, but not alignment, along the way.
I guess the quote is saying that an AI will develop ways to learn things without gradient descent, just as humans learned things outside of the genetic update process. Some ways to do this would be:
- Develop the ability to read things on the internet and learn from them
- Spend cognitive energy on things like doing math or programming
- Do things to actually gain power in the world, like accumulating money or compute
I guess the argument is that, for objectives, only gradient descent is pushing you in the correct direction, whereas for capabilities, the system will develop ways to push itself in the right direction in addition to SGD. It's true that for any objective function it's good to be more powerful; it's not true that for any level of power the system is incentivized to have a more correct objective.
A system wants to be more powerful, but it doesn’t want to have a more “correct” objective.