A couple more problems with extreme discounting.
In contexts where the AI is doing AI coding, the discount rate is only weakly conserved. I.e. the original AI doesn't care if it builds a successor AI without a super-high discount rate, so long as that successor does the right things in the first 5 minutes of being switched on.
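A toy sketch of that point (my own illustration, with made-up reward streams, not anything from the comment): under an extreme discount rate, a successor's behavior after the first few timesteps contributes almost nothing to the original agent's evaluation of it, so a successor that diverges later scores essentially the same as one that stays aligned forever.

```python
def discounted_value(rewards, gamma):
    """Sum of gamma**t * r_t over a reward stream, as seen by the original agent."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

gamma = 0.5  # an extreme per-timestep discount rate (hypothetical value)

# Successor A: does what the original wants for 5 steps, then does something
# the original assigns zero value to.
successor_a = [1.0] * 5 + [0.0] * 1000

# Successor B: does what the original wants at every step, forever.
successor_b = [1.0] * 1005

va = discounted_value(successor_a, gamma)
vb = discounted_value(successor_b, gamma)

# Everything after step 5 is worth at most gamma**5 / (1 - gamma) = 0.0625
# to the original agent, so it has almost no reason to prefer B over A.
print(va, vb, vb - va)
```

So an agent with an extreme discount rate gets nearly full value from a successor whose goals it has not preserved at all, which is why the discount rate fails to propagate through self-modification.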
The theoretical possibility of time travel (which would let far-future events loop back and matter in the heavily weighted near term).
Also, the strong incentive to pay in Parfit's hitchhiker only exists if Parfit can reliably predict you. If humans can look at any AI's code and reliably predict what it will do, then alignment is a lot easier: you just don't run any code you predict will do bad things.
Also, FAI != enslaved AI.
In a successful FAI project, the AI has terminal goals carefully shaped by the programmers, and achieves those goals.
In a typical UFAI, the terminal goals are set by the random seed of network initialization, or arbitrary details in the training data.