An AI capable of considerable self-improvement must already be interpreting a vast number of directives according to its makers' intentions, since self-improvement is itself an intentional feature.
Gödel machines already specify self-improvement in formal mathematical form. If you can specify human morality in a similarly formal manner, I’ll be a lot more relaxed.
Also, I don’t assume self-improvement. Some models of powerful intelligence don’t require it.
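The Gödel machine's formal criterion can be caricatured in a few lines of Python: a self-rewrite executes only when the proof searcher certifies that it increases expected utility. This is a toy sketch, not a real implementation; the "proof searcher" here is replaced by direct evaluation, and all names (`Rewrite`, `find_proof`, etc.) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rewrite:
    """A candidate self-modification, modeled as a utility transform."""
    name: str
    apply: Callable[[float], float]

def find_proof(current_utility: float, rewrite: Rewrite) -> Optional[str]:
    # Toy stand-in for the Gödel machine's proof searcher: a "proof"
    # is just a direct check that utility strictly increases.
    new_utility = rewrite.apply(current_utility)
    if new_utility > current_utility:
        return f"{rewrite.name}: {current_utility} -> {new_utility}"
    return None

def maybe_self_modify(current_utility: float, candidates: list) -> float:
    # Only provably-beneficial rewrites are ever executed.
    for r in candidates:
        if find_proof(current_utility, r) is not None:
            current_utility = r.apply(current_utility)
    return current_utility

rewrites = [Rewrite("noop", lambda u: u), Rewrite("optimize", lambda u: u + 1.0)]
print(maybe_self_modify(10.0, rewrites))  # 11.0
```

The point of the analogy: "increases expected utility" has a crisp formal statement, which is exactly what human morality currently lacks.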