There’s such a thing as overfitting… If you have some noisy data, the theory that fits the data perfectly is just the table of the data itself (e.g. heights and falling times); in practice, a useful theory doesn’t fit the data exactly. If we make the AI fit perfectly to what mankind does, we could just as well make a brick and proclaim it an omnipotent, omniscient, mankind-friendly AI that will never stop mankind from doing anything mankind wants (including taking extinction risks).
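A small sketch of the heights-and-falling-times point, under loudly stated assumptions: the measurements below are made up (generated from free fall with g = 9.81 plus a fixed "noise" list), and fitting g by least squares on t² = (2/g)·h is just one simple choice of "useful theory". The lookup table reproduces every observation with zero error but has nothing to say about a height it hasn't seen; the free-fall law misses each observation slightly but predicts new heights.

```python
import math

# Hypothetical data: drop heights (m) and noisy falling times (s).
# Times come from t = sqrt(2h/g) with g = 9.81, plus a fixed, made-up "noise" term.
G_TRUE = 9.81
heights = [1.0, 2.0, 5.0, 10.0, 20.0]
noise = [0.02, -0.03, 0.01, -0.02, 0.03]
times = [math.sqrt(2 * h / G_TRUE) + e for h, e in zip(heights, noise)]

# "Theory" 1: the table of the data itself. It fits every observation exactly.
table = dict(zip(heights, times))

def table_predict(h):
    return table.get(h)  # None for any height that was never measured

# "Theory" 2: free fall, t^2 = (2/g) * h, with g fitted by least squares on t^2.
k = sum(h * t * t for h, t in zip(heights, times)) / sum(h * h for h in heights)
g_fit = 2 / k

def law_predict(h):
    return math.sqrt(2 * h / g_fit)

# Worst-case error on the data each "theory" was built from.
table_err = max(abs(table_predict(h) - t) for h, t in zip(heights, times))
law_err = max(abs(law_predict(h) - t) for h, t in zip(heights, times))

print(f"table: max error {table_err:.3f} s, prediction for 15 m: {table_predict(15.0)}")
print(f"law:   max error {law_err:.3f} s, g ~ {g_fit:.2f}, prediction for 15 m: {law_predict(15.0):.2f} s")
```

The table wins on fit and loses on everything else, which is the sense in which a perfect fit to what mankind does would be as useless as the brick.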
Well yes, but I would assume you would want more alignment, not less.