A super relevant point. If we try to align our AIs with something, and they end up robustly aligned with some other proxy thing, we definitely didn’t succeed.
But, it’s still impressive to me that evolution hooked up general planning capabilities to a (learned) abstract concept, at all.
Like, there’s this abstract concept, which varies a lot in its particulars from environment to environment, and which the brain has to learn to detect apart from those particulars. Somehow the genome is able to construct the brain such that the motivation circuitry can pick out that abstract concept, after it is learned (or as it is being learned), and use it as a major criterion for the planning and decision machinery. And the end result is that the organism as a whole ends up not that far from an [abstract concept]-maximizer.
This is a lot more than I would expect evolution to be able to pull off, if I thought that our motivations were a hodge-podge of adaptations that cohere (as much as they do) into godshatter.
My point is NOT that evolution nailed it and therefore alignment is easy. My point is that evolution got a lot further than I would have guessed was possible.