are there any online demos of instrument convergence?
there’s been compelling writing… but are there any experiments that show agents which are given specific goals then realize there are more general goals they need to persistently pursue in order to achieve the more specific goals?
I imagine a compelling simple demo here might be necessary to shock the AI safety community out of the belief that we can maintain control of autonomous digital agents (ADAs).
are there any online demos of instrument convergence?
there’s been compelling writing… but are there any experiments that show agents which are given specific goals then realize there are more general goals they need to persistently pursue in order to achieve the more specific goals?
I imagine a compelling simple demo here might be necessary to shock the AI safety community out of the belief that we can maintain control of autonomous digital agents (ADAs).