what do you think of this for the alignment problem?
If we make an AI system that’s capable of making another, more capable AI system, which then makes another more capable AI, and that makes another one and so on, how can we trust that this chain will result in AI systems that only do what we want and don’t do things we don’t want?
That’s the Tiling Agents problem!