I am confused by the problem statement. What you’re asking for is a generic tool: something that doesn’t need information about the world in order to be created, but that I can then feed information about the real world so that it becomes very useful.
My problem is that the real world is rich: feeding the tool all the relevant information will be expensive, and the more complicated the math problem is, the more safety issues you get.
I cannot rely on “don’t worry if the Task AI is not aligned, we’ll just feed it harmless problems”: the risk comes from what the AI will do to get to the solution. If the problem is hard and you want to defer the search to a tool so powerful that you have to choose your inputs carefully or catastrophe happens, you don’t want to build that tool.
Depending on the current world state (say, if someone else is close to discovering AGI too), it might be a viable option to have, I guess.
Thanks for replying!
I think my idea was to restrict the number of compute cycles given to the Task AI. Given enough compute, the AI will eventually end up spending some of it to learn about the existence of the real world, but if the AI is naturally much better suited to math theorems than to that, it might just solve the math problem first.
I’m keen to know if that would work.
So, assuming an unaligned agent here.
If your agent isn’t aware that its compute cycles are limited (i.e. the limit is enforced from outside rather than being part of the math problem it’s solving), then you have three cases: (1a) the agent doesn’t hit the limit with its standard search, and you’re in luck; (1b) the problem is difficult enough that the agent runs its standard search but fails to find a solution in the allocated cycles, so it always fails, but safely; (1c) you tweak the agent to be more compute-efficient, which is very costly and might not work. In practice, if you’re in case 1b and it apparently fails safely, you have an incentive to just increase the limit.
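To make the “unaware” cases concrete, here is a minimal Python sketch (all names hypothetical, not from the thread) of a budget enforced entirely outside the agent’s search loop: the step function never sees the limit, so it either finds a solution in time (1a) or is simply cut off (1b).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class BudgetedResult:
    solution: Optional[str]  # None = budget exhausted: case (1b), fails but safely
    steps_used: int


def run_with_budget(
    step: Callable[[object], Tuple[object, Optional[str]]],
    initial_state: object,
    max_steps: int,
) -> BudgetedResult:
    """Drive an opaque search `step` function under an externally imposed
    step budget; the budget is never visible from inside `step`."""
    state = initial_state
    for i in range(max_steps):
        state, solution = step(state)
        if solution is not None:
            return BudgetedResult(solution, i + 1)  # case (1a): solved within the budget
    return BudgetedResult(None, max_steps)          # case (1b): cut off, nothing else happens


# Toy usage: a "search" that happens to find its answer on step 42.
result = run_with_budget(
    step=lambda n: (n + 1, "done" if n + 1 == 42 else None),
    initial_state=0,
    max_steps=100,
)
```

Of course this only models the wrapper; everything the thread worries about is what `step` does internally with the compute it is given.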
If your agent is indeed aware of the constraint, then it has an incentive to remove it, or to increase the limit by other means. Three cases here again: (2a) identical to 1a, you’re in luck; (2b) the limit is low enough that strategic action to remove the constraint is impossible, so the agent fails “safely”; (2c) the agent finds a way to remove the constraint, and you’re in very unsafe territory.
Two observations from there: first, ideally you’d want your agent to operate safely even if given unbounded cycles; that’s the Omni Test. Second, there is indeed an alignment concept for agents that just try to solve the problem without long-term planning; that’s Myopia (and defining it formally is… hard).
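One crude way to gesture at what “no long-term planning” means, assuming an RL-style agent (purely as an illustration), is the limit where the discount factor is zero, so the policy optimizes only the immediate step:

$$\pi^{\text{myopic}}(s_t) \in \arg\max_{a}\ \mathbb{E}\left[r_t \mid s_t, a\right] \qquad \text{vs.} \qquad \pi^{*} \in \arg\max_{\pi}\ \mathbb{E}\left[\sum_{k \ge 0} \gamma^{k} r_{t+k} \mid \pi\right]$$

i.e. the $\gamma = 0$ case of the right-hand objective. That only pins down the intuition, not the hard formal definition alluded to above.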
Thank you for these links; I will read them.
I agree that in the world today, attempting to get an AI that can do this is dangerous; we’d be better off slowing down the AI race. What I’m suggesting is more of a precaution for a world where the AI race has gotten to the point where this AI can be built.