It is clear that, in the limit, LLMs are superhumanly good predictors (i.e. Solomonoff induction on text). It is less clear whether neural networks can get anywhere near that good, and less clear still whether this would be dangerous. Suppose you ask the LLM about some physics experiment that hasn’t been done yet. It uses its superhuman cognition to work out the true laws of physics, and then writes what humans would say, given the experimental results. This is smart but not dangerous. (It could be mindcrime.) The LLM could be dangerous if it predicts the output of a superintelligence, but it only goes there if it has really strong generalization, i.e. it is capable of ignoring the fact that superintelligences don’t exist while being smart enough to predict one. I am unsure how likely this is.
I strongly disagree with your statement here, Donald. The level of capability you describe as ‘not dangerous’ is what I would describe as ‘extremely dangerous’. An AI agent which has superhuman capabilities but restricts itself to human-level outputs because of quirks of its training process can still accomplish everything necessary to destroy humanity. The key limiting factor in your example is not the model’s capability but its agency.
Ok, maybe my wording should be more like, “this probably won’t destroy the world if it is used carefully and there are no extra phenomena we missed.”
Yeah: used carefully and intentionally by well-intentioned actors (not reckless or criminal actors, or suicidal terrorists, or…), with no big surprises… and no rapid further advances building off of where we’ve gotten so far… If all of those things were somehow true, then yeah, much less dangerous.
Sorry, by “dangerously capable” I meant “capable enough to be very dangerous” not “inherently very dangerous”.