Hello and thank you for the good questions.
1. I do think that it is at least plausible (5-25%?) that we could obtain general intelligence via improved scaffolding, or at least obtain a self-improving seed model that would eventually lead to AGI. Current systems like Voyager do not have that many “moving parts”. I suspect that there is a rich design space for capabilities researchers to explore if they keep pushing in this direction.
Keep in mind that the current “cutting edge” for scaffold design consists of relatively rudimentary ideas like “don’t use the expensive LLM for everything”. When I see scaffolds leading to AGI, I am envisioning a complex web of interacting components that requires a fair bit of effort to understand and build.
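As a toy illustration of that “don’t use the expensive LLM for everything” idea, here is a minimal sketch of a model router. Everything here is hypothetical: the model names are placeholders and the difficulty heuristic is made up for the example.

```python
# Hypothetical scaffold component: route each task to a cheap or expensive
# model tier based on a crude difficulty heuristic. Model names are placeholders.

def estimate_difficulty(task: str) -> float:
    """Crude proxy: longer, question-dense tasks are assumed harder."""
    return min(1.0, len(task) / 500 + task.count("?") * 0.1)

def route(task: str, threshold: float = 0.5) -> str:
    """Pick which model tier should handle the task."""
    return "expensive-model" if estimate_difficulty(task) >= threshold else "cheap-model"

print(route("What is 2 + 2?"))                     # short, easy -> cheap-model
print(route("Design a distributed cache. " * 20))  # long -> expensive-model
```

A real scaffold would presumably use a learned or calibrated router rather than string length, but even this sketch shows how many design choices (heuristic, threshold, model tiers) hide inside one “rudimentary” idea.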
2. I think I agree, although I’m a bit unclear on the specifics of the “seed of deception”. My intention was to highlight that there are natural language phrases or words whose meaning is already challenging to interpret.
3. It’s not just that they’re more complex; they might also start utilizing channels and subsystems in unusual ways.
Perhaps a system notices that the vector database it has been assigned as a “memory” is quite small, but it also has read and write access to another vector database intended for logs.
It’s clear to me that we can easily prevent this type of behaviour. First of all, the system must not have read access to the logs. More generally, the decision to read the memory and write the logs should be an explicit and transparent part of the scaffolding, and the system shouldn’t be able to “introspect” on its own.
But the fact that something can easily be prevented doesn’t mean that it will be, unless we actually put in the effort. We need to think through more such cases and develop safety protocols for LLM-based agents.
I agree that scaffolding can take us a long way towards AGI, but I’d be very surprised if GPT-4 as the core model were enough.
Yup, that wasn’t a critique, I just wanted to note something. By “seed of deception” I mean that the model may learn to use this ambiguity more and more, if that’s useful for passing some evals, while helping it do some computation unwanted by humans.
I see, so maybe in ways which are weird for humans to think about.
Leaving this comment to make a public prediction: I expect GPT-4 to be enough for roughly human-level AGI with the proper scaffolding, with more than 50% confidence.