we make the very strong assumption throughout that S-LLMs are a plausible and likely path to AGI
It sounds unlikely and unnecessarily strong to claim that we can reach AGI by scaffolding alone (if that’s what you mean). But I think it’s pretty likely that AGI will involve some amount of scaffolding, and that scaffolding will boost its capabilities significantly.
there is a preexisting discrepancy between how humans would interpret phrases and how the base model will interpret them
To the extent that it’s true, I expect it may also make it easier for deception to arise. This discrepancy may serve as a seed of deception.
Systems engaging in self modification will make the interpretation of their natural language data more challenging.
Why? Sure, the systems will get more complex, but are there any other reasons?
Also, I like the richness of your references in this post :)
I agree that scaffolding can take us a long way towards AGI, but I’d be very surprised if GPT-4 as the core model were enough.
Yup, that wasn’t a critique, I just wanted to note something. By “seed of deception” I mean that the model may learn to exploit this ambiguity more and more, if doing so is useful for passing some evals, while also helping it carry out computations unwanted by humans.
I see, so maybe in ways which are weird for humans to think about.