If we can get a guarantee, it'll also include guarantees about grammar and syntax. That doesn't seem like too much to ask. It might have been too much to ask before the model worked at all, but SLT seems on track to give a foothold from which to get a guarantee. We might need frontier AIs to help figure out how to nail down the guarantee, which would mean knowing what to ask for. But we may be able to be dramatically more demanding with a guarantee-based approach than with previous guarantee-based approaches, precisely because we can get frontier AIs to help out, if we know what bound we want to find.
My point was that even though we already have an extremely reliable recipe for getting an LLM to understand grammar and syntax, we are nowhere near a theoretical guarantee for that. A theoretical guarantee seems out of reach to me, even for much easier things that we already know modern AI can do.
When someone asks for an alignment guarantee, I'd like them to demonstrate what they mean by showing a guarantee for something simpler, like a syntax guarantee for LLMs. I'm not familiar with SLT, but I'll believe it when I see it.