I think you’re right that I’m reading into this. But there is probably more to his thinking, whether I’m right or wrong about what that is. Shane Legg was thinking about alignment as far back as his PhD thesis, which doesn’t go into depth on it but does show he’d at least read some of the literature prior to 2008.
I agree that LLM chain of thought is not totally reliable, but I don’t think it makes sense to dismiss it as too unreliable to work with for an alignment solution. There’s so much that hasn’t been tried, both in making LLMs more reliable and in making agents built on top of them more reliable: sampling multiple paths, using fresh context windows and different models to force them to break problems into steps, and carrying forward only the last natural-language statement as the whole context for the next step.
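To make that last idea concrete, here’s a minimal sketch of what I mean, not a claim about how any existing system works. The `call_model` function is a hypothetical placeholder for whatever LLM API you’d plug in, and the prompts and step limit are just illustrative assumptions.

```python
# Sketch of a "fresh context per step" agent loop, plus a crude multi-path
# reliability check. `call_model` is a hypothetical stand-in for a real LLM API.

from collections import Counter
from typing import Callable


def call_model(prompt: str) -> str:
    """Hypothetical LLM call. Replace with a real API client."""
    raise NotImplementedError("plug in your model of choice here")


def solve_stepwise(task: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    """Carry only the previous step's natural-language output forward,
    so each step starts from a fresh context window."""
    state = task
    for _ in range(max_steps):
        prompt = (
            "You are solving one step of a larger problem.\n"
            f"Current statement of the problem / progress so far:\n{state}\n\n"
            "Either produce the next intermediate statement, or, if the problem "
            "is solved, reply with 'FINAL: <answer>'."
        )
        state = model(prompt)  # fresh context: only `state` is carried over
        if state.startswith("FINAL:"):
            return state[len("FINAL:"):].strip()
    return state  # best effort after max_steps


def solve_with_voting(task: str, model: Callable[[str], str], paths: int = 3) -> str:
    """'Multiple paths' variant: run the stepwise solver several times and
    take the most common answer as a rough check on reliability."""
    answers = [solve_stepwise(task, model) for _ in range(paths)]
    return Counter(answers).most_common(1)[0][0]
```

The point isn’t that this particular loop would work; it’s that the design space of forcing legible, natural-language intermediate steps and cross-checking multiple runs is mostly unexplored.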
Whether or not this is a reliable path to alignment, it’s a potential path to huge profits. So there are two questions: will this lead to alignable AGI? And will it lead to AGI at all? I think both are unanswered.