Great post, I think it’s very complementary to my last post, where I argue that what LLMs can and can’t do is strongly affected by the modes of input they have access to.
Overall this updates me towards thinking that a lot of progress in AI will come literally just from giving models access to data in a nicer format.
Yeah, I agree with a lot of that. One weird thing is that LLMs learn patterns differently than we do: a human can learn much faster by “controlling the video camera” (being embodied), but getting LLMs to seek out the right training data to improve themselves is a separate unsolved problem. An even simpler unsolved problem is just having an LLM tell you what text would help it train best.