It’s interesting that Facebook/Meta fell so far behind in AI despite the substantial resources at its disposal. ‘Metaverse’ was an inherently flawed idea that they thought they could force through market leverage, but competitive LLMs have been built successfully by a wide variety of organizations, from Alibaba to X to OpenAI to Anthropic.
Is it something organizational? Does Facebook have any successful spinoff initiatives?
Looks like a neat idea, though things do seem to converge to vague positivity rather than to actually getting anything done. You mention that the scaffolding is a sticking point, and I wonder whether you wouldn’t get better results by just grabbing the top system prompts for similar tasks and hard-coding them into the models’ setups, maybe with a bit of manual prompt engineering, such that e.g. Gemini always sees a very explicit instruction to assume user error when something goes wrong.
The other issue is that open-ended real-world tasks are a bit unfair to give to production LLMs: anything these LLMs could accomplish on their own would already have been done by an enterprising human using the very same LLMs, but with the intent to succeed[1] rather than the intent to evaluate the model’s performance.
[1] Characterized by more direct prompting, more frequent intervention, and a general willingness to ‘cheat’ on the LLM’s behalf by doing things manually when it’s bad at them.