So, to be clear: I see from the TED-AI charts that you’re expecting a ~50% probability that by July 2029, top AI systems will be equal to or better than a human expert mechanical engineer at all cognitive tasks.
So does that mean you expect an AI system to be able to design a complex mechanical assembly with many moving parts and hundreds of components in CAD, that can be easily created and assembled, works as intended, and is overall just as good or better than what human expert mechanical engineers can design?
For example, do you think there is a 50% chance that by July 2029, AI can design and program a robot that wins a university-level robotics competition? (Assume all the physical/non-cognitive tasks are completed perfectly, and that a few redesign iterations are allowed, because even human experts need to iterate.)
I’ve been running experiments like this on my own version of Vending-Bench. Basically, the agents have 30 days to make as much money as possible by buying and selling products in a vending machine.
I am trying a multi-agent environment with explorers, which run experiments; observers, which propose experiments; and a wiki maintainer, which maintains a collective knowledge base that grows as experiments are completed. It’s quite expensive to run and uses up most of my Codex weekly limit if I run it for a few hours, but it definitely does show that agents can collectively learn how to optimize arbitrary metrics. I’ll write a post about it sometime. I definitely agree that agents are under-elicited.
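To make the explorer/observer/wiki loop concrete, here is a toy sketch. It is not the actual setup: the LLM agents are replaced by plain functions, the vending environment is a made-up linear demand curve, and every name (`Wiki`, `toy_profit`, `observer_propose`, `explorer_run`, `run_rounds`) is hypothetical. It only illustrates the control flow of proposing experiments, running them, and growing a shared knowledge base.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Shared knowledge base: maps each tried price to the profit observed."""
    entries: dict[float, float] = field(default_factory=dict)

    def record(self, price: float, profit: float) -> None:
        self.entries[price] = profit

    def best(self) -> tuple[float, float]:
        # Return the (price, profit) pair with the highest recorded profit.
        return max(self.entries.items(), key=lambda kv: kv[1])


def toy_profit(price: float) -> float:
    """Assumed environment: demand falls linearly with price, unit cost is 1.0."""
    demand = max(0.0, 100.0 - 20.0 * price)
    return (price - 1.0) * demand


def observer_propose(wiki: Wiki, step: float) -> list[float]:
    """Observer stand-in: propose untried prices near the best known one."""
    if not wiki.entries:
        return [1.0, 3.0, 5.0]  # arbitrary starting experiments
    best_price, _ = wiki.best()
    candidates = [round(best_price - step, 2), round(best_price + step, 2)]
    return [p for p in candidates if p not in wiki.entries]


def explorer_run(price: float) -> float:
    """Explorer stand-in: run one experiment and report the measured metric."""
    return toy_profit(price)


def run_rounds(rounds: int = 6) -> Wiki:
    wiki = Wiki()
    step = 1.0
    for _ in range(rounds):
        proposals = observer_propose(wiki, step)
        if not proposals:
            step /= 2  # nothing new to try nearby, so search more finely
            continue
        for price in proposals:
            # Wiki-maintainer stand-in: record every completed experiment.
            wiki.record(price, explorer_run(price))
    return wiki


if __name__ == "__main__":
    wiki = run_rounds()
    print(wiki.best(), len(wiki.entries))
```

In a real run each of these functions would be an agent call, which is where the cost comes from; the shared wiki is what lets later proposals build on earlier results instead of repeating them.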