As a former robotics developer, I feel the bitter lesson in my bones. This is actually one of the points I plan to focus on when I write up the longer version of my argument.
High-quality manual dexterity (and real-time visual processing) in a cluttered environment is a heartbreakingly hard problem with any of the GOFAI techniques I knew at the time. And even the most basic of the viable algorithms quickly turned into a big steaming pile of linear algebra mixed with calculus.
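To give a flavor of what that pile looks like, here's a toy damped-least-squares inverse-kinematics step for a 2-link planar arm. Everything in it (the link lengths, the damping constant, the target) is made up for illustration, but differentiating forward kinematics into a Jacobian and then grinding through pseudoinverse-style solves is exactly the genre of math I mean:

```python
import numpy as np

# Toy 2-link planar arm. Link lengths are made-up values for the example.
L1, L2 = 0.5, 0.4  # meters

def forward_kinematics(q):
    """End-effector (x, y) position for joint angles q = [q1, q2]."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q, eps=1e-6):
    """Numerical Jacobian of the forward kinematics (the calculus part)."""
    J = np.zeros((2, 2))
    for i in range(2):
        dq = np.zeros(2)
        dq[i] = eps
        J[:, i] = (forward_kinematics(q + dq) - forward_kinematics(q - dq)) / (2 * eps)
    return J

def dls_step(q, target, damping=0.05):
    """One damped-least-squares IK step (the linear algebra part):
    dq = J^T (J J^T + lambda^2 I)^-1 * error."""
    err = target - forward_kinematics(q)
    J = jacobian(q)
    JJt = J @ J.T + (damping ** 2) * np.eye(2)
    return q + J.T @ np.linalg.solve(JJt, err)

q = np.array([0.3, 0.6])
target = np.array([0.6, 0.3])  # reachable: |target| < L1 + L2
for _ in range(100):
    q = dls_step(q, target)
print(forward_kinematics(q))  # converges toward the target
```

And that's the *easy* case: two joints, no obstacles, no contact dynamics, no camera noise.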
As someone who has done robotics demos (and who knows all the things an engineer can do to make sure the demos go smoothly), the Figure AI groceries demo still blows my mind. This demo is well into the “6 impossible things before breakfast” territory for me, and I am sure as hell feeling the imminent AGI when I watch it. And I think this version of Figure was an 8B VLLM connected to an 80M specialized motor control model running at 200 Hz? Even if I assume that this is a very carefully run demo showing Figure under ideal circumstances, it’s still black magic fuckery for me.
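For anyone who hasn't seen this pattern, the rough shape of such a two-rate stack looks something like the sketch below. To be clear, this is my guess at the general pattern, not Figure's actual (unpublished) architecture; every name, rate, and stub in it is an illustrative assumption:

```python
import time
import threading

# A minimal sketch of the two-rate pattern described above: a slow
# vision-language planner feeding a fast low-level motor policy.
# All names, rates, and stubbed models are illustrative assumptions,
# not Figure's actual stack.

def run_vlm_stub(observation):
    """Placeholder for the big, slow vision-language model."""
    return {"latent": observation}  # pretend latent command

def run_motor_policy_stub(latent, joint_state):
    """Placeholder for the small, fast motor-control model."""
    return [0.0] * 7  # pretend joint torques

latest_latent = None
lock = threading.Lock()

def planner_loop(hz=2):
    """Slow loop: re-plan a couple of times per second."""
    global latest_latent
    while True:
        latent = run_vlm_stub(observation="camera frame")
        with lock:
            latest_latent = latent
        time.sleep(1.0 / hz)

def control_loop(hz=200, steps=1000):
    """Fast loop: track the latest latent command at ~200 Hz."""
    period = 1.0 / hz
    for _ in range(steps):
        start = time.monotonic()
        with lock:
            latent = latest_latent
        run_motor_policy_stub(latent, joint_state=None)
        # ...torques would go to the actuators here...
        time.sleep(max(0.0, period - (time.monotonic() - start)))

threading.Thread(target=planner_loop, daemon=True).start()
control_loop()  # ~5 seconds of 200 Hz control in this sketch
```

The design point is that the big model never has to meet a real-time deadline: the fast loop always has *some* latent command to track, even while the planner is still thinking.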
But it’s really hard to communicate this intuitive reaction to someone who hasn’t spent years working on GOFAI robotics. Some things seem really easy until you actually start typing code into an editor and booting it on actual robot hardware, or until you start trying to train a model. And then these things reveal themselves as heartbreakingly difficult. And so when I see VLLM-based robots that just casually solve these problems, I remember years of watching frustrated PhDs struggle with things that seemed impossibly basic.
For me, “fix a leaky pipe under a real-world, 30-year-old sink without flooding the kitchen, and deal with all the weird things that inevitably go wrong” will be one of my final warning bells of imminent general intelligence. Especially if the same robot can also add a new breaker to the electrical panel and install a new socket in an older house.
The robots didn’t open the egg carton and put the eggs individually into the rack inside the fridge. Obviously crap, not buying the hype. /s