One could hook up a language model to decide what to visualize, Sora to generate visualizations, and a vision model to extract outcomes. This seems like around 40% of what intelligence is—the only thing I don’t really see is how reward should be “plugged in,” but there may be naive ways to set goals.
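The proposed wiring can be sketched as a plan → render → read-off loop. Everything here is hypothetical: `plan_scene`, `generate_video`, and `extract_outcome` are placeholder stubs standing in for the language model, Sora, and the vision model respectively, not real APIs.

```python
def plan_scene(goal: str) -> str:
    """Stand-in for a language model deciding what to visualize."""
    return f"scene depicting: {goal}"


def generate_video(scene_description: str) -> str:
    """Stand-in for Sora rendering the imagined scene as video."""
    return f"video({scene_description})"


def extract_outcome(video: str) -> str:
    """Stand-in for a vision model reading the outcome off the video."""
    return f"outcome of {video}"


def imagine(goal: str) -> str:
    """One pass of the visualize-then-extract loop described above."""
    scene = plan_scene(goal)
    video = generate_video(scene)
    return extract_outcome(video)
```

The open question in the text, how reward plugs in, would sit around this loop: something would have to score `extract_outcome`'s output against the goal and feed that back into `plan_scene`.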