Right. The use case I had in mind for electric cars was the standard “You see someone walking by the edge of the street; are they going to step out into the street or not? It depends on e.g. which way they are facing, whether they just dropped something into the street, etc.” That seems like something where pixel-based image prediction would be superior to e.g. classifying the entity as a pedestrian and then adding a pedestrian token to your 3D model of your environment.