Thank you for pointing to the cost graph. It shows the ratio of the cost of a SUCCESSFUL run to the cost of hiring a human. But what if we also account for the failed runs? I wonder whether the total cost of failed plus successful runs is 10 times higher for 8-hour tasks, which would push far more tasks above the 100 threshold.
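One way to make the "failed runs too" question concrete is a minimal sketch that assumes each attempt is independent and succeeds with a fixed probability, retrying until the first success. Under that assumption the number of attempts is geometric with mean 1/p, so the expected spend per success is the per-run cost divided by the success rate. A 10x overhead then corresponds to a 10% success rate. The function name and the example numbers below are hypothetical and not taken from the graph.

```python
def expected_cost_per_success(cost_per_run: float, success_rate: float) -> float:
    """Expected total spend (failed + successful runs) until one success.

    Assumes independent attempts with a fixed success probability, so the
    number of attempts is geometric with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate


def cost_ratio_to_human(cost_per_run: float, success_rate: float,
                        human_cost: float) -> float:
    """Ratio of the expected AI spend per success to the cost of a human."""
    return expected_cost_per_success(cost_per_run, success_rate) / human_cost


# Hypothetical numbers: $50 per run, 10% success rate, $400 for a human.
print(expected_cost_per_success(50.0, 0.1))
print(cost_ratio_to_human(50.0, 0.1, 400.0))
```

If the success rate on 8-hour tasks really were around 10%, every point on a "successful runs only" graph would shift up by roughly a factor of 10 under this model.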
UPD: o3-mini is about 30 times cheaper than o1, not 7 times. In that case the cost of o3 (low) might increase only 16-fold, putting the AGI at $3200 per use (for which long tasks, exactly?).
UPD2: I miscounted. o3-mini (low) costs $0.040, while o3 (low) costs $200, which is a 5000-fold increase, not a 500-fold one.
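A quick check of the arithmetic in the two updates, using only the per-task figures quoted above. The $3200 line simply reproduces the "16 times" scaling applied to the $200 o3 (low) figure. It is a sanity check, not an independent cost estimate.

```python
# Per-task costs as quoted in the comment above (USD).
o3_mini_low = 0.040
o3_low = 200.0

# UPD2: how many times more expensive o3 (low) is than o3-mini (low).
ratio = o3_low / o3_mini_low
print(round(ratio))

# UPD: a 16-fold increase applied to the $200 figure.
scaled = o3_low * 16
print(scaled)
```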