Thank you. You make a good case for including this as evidence that capabilities are increasing. I suppose the question is whether they are increasing at the rate needed for short timelines. I think it’s worth asking whether the same-infrastructure performance showing zero improvement in four months is something that would have been expected four months ago. Of course, this is only one metric, over a short timeframe.
Yeah, I definitely think the improvements on OSWorld are much more impressive than the improvements on SWE-bench Verified. I also think same-infrastructure performance is a bit misleading, in the sense that when we get superintelligence, it is very unlikely to have the same infrastructure we use today. We should expect infrastructure changes to result in improvements, I think!