I agree with some light suspicion, but I would be more wary of automatically rejecting things that don't match up with the other proxies. I feel like (very roughly) the other three mostly give lower bounds, while (a certain type of abstract model) mostly gives upper bounds. When our best upper bounds and our best lower bounds don't match, the best response looks like large error bars.
I don't think upper and lower bounds on capabilities are the main thing we should be looking for, but I think we can also get fairly large lower bounds by starting with a human and imagining what happens when we scale it up (more copies, more time, more introspection).