Which, importantly, includes every fruit of our science and technology.
I don’t think this is the right comparison, since modern science / technology is a collective effort and so can only accumulate thinking through mostly-interpretable steps. (This may also be true for AI, but if so then you get interpretability by default, at least interpretability-to-the-AIs, at which point you are very likely better off trying to build AIs that can explain that to humans.)
In contrast, I’d expect that individual steps of scientific progress that happen within a single mind are often very uninterpretable (see e.g. “research taste”).
If we understood an external superhuman world-model as well as a human understands their own world-model, I think that’d obviously get us access to tons of novel knowledge.
Sure, I agree with that, but “getting access to tons of novel knowledge” is nowhere close to “can compete with the current paradigm of building AI”, which seems like the appropriate bar given you are trying to “produce a different tool powerful enough to get us out of the current mess”.
To be concrete, I’d wildly guess, with huge uncertainty, that this would involve an alignment tax of ~4 GPTs, in the sense that if you had an interpretable world model from GPT-10 similar in quality to a human’s understanding of their own world model, that would be about as useful as GPT-6.