I think that, given more time, we could probably come up with tools that would help when we resume capabilities research. For example, there are probably ways to do something like the logit lens that work better, or ways to automatically factor models into more interpretable pieces, or there’s just the long slog of tracing through circuits to figure out what the model is doing and then building one from scratch rather than training it. I don’t know how practical any of these approaches are, but I don’t think we’re at the limit of what we can learn from current models.