Surely there are reasons to do ambitious interp beyond its stated goal? I doubt we will have a fully understandable model by 2028, but I still think the abstractions developed along the way will be helpful.
For instance, many higher-order methods like SAEs rest on assumptions about how activation space is structured. Studying smaller systems rigorously can give us ground truth about how models actually construct their activation spaces, which would let us question or modify those assumptions.
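To make the assumption concrete, here is a minimal SAE sketch in Python/PyTorch. It is illustrative only (the `TinySAE` name, dimensions, and sparsity coefficient are all made up, not anyone's actual implementation), but it encodes the structural bet SAEs make: that activations decompose into sparse, nonnegative combinations of learned linear feature directions.

```python
# Minimal sketch of the SAE assumption: activations are approximately sparse
# linear combinations of an overcomplete dictionary of feature directions.
# All names and sizes here are illustrative.
import torch
import torch.nn as nn

class TinySAE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # activations -> feature coefficients
        self.decoder = nn.Linear(d_hidden, d_model)  # columns = learned feature directions

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # nonnegative (ideally sparse) feature activations
        x_hat = self.decoder(f)          # reconstruction as a linear combination of directions
        return x_hat, f

# Training objective: reconstruction error plus an L1 penalty that pushes
# the feature activations toward sparsity.
sae = TinySAE(d_model=512, d_hidden=4096)   # overcomplete: d_hidden >> d_model
x = torch.randn(64, 512)                    # stand-in for a batch of model activations
x_hat, f = sae(x)
l1_coeff = 1e-3                             # illustrative sparsity coefficient
loss = (x_hat - x).pow(2).mean() + l1_coeff * f.abs().mean()
```

If the linearity/sparsity assumption is wrong for how a given model organizes its activation space, no setting of these hyperparameters rescues it, which is exactly the kind of thing ground truth from small systems could tell us.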
Unfortunately, prediction markets need some bright red line somewhere to be resolvable. I encourage you to make a different market that captures the thing you care about.
I don’t care about prediction markets.
But people who believe we aren't going to be able to fully understand models frequently take that as a reason not to pursue ambitious/rigorous interpretability. I thought that was the position you were taking, since you used the market to decide whether the agenda is "good" or not.