I thought the section on interpretability as a tool to predict future systems was weak. The post's arguments against that theory of impact are: reading current papers is a better predictor of future capabilities than current interpretability work, and examples of interpretability only being applied after phenomena are discovered. But no one is saying current interpretability tech and insights will let you predict the future! As you point out, we barely even understand what a feature is!
Which could change. If we advance enough to reverse-engineer GPT-4, and future systems, that would be a massive increase in our understanding of intelligence. If we knew what makes GPT-4 tick, we could say how far it could keep improving, and how fast. We would plausibly make huge strides in agent foundations if we knew how to design a mind at all.
Now there's an obvious reason not to pursue this goal: it is dangerous if it works out. And it's so hard to achieve that we'd likely need crazy amounts of coordination to stop all the researchers involved from spilling the beans. Imagine the theoretical insights needed to build GPT-4 by hand going around the block. You could, I don't know, do something like Cyc but actually useful. You'd have a rando building an open-source AGI project in a week, with people feeding in little bits of domain knowledge by training modular QNRs. Or maybe you'd get some freaking nerd coding a seed AI and pressing run.
EDIT: Also, this is a good post. Have much karma.