I suppose that’s true in a very strict sense, but I wouldn’t expect people considering AI risk to have the level of uncertainty necessary for their decision to be predominantly swayed by that kind of second-order influence.
For example, someone can get pretty far with “dang, maybe GPT4 isn’t amazing at super duper deep reasoning, but it is great at knowing lots of things and helping synthesize information in areas that have incredibly broad complexity… And biology is such an area, and I dunno, it seems like GPT5 or GPT6 will, if unmitigated, have the kind of strength that lowers the bar on biorisk enough to be a problem. Or more of a problem.”
That’s already quite a few bits of information available by a combination of direct observation and one-step inferences. It doesn’t constrain them to “and thus, I must work on the fundamentals of agency,” but it seems like a sufficient justification for even relatively conservative governments to act.
Another item for the todo list:
1. Compile neural networks from fountains of autogenerated programs.
2. Generate additional permutations by variously scrambling compiled neural networks.
3. Generate more “natural” neural representations by training networks to predict the mapping implied by the original code.
4. Train an interpreter to predict the original program from the neural network.
Naive implementation likely requires a fairly big CodeLlama-34b-Instruct-tier interpreter and can only operate on pretty limited programs, but it may produce something interesting. Trying to apply the resulting interpreter on circuits embedded in larger networks probably won’t work, but… worth trying just to see what it does?
Might also be something interesting to be learned in spanning the gap between ‘compiled’ networks and trained networks. How close do they come to being affine equivalents? If the relationship isn’t affine, what kind of transform is required (and how complicated is it)?
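A minimal sketch of what the setup for steps 3 and 4 could look like, under big simplifying assumptions: “programs” here are tiny compositions of scalar primitives, and the “natural” representation is just a small MLP distilled from each program’s input/output mapping. All names are illustrative, not an actual implementation:

```python
import random
import torch
import torch.nn as nn

def make_random_program():
    """Autogenerate a trivial 'program': a random composition of scalar primitives."""
    ops = [lambda x: x + 1.0, lambda x: x * 2.0, torch.sin, torch.abs]
    chosen = [random.choice(ops) for _ in range(3)]
    def program(x):
        for op in chosen:
            x = op(x)
        return x
    return program, chosen

def distill_to_network(program, steps=2000):
    """Step 3: produce a 'natural' neural representation by fitting the program's I/O map."""
    net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
                        nn.Linear(32, 32), nn.ReLU(),
                        nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.rand(256, 1) * 4 - 2              # inputs sampled from [-2, 2]
        loss = ((net(x) - program(x)) ** 2).mean()  # match the program's outputs
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net

program, source_ops = make_random_program()
net = distill_to_network(program)
# Step 4 (not shown): train an interpreter mapping net's weights back to source_ops.
```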
Another item for the todo list:

Autoregressive transformer gradient flow shapes earlier token computation to serve future predictions, but that early computation cannot condition on future tokens. This should serve as a regularizing influence on the internal structure of token predictions: in order to be useful to the largest possible set of future predictions, the local computation would need to factor itself into maximally reusable modules.
The greater the local uncertainty about the future, the less the local computation can be specialized to serve future tokens. Could consider it something like: the internal representation is a probability-weighted blend of representations useful to possible futures. If the local computation is highly confident in a narrow space, it can specialize more.
Simplicity biases would incentivize sharing modules more strongly. Even if the local computation suspects a narrower future distribution, it would be penalized for implementing specialized machinery that is too rarely useful.
One implication: many forms of token-parallelized search get blocked, because they require too much foresight-driven specialization.
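A quick toy check of the gradient flow claim (standard causally masked attention assumed; this only demonstrates that a late position’s loss produces gradients at early positions even though the early forward pass can’t see later tokens):

```python
import torch
import torch.nn as nn

seq_len, d_model = 8, 16
x = torch.randn(1, seq_len, d_model, requires_grad=True)  # stand-in for per-token activations

attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
out, _ = attn(x, x, x, attn_mask=causal_mask)  # position t can only attend to positions <= t

loss = out[0, -1].pow(2).sum()  # pretend the training loss lives only at the final position
loss.backward()

# Position 0 never saw any later token in its forward pass, but the final position's
# loss still reaches back and shapes it:
print(x.grad[0, 0].abs().sum())  # nonzero
```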
One thing I’m attempting to figure out is whether in that future, where government/traditional academic funding becomes dominant, there remains significant neglectedness in important subproblems because of how those systems tend to operate. I could see an OpenPhil pivot covering some of this, but it’d sure be nice to nail down at least a few more things when choosing between going all in on risky high EV ETG versus direct safety work.
There are some historical examples that might be informative, but it’s difficult for me to judge.
(An earlier update from you about Lightspeed’s status is what solidified the thought of “it sure would be nice if this was a widely available thing,” so thanks for that too!)
I don’t really think about it in terms of discrete capabilities. For an output that scores poorly:
1. There is an internal process that is responsible for the output.
2. The gradient of the loss with respect to the output can be backpropagated through the entire network.
3. The responsible process will have larger gradients.
4. The responsible process gets smacked.
The responsible process encompasses everything, including all the metareasoning. The more a chunk of computation, any computation, contributed to that bad output, the more it gets smacked.
The gradient will simply move the implementation towards something that isn’t outputting poorly scoring things, and most things that score well aren’t doing some galaxy brained deception strategy.
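A toy illustration of that credit assignment, with two made-up parallel branches where one dominates the bad output:

```python
import torch
import torch.nn as nn

x = torch.randn(64, 10)
target = torch.zeros(64, 1)

branch_a = nn.Linear(10, 1)  # dominant contributor to the output
branch_b = nn.Linear(10, 1)  # barely involved

output = 10.0 * branch_a(x) + 0.01 * branch_b(x)
loss = ((output - target) ** 2).mean()
loss.backward()

print(branch_a.weight.grad.abs().mean())  # large: this computation was "responsible"
print(branch_b.weight.grad.abs().mean())  # small: it contributed little
```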
> doesn’t that mean that conditioned on winning roll-outs, the gradient hacking process is naturally self-reinforcing?
To the extent that the “gradient hacking process” is an optimizer-accessible path to lower loss, it can persist. A secretly-devious subagent could survive indefinitely, be reinforced, and take over more of input space if it always outputs things that are scored well during training.
If the RL training process starts with a situationally aware superintelligent adversary, I bet it could persist by never outputting anything bad during training and by outperforming other nascent implementations.
But suppose the training doesn’t start with a superintelligent adversary. Finding its way to some form of internal generalizing algorithm seems likely because doing so is just so often useful, but going beyond that to actually implementing a policy of merely hiding, let alone gradient hacking, is hard.
It would need to snap from a benign implementation to a highly capable deceptive process effectively instantly with respect to SGD. It couldn’t develop slowly; incomplete attempts at deception would very likely get the nascent deceiver killed early. And, while I expect capability on certain tasks to sometimes be rapid through snapping together the last pieces of a more strongly generalizing algorithm, capability alone is not sufficient.
At the moment, I suspect the space of possible algorithms that could be ‘snapped’ into is large and obviously biased against loss increasing algorithms. Getting a deceptive mesaoptimizer or gradient hacker out of this process requires rolling a lot of dice and getting an extremely specific subset of algorithms, and I don’t see a reason to think they are remotely probable compared to all the “not that” options.
Or to phrase it another way, the simplest way to reduce loss is to reduce loss, not to secretly plan to increase loss just out of sight.
This is an area where a single experiment could change my mind immediately, but so far I’ve only seen failed attempts (and all my own attempts to think of a way to do it have dead-ended).
That said, using an optimizer that doesn’t do the same sort of credit assignment combined with some form of training process with tons of open space for it to run free changes the game. It seems way easier in that context, even if I still don’t know how to demonstrate it.
If the gradient is able to flow back to the decisionmaker (in this case, the potentially misaligned agent that’s trying to gradient hack), the decisionmaker gets smacked by the gradient update because it’s responsible for the output.
I’m not quite ready to say it’s impossible to pull off this sort of attack within a single differentiable model under backprop, but this strategy doesn’t seem sufficient because the decisionmaker would basically be killing itself.
It’d be really helpful if someone could demonstrate this sort of attack at any scale or level of contrivance. I could see how optimizers that lack the robust credit assignment of backprop could be more vulnerable; that might be a good place to poke to try to demonstrate it first.
Pretty much agreed. I’m optimistic about conditioning being extremely potent such that AC doesn’t seem notably more concerning than any other form of eliciting safe capability.
I would probably use a more general kind of argument than the ones presented, though. I don’t disagree with it, but it (along with some other arguments about the potential weaknesses of conditioning) seems to start from a frame that puts something like the “agency” of predictive models a wee bit to the left of where I think it is.
Specifically, a predictor is the process which narrows an output distribution based on input conditions. It may indeed learn that certain kinds of inputs naturally map to a narrowed distribution corresponding to various kinds of Anthropic Weirdness.
However, suppose there exists a nonempty subset of worlds consistent with the inputs that do not involve Anthropic Weirdness. So long as that exists, there exists a narrowing function—a condition—which prunes out the unwanted Anthropically Weird explanations. (Or selects for them, or selects for indifference about them...)
So long as the predictor is a faithful implementation of an otherwise goal agnostic update procedure, conditioning is definitely sufficient in principle. It’s exactly the same kind of question as figuring out how to condition for any other output.
In practice, in order to condition for this behavior, we need a lever (or realistically, a set of many interacting levers) we can use to find the distribution we want. These levers look like teaching a model distinctions that we can then use as conditions. For example, here is one possible recipe:
1. Take a training set that contains at least some examples of anthropically weird prediction sequences and some normal sequences.
2. Given different initial prompts, score completions according to their degree of anthropic weirdness (along the axis of weirdness we want our condition to care about).
3. Include that score as a condition (another token) alongside the initial prompt, and train the model to output the classified sequence in that context.
In order for the model to predict sequences following the score token, it must learn the distinction that the score token represents. Given a capable model and a sufficient dataset, I expect it to do pretty well.
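As a rough sketch of what the data construction could look like (the metatoken format and the scorer below are made up for illustration; a real version needs an actual classifier or labeling process for the chosen weirdness axis):

```python
WEIRD_MARKERS = ("I am being simulated", "the predictor itself")  # toy stand-ins

def weirdness_score(completion: str) -> int:
    """Toy stand-in for a real classifier along the chosen weirdness axis (0-9)."""
    return min(9, 5 * sum(marker in completion for marker in WEIRD_MARKERS))

def build_conditioned_example(prompt: str, completion: str) -> str:
    score = weirdness_score(completion)
    # Prepend the score as a metatoken-style condition: to predict the completion
    # that follows it, the model has to learn the distinction the score represents.
    return f"<weirdness={score}>{prompt}{completion}"

print(build_conditioned_example("Q: What are you?\nA:", " Just a helpful assistant."))
# At inference time, prepending <weirdness=0> conditions on the learned distinction.
```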
Obviously, there’s a lot more ways to actually implement this kind of process. Further, these kinds of conditions can be intermixed and leverage each other. The stronger the model gets, the stronger these distinctions can be.
Combined with high level interpretability that lets us investigate how conditions are influencing predictions to see if we are targeting the right distinction, I think we’ve got a very solid path. At this point, the majority of my P(doom) lives elsewhere.
Edited, since I left it a bit ambiguous/unclear: there’s an objection that a model could remain anthropically captured because a condition could (for example) have been observed in the simulated universe, so the prediction still ends up being captured. That objection doesn’t work as well when the conditioning is of a type that comes first in some sense. The conditioning scheme I outline later in the post lives on a subtly different level with respect to the model than most sequence information; the model does not observe that type of condition in the world, it is a metatoken.
Yup, there’s a bunch of overlap there. The main difference (as far as I can tell from the outside, anyway) is that I suspect I’m more optimistic about safely using much stronger/more general chunks within a CoEm-ish system, and my day-to-day effort is half coming up with ways to make that possible and half trying to prove myself wrong.
And I have considered it, but I’m still unsure if I should actually be full-timing AI safety. I might end up reaching out anyway with a more limited commitment since there do seem to be a lot of complementary options.
This would help explain some things I’ve been scratching my head about (revealed preferences implied by visible funding, assuming ample cash on hand, didn’t seem to match well with a talent-constrained story). I’m currently trying to figure out how to prioritize earning to give versus direct research; it would be really helpful to know more about the funding situation from the perspective of the grantmaking organizations.
In particular, it’d be great to have some best-guess projections of future funding distributions, and any other information that might help fill in different values of X, Y and Z for propositions like “We would prefer X dollars in extra funding received in year Y over Z additional weekly hours of researcher time starting today.”
This is a pet peeve of mine. I remember 20 years ago, a wee boyston getting into the semimathy parts of programming and seeing all the dense notation and thinking “this must speak to the inherent complexity of the problem and must be the most natural representation!”
I’ve become progressively more annoyed by it. I was reading a paper a week ago that enjoyed its notation a little too much; it took a while for me to realize what a particular equation was supposed to represent, despite the fact that I had implemented exactly what it represented from scratch before.
To be clear, I don’t mind having a single symbol that means a very specific thing by strong convention. But sometimes you’ll see ϕ_{eθ,0} and τ_wer and sixteen other symbols, and then you look for a lookup table and there isn’t one, and then you scan 8 paragraphs to find the definition of two of the symbols, and then you find the only reference to τ_wer, except it says “τ_wer is, by default, equivalent to τ_ke,” and then you rub your temples. Or the author develops a custom notation that is maybe internally consistent, maybe not, and builds a giant edifice on it.
It’s code golf, except there’s no compiler and the author couldn’t test it and it sometimes has errors. Enjoy, reader!
I don’t think lesswrong is unusually bad about this, but I’m now sufficiently allergic to it that seeing excess notation does make me suspicious.
For the sake of intuition, it’s useful to separate the capabilities visibly present in generated sequences from the capabilities of the model itself.
Suppose you’ve got an untuned language model trained on a bunch of human conversations, and you generate a billion rollouts of conversations from scratch (that is, no initial prompting or conditions on the input). This process won’t tend to output conversations between humans that have IQs of 400, because the training distribution does not contain those. The average simulated conversation will be, in many ways, close to the average conversation in the training set.
But it would be incorrect to say that the language model has an “IQ” of 100 (even assuming the humans in the training distribution averaged 100). The capability elicited from the language model depends on the conditions of its predictions. When prompted to produce a conversation between two mathematicians trying to puzzle something out, the result is going to be very different from the random sampling case.
You can come up with a decent guess about how smart a character the model plays is, because strong language models tend to be pretty consistent. In contrast, it’s very hard to know how smart a language model is, because its externally visible behavior is only ever a lower bound on its capability. The language model is not its characters; it is the thing that can play any of its characters.
Next, keep in mind that even simple autoregressive token prediction can be arbitrarily hard. A common example is reversing a hash. Consider prompting a language model with: “0xDF810AF8 is the truncated SHA256 hash of the string”
It does not take superhuman intelligence to write that prompt, but if a language model were able to complete that prompt correctly, it would imply really weird things.
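For concreteness: the forward direction is trivial, and it’s the inversion that would be absurd. (The hash in the prompt is just an example value, not a claim about any particular preimage.)

```python
import hashlib

def truncated_sha256(s: str) -> str:
    """First 8 hex digits (32 bits) of the SHA256 of a string."""
    return "0x" + hashlib.sha256(s.encode()).hexdigest()[:8].upper()

print(truncated_sha256("hello world"))  # easy in this direction, for any string
# Completing the prompt correctly means producing a string whose truncated hash
# is 0xDF810AF8, i.e. searching for or inverting the hash.
```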
That’s an extreme case, but it’s not unique. For a closer example, try an experiment:
Try writing a program, at least 25 lines of nontrivial code, starting with a blank file, without using any pen and paper or other supporting tools, without editing anything you write, without using backspace. Just start typing characters in sequence and stop when the program’s done. And of course, the program should compile and work correctly.
Then try asking ChatGPT4 to do it. See who gets it done faster, and how many tries it takes!
The choice of which token to output next for this kind of programming task depends on a deep mental model of what comes next, and every character typed constrains your options in the future. Some mistakes are instantly fatal and stop your program from compiling. Others may be fixable by modifying future predictions to compensate, but every deviation adds complexity and constraints. GPT4 frequently sprints straight through all of it.
The key is that even GPT-3 is already superhuman at the thing it’s actually doing. It’s the thing that’s shaping output distributions for input conditions, not the thing most directly “having a conversation” or whatever else.
3 more posts I feel like I need to write at some point:
1. Solving all of ethics and morality and getting an AI to implement it seems hard. There are possible worlds where we would need to work with half measures. Some of these paths rely on lower auto-doom densities, but there seem to be enough of those potential worlds to consider it. An example of ‘good enough to not x/s-risk’ dumb value alignment; the required assumptions for stability; the shape of questions implied that may differ from more complete solutions.
2. Make a bunch of diagrams of things I believe relevant to alignment stuff and how they interact, plus the implications of those things. The real point of the post is to encourage people to try to make more explicit and extremely legible models so people can actually figure out where they disagree instead of running around in loops for several years.
3. Generalizing the principle from policy regularization:
   - Adversaries need not be actual agents working against you.
   - “Sharp” models that aggressively exploit specific features have a fragile dependence on those features. Such models are themselves exploitable.
   - Uncertainty and chaos are strong regularizers. The amount of capability required to overcome even relatively small quantities of chaos can be extreme.
   - Applications in prediction.
> Hm, if we apply the maximum entropy principle universally, aren’t we also obliged to apply it reflectively, i.e., model oneself as a maximum-entropy (active inference) agent?
If you precisely define what it means to apply it “universally” such that it gets you the desired behavior, sure. And to be clear, I’m not saying that’s a hard/impossible problem or anything like that, it’s just not directly implied by all things which match the description “follows the principle of maximum entropy.”
> Looks more like a suitable inductive bias and/or bias is needed rather than causal surgery.
If you were actually trying to implement this, yes, I wouldn’t recommend routing through weird counterfactuals. (I just bring those up as a way of describing the target behavior.)
In fact, because even the version I outlined in the added footnote can still suffer from collapse in the case of convergent acausal strategies across possible predictors, I would indeed strongly recommend pushing for some additional bias that gives you more control over how the distribution looks. I think that’s pretty tractable, too.
> Also, this actually doesn’t depend on the specific training procedure of auto-regressive LLMs, namely, backpropagation with token-by-token cross-entropy loss.
I do have some concerns about how far you can push the wider class of “predictors” in some directions before the process starts selecting for generalizing instrumental behaviors, but there isn’t a fundamental uniqueness about autoregressive NLL-backpropped prediction.
> It seems to me that “high instrumentality” is an instance of what I call an agent having an alien world model.
Possibly? I can’t tell if I endorse all the possible interpretations. When I say high instrumentality, I tend to focus on 1. the model is strongly incentivized to learn internally-motivated instrumental behavior (e.g. because the reward it was trained on is extremely sparse, and so the model must have learned some internal structure encouraging intermediate instrumental behavior useful during training), and 2. those internal motivations are less constrained and may occupy a wider space of weird options.
#2 may overlap with the kind of alienness you mean, but I’m not sure I would focus on the alienness of the world model instead of learned values (in the context of how I think about “high instrumentality” models, anyway).
> This is confusing because “reward functions” in RL and utilities in decision theory (or moral philosophy) apply to world states or outcomes, not plans.
While they are usually described in the context of world states and outcomes, I don’t think there is something special about the distinction. Or to phrase it another way: an embedded agent that views itself as a part of the world can consider its own behavior as a part of world state that it can have valid preferences about.
The most direct link between traditional RL and this concept is reward shaping. Very frequently, defining a sparse and distant goal prevents effective training. To compensate for this, the reward function is modified to include incremental signals that are easier to reach. For locomotion, this might look like “reward velocities that are positive along the X axis,” while the original reward might have just been “reach position.X >= 100.”
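As a minimal sketch of that contrast (the field names are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class State:
    position_x: float
    velocity_x: float

def sparse_reward(state: State) -> float:
    # Original objective: reward only the final outcome.
    return 1.0 if state.position_x >= 100.0 else 0.0

def shaped_reward(state: State) -> float:
    # Shaped objective: also reward incremental progress at every step.
    progress_bonus = 0.01 * max(state.velocity_x, 0.0)
    return sparse_reward(state) + progress_bonus

print(sparse_reward(State(3.0, 1.2)), shaped_reward(State(3.0, 1.2)))  # 0.0 0.012
```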
Reward shaping can be pushed arbitrarily far. You could implement imitation in the reward function: no longer is the reward just about an outcome, but also about how that outcome comes about. (Or to phrase it the other way again- the how becomes an outcome itself.)
In the limit, the reward function can be made extremely dense such that every possible output is associated with informative reward shaping. You can specify a reward function that, when sampled with traditional RL, reconstructs gradients similar to those of a predictive loss. I’m trying to get at the idea that there isn’t a fundamental difference in kind.
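A toy, hedged illustration of that direction: give a reward at every token for matching a reference sequence, and a REINFORCE-style estimator on that dense reward pushes probability toward the reference tokens, much like (a sampled, noisier version of) an ordinary predictive loss would:

```python
import torch

def reinforce_surrogate(logits: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    # logits: (seq_len, vocab_size); reference: (seq_len,) target token ids
    log_probs = torch.log_softmax(logits, dim=-1)
    sampled = torch.multinomial(log_probs.exp(), num_samples=1).squeeze(-1)
    reward = (sampled == reference).float()  # dense shaping: one reward signal per token
    # REINFORCE surrogate: reinforce each sampled token in proportion to its reward,
    # i.e. push up exactly the sampled tokens that matched the reference.
    return -(reward * log_probs[torch.arange(len(sampled)), sampled]).sum()

logits = torch.randn(5, 10, requires_grad=True)
reference = torch.randint(0, 10, (5,))
reinforce_surrogate(logits, reference).backward()  # gradient resembles a sampled cross-entropy
```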
A big part of what I’m trying to do with these posts is to connect predictors/simulators to existing frameworks (like utility and reward). If one of these other frameworks (which tend to have a lot of strength where they apply) suggested something bad about predictors with respect to safety efforts, it would be important to know.
> I maybe understood only 30% of this post. If this post was intended for the public, I either recommend editing it (if you want I can collaborate with you on a draft to sort out all the places which I didn’t understand), or adding a disclaimer about the prerequisite reading required to understand it (the “Simulators” post is not sufficient, I’ve read it).
That could be helpful. I’m pretty clearly suffering from some illusion of transparency here; I can’t easily predict the direction of the confusion.
The most related posts to read for background would be Implied “utilities” of simulators are broad, dense, and shallow (which I see you’ve now read) and Instrumentality makes agents agenty. There’s also a much bigger post I submitted to the AI alignment awards that goes into more depth, but I haven’t gotten around to publishing that quite yet.
There’s also Simulators, constraints, and goal agnosticism: porbynotes vol. 1, but that’s some much earlier thoughts on the topic, including some of which I don’t entirely endorse anymore, and it is explicitly braindumping that’s not optimized for any particular audience. And it’s really long.
> Looks like this simulator/agent should adhere to the principle of maximum entropy, I think it’s worthwhile spelling this out.
I avoided this for now because I can’t point to exactly how maximum entropy is sufficient for what I intend by “minimally collapsed.”
Naively selecting from the maximum entropy distribution (as narrowed by all the conditions the predictor is aware of) still permits the model to collapse reflective predictions in a way that permits internally motivated goal-directed behavior (leaving aside whether it’s probable), because it’s aware of the reflective nature of the prediction.
In other words, to get to what I mean by “minimally collapsed,” there seems to be some additional counterfactual surgery required. For example, the model could output the distribution that it would output if it knew it did not influence the prediction. Something like the predictor punting the prediction to a counterfactual version of itself that then predicts the original predictor’s output, assuming the predictor behaves like a strictly CDT agent. I think this has the right shape (edit: okay pretty sure that’s wrong now, more thinky required), but it’s pretty contorted.
All that said, I will add an extra footnote.