Why is the risk of human extinction hard to understand? The risk from a critical reactor or from atmospheric ignition was readily grasped by the researchers involved. Why not for AI? Maybe the reason is that AI is an inscrutable stack of matrices instead of the comparably simple physical equations that described those phenomena. Mechinterp does help because it provides a relation between the weights and understandable phenomena.

But I wonder if we could reduce things to a minimal reasoning model, one that doesn't know much about the world or even language but learns only to reason with minimal primitives, e.g., lambda calculus as its language, learning to learn in a minimal world (one still complex enough that reasoning pays off). Many game worlds have too much visual detail and not enough agency or negotiation. I wouldn't be surprised if you could train a model with fewer than a million parameters to reason and scheme on much smaller datasets focused on this domain. The risk would arguably be small because the model couldn't deal with the real world, but it might demonstrate many instrumental convergence results.
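To make the scale concrete, here is a minimal sketch of what a sub-million-parameter reasoner could look like, assuming a standard PyTorch transformer trained to predict the next token of lambda-calculus reduction traces. The class name `TinyReasoner`, the toy vocabulary, and every size below are illustrative assumptions, not a tested design.

```python
import torch
import torch.nn as nn

# Illustrative toy vocabulary: lambda-calculus primitives plus a few
# variable names and a reduction-step separator. Purely an assumption.
VOCAB = ["λ", ".", "(", ")", "→", "<pad>"] + [f"v{i}" for i in range(26)]

class TinyReasoner(nn.Module):
    """Hypothetical sub-million-parameter next-token model
    for lambda-calculus reduction traces."""
    def __init__(self, vocab_size=len(VOCAB), d_model=64, n_heads=4,
                 n_layers=4, max_len=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        seq_len = ids.shape[1]
        x = self.tok(ids) + self.pos(torch.arange(seq_len, device=ids.device))
        # Causal mask: the model only sees earlier tokens of the trace,
        # so training amounts to predicting the next reduction step.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        return self.head(self.blocks(x, mask=mask))

model = TinyReasoner()
print(sum(p.numel() for p in model.parameters()))  # ~220k, well under a million
```

The point of the sketch is only the parameter budget: four narrow layers already land around 220k weights. Whether a curriculum of reduction, learning-to-learn, and negotiation tasks in such a minimal world actually elicits scheming at this scale is exactly the experiment.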