Thane Ruthenis comments on Prize for Alignment Research Tasks

Thane Ruthenis 16 May 2022 5:21 UTC
Hmm. A speculative, currently intractable way to do this might be to summarize the ML model before feeding it to the goal-extractor.
tl;dr: As per natural abstractions, most of the details of the interactions between individual neurons are probably irrelevant to the model’s high-level functioning/reasoning. So there should be, in principle, a way to automatically collapse e.g. a trillion-parameter model into a much lower-complexity high-level description that would still preserve such important information as the model’s training objective.
But there aren’t currently any fast-enough algorithms for generating such summaries.