Hey, I am Robert Kralisch, an independent conceptual/theoretical Alignment Researcher. I have a background in Cognitive Science and I am interested in collaborating on an end-to-end strategy for AGI alignment.
The three main branches I aim to contribute to are conceptual clarity (what we should mean by agency, intelligence, embodiment, etc.), the exploration of more inherently interpretable cognitive architectures, and Simulator theory.
One of my concrete goals is to figure out how to design a cognitively powerful agent such that it does not become a Superoptimiser in the limit.
Yeah, I wish we had some cleaner terminology for that.
Finetuning the “simulation engine” for the particular task at hand (e.g. finding the best trade-off between breadth and depth of search in strategy games, or even knowing how much “thinking time” or “error allowance” to allocate to a given move), given limited cognitive resources, is something that I would associate with level 3 capability.
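To make that a bit more concrete, here is a minimal toy sketch of what I have in mind, with all names and numbers being my own hypothetical stand-ins rather than any real engine's API: an anytime search that decides, per move, how much of a thinking-time budget to spend based on a (stubbed) estimate of the position's complexity.

```python
import time
import random

def estimate_complexity(position) -> float:
    """Stub: a complexity/volatility estimate in [0, 1] for the position."""
    return random.random()

def search_to_depth(position, depth: int) -> str:
    """Stub: pretend to search the game tree to `depth` and return a move."""
    time.sleep(0.01)  # simulate the growing cost of deeper search
    return f"move@depth{depth}"

def allocate_thinking_time(position, base_budget: float = 0.1) -> float:
    """Spend more of a fixed per-move budget on positions estimated to be
    complex, less on quiet ones (0.5x to 1.5x the base budget)."""
    return base_budget * (0.5 + estimate_complexity(position))

def choose_move(position) -> str:
    """Anytime search: iteratively deepen until the allocated time runs out.
    The deadline implicitly trades breadth (stopping early) against depth."""
    deadline = time.monotonic() + allocate_thinking_time(position)
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        best_move = search_to_depth(position, depth)
        depth += 1
    return best_move

print(choose_move("some position"))
```

The point is just the shape of the loop: the system is spending its limited run-time compute strategically across sub-tasks, without yet modifying how it learns.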
It certainly seems like learning could go in the direction of making the model of the game more useful, either by improving how well this model predicts/outputs good moves or by improving the allocation of cognitive resources to the sub-tasks involved. Presumably, an intelligent system should be capable of testing which improvement vectors seem most fruitful (and how frequently to update this analysis), but I find myself a bit confused about whether that should count as level 3 or as level 4, since the system is then reasoning about allocating resources across its own learning processes.
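The ambiguity might be easier to see in code. Below is a toy epsilon-greedy bandit (again, all names and payoff numbers are my own assumptions for illustration) that allocates learning effort between the two improvement vectors mentioned above:

```python
import random

# Two hypothetical improvement vectors from the paragraph above.
VECTORS = ["improve_move_prediction", "improve_resource_allocation"]

def measured_improvement(vector: str) -> float:
    """Stub: run one unit of learning along `vector` and measure the
    resulting gain in playing strength (noisy, made-up numbers)."""
    true_gain = {"improve_move_prediction": 0.6,
                 "improve_resource_allocation": 0.4}[vector]
    return true_gain + random.gauss(0, 0.2)

def allocate_learning(steps: int = 200, epsilon: float = 0.1) -> dict:
    """Epsilon-greedy bandit over improvement vectors: mostly exploit the
    vector that has been most fruitful so far, occasionally re-test the other."""
    totals = {v: 0.0 for v in VECTORS}
    counts = {v: 0 for v in VECTORS}
    for _ in range(steps):
        if random.random() < epsilon or 0 in counts.values():
            vector = random.choice(VECTORS)  # explore / re-test
        else:
            vector = max(VECTORS, key=lambda v: totals[v] / counts[v])  # exploit
        totals[vector] += measured_improvement(vector)
        counts[vector] += 1
    return counts

print(allocate_learning())  # most effort should end up on the more fruitful vector
```

Structurally this is the same kind of allocation loop as before, which is why I hesitate to call it level 4 outright; what seems to change is only the kind of resource being allocated, namely learning capacity rather than run-time compute.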