Hey, I am Robert Kralisch, an independent conceptual/theoretical Alignment Researcher. I have a background in Cognitive Science and I am interested in collaborating on an end-to-end strategy for AGI alignment.
The three main branches that I aim to contribute to are conceptual clarity (what should we mean by agency, intelligence, embodiment, etc.), the exploration of more inherently interpretable cognitive architectures, and Simulator theory.
One of my concrete goals is to figure out how to design a cognitively powerful agent such that it does not become a Superoptimiser in the limit.
Thanks a lot for the encouragement :)
Yes, I am trying to understand a generalized (which also means simplified) and formalizable parallel to human cognition. Some of my thinking on this is inspired by predictive coding and adaptive resonance theory (although pretty loosely by the latter), and I am trying to figure out the implications of our most up-to-date understanding of neurobiological principles, together with a notion of the “riverbeds of cognition”.
In other words, how can we design an architecture such that, as its cognition develops, it is not pressured to take shortcuts or “work around” the design decisions we made? Is there a “natural path” of cognitive development that avoids some of the common pitfalls and failure modes (i.e. can we aim for inner alignment if we have proficiency in this area)?
This has a direct bearing on interpretability, and goes together with the goal of a sort of “conceptual curriculum” that is intended to teach the system natural abstractions.
If I remember correctly, the centrality of “constraint satisfaction” fell out of considering causal (hyper/meta)graphs as a sensible representational substrate (which was partially inspired by Ben Goertzel). I personally find it quite intuitive to think in graphs.