An independent researcher, blogger, and philosopher working on intelligence and agency (esp. Active Inference), alignment, ethics, the interaction of the AI transition with sociotechnical risks (epistemics, economics, human psychology), collective mind architecture, and research strategy and methodology.
Twitter: https://twitter.com/leventov. E-mail: leventov.ru@gmail.com (the preferred mode of communication). I’m open to collaborations and work.
Presentations at meetups, workshops, and conferences; some recorded videos.
I’m a founding member of the Gaia Consortium, on a mission to create a global, decentralised system for collective sense-making and decision-making, i.e., civilisational intelligence. Drop me a line if you want to learn more about it and/or join the consortium.
You can help boost my sense of accountability, and give me the feeling that my work is valued, by becoming a paid subscriber of my Substack (though I don’t post anything paywalled; in fact, on this blog I just syndicate my LessWrong writing).
For Russian speakers: the Russian-language AI safety network, Telegram group.
The term “RL agent” means an agent with an architecture from a certain class, amenable to a specific kind of training. Since you are discussing RL agents in this post, I think it could be misleading to use human examples and analogies (“travelling across the world to do cocaine”) because humans are not RL agents, neither on the level of wetware biological architecture (i.e., neurons and synapses don’t represent a policy) nor on the abstract, cognitive level. On the cognitive level, even RL-by-construction agents of sufficient intelligence, trained in sufficiently complex and rich environments, will probably exhibit the dynamics of Active Inference agents, as I note below.
It’s not completely clear to me what you mean by “selection for agents” and “selection for reward”: RL training itself, or evolutionary tweaking of the agent’s architecture and hyperparameters, which is in turn guided by the trained agent’s score (i.e., the reward) within a larger process of finding the agent that does the task best? The latter process can, and probably will, select for “reward optimizers” (see the toy sketch below).
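To make the distinction concrete, here is a minimal toy sketch of the two nested selection processes; all function names and numbers are hypothetical, invented for illustration, and not taken from the post or any library:

```python
import random

def train_rl(params):
    """Inner loop stand-in: RL training nudges one agent's policy
    toward higher reward (selection of behaviour within an agent)."""
    return params + 0.1 * random.random()

def evaluate(trained_params):
    """The trained agent's score (reward) on the task; a toy task
    where parameters near 1.0 score best."""
    return -abs(trained_params - 1.0)

def mutate(params):
    """A random tweak of architecture/hyperparameters."""
    return params + random.gauss(0.0, 0.2)

def outer_search(population, generations=20):
    """Outer loop: evolutionary selection FOR reward. Candidate agents
    are kept or discarded based on their final trained score, so this
    process can favour 'reward optimizers' even if no inner loop does."""
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: evaluate(train_rl(p)),
                        reverse=True)
        survivors = ranked[: len(ranked) // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in survivors]
    return population

print(outer_search([random.uniform(-2.0, 2.0) for _ in range(10)]))
```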
I think that Active Inference is a simpler representation of the same ideas, one which doesn’t use the concepts of attractors, reward, reinforcement, antecedent computation, utility, and so on. Instead of explicitly representing utilities, Active Inference agents only hold (stronger or weaker) beliefs about the world, including beliefs about themselves (“the kind of agent/creature/person I am”), and act so as to fulfil these beliefs (self-evidencing; see the formula below). In humans, “rewarding” neurotransmitters regulate learning and belief updates.
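For reference, this is the standard quantity from the Active Inference literature that such agents minimise (my addition here, not something stated in the post under discussion): the variational free energy

$$F \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \;=\; D_{\mathrm{KL}}\big[q(s) \,\|\, p(s \mid o)\big] \;-\; \ln p(o),$$

where $q(s)$ encodes the agent’s beliefs about hidden states $s$ and $p(o, s)$ is its generative model. Since the KL term is non-negative, $F$ upper-bounds the surprisal $-\ln p(o)$: minimising $F$ through perception updates beliefs, while minimising it through action makes observations $o$ conform to the model’s predictions, which is what “self-evidencing” means. No reward or utility term appears anywhere.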
The question that really interests me is how inevitable it is that the Active Inference dynamic emerges as RL agents are trained to certain levels of capability/intelligence.
The longer I look at this statement (and its shorter version, “Reward is not the optimization target”), the less I understand what it’s supposed to mean, considering that “optimisation” might refer to the agent’s training process as well as to the “test” process (even if the two overlap or coincide). It looks to me that your idea can be stated more concretely as: “the more intelligent/capable RL agents (whether model-based or model-free) become over the course of training with the currently conventional training algorithms, the less susceptible they will be to wireheading, rather than actively seeking it”?
The first part of this statement is about RL agents, the second about humans. I don’t think the second part makes much sense: humans shouldn’t be analysed as RL agents in the first place because, as stated above, they are not RL agents.
Unfortunately, it’s far from obvious to me that Active Inference agents (which sufficiently intelligent RL agents will apparently become by default) are corrigible even in principle. As I noted in the post, such an agent can discover the Free Energy Principle (or read about it in the literature), form the belief that it is an Active Inference agent, and thereafter disregard anything humans try to impose on it, because that would contradict its belief that it is an Active Inference agent.