Independent researcher. I hold an MSc in psychology. I work as an AI software engineer specialising in combinatorial optimisation, machine learning, graph search, natural language processing, and data compression.
I have been researching and working on multi-objective value problems for almost 20 years, have followed AI safety discussions since 2006, and have participated more actively since about 2017.
My thesis, in cognitive psychology, was about computational modelling of innate learning and planning mechanisms (classical and operant conditioning plus insight learning), which eventually form a foundation for culturally acquired, language-based thought processes.
With co-authors I have published a full research paper at AAMAS (Autonomous Agents and Multi-Agent Systems) on utility functions for risk-averse multi-objective decision making, which are relevant for balancing the plurality of human values. For that purpose we introduced concave utility functions applied to each objective before multi-objective reward aggregation. See “Using soft maximin for risk averse multi-objective decision-making”. An interested reader may also want to look at my AISC V project proposal, which inspired the aforementioned paper: https://bit.ly/aisc5-pluralistic-utility .
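To give a flavour of the idea, here is a minimal Python sketch of concave-utility aggregation, assuming a simple exponential utility transform and an illustrative risk_aversion parameter; the paper's exact soft-maximin formulation may differ.

```python
import numpy as np

def soft_maximin_aggregate(objective_rewards, risk_aversion=1.0):
    """Aggregate multi-objective rewards in a risk-averse way by applying a
    concave (here: exponential) utility transform to each objective before summing.

    Low objective values are penalised disproportionately, so the agent is
    pushed to keep all objectives above water rather than maximising one of
    them at the expense of the others. As risk_aversion approaches 0 the
    ordering approaches that of a plain sum; as it grows, the aggregate is
    dominated by the worst objective, approximating maximin.
    """
    r = np.asarray(objective_rewards, dtype=float)
    # u(x) = -exp(-k * x) is increasing and concave.
    utilities = -np.exp(-risk_aversion * r)
    return utilities.sum()

# Example: a balanced outcome scores better than a lopsided outcome with the
# same total reward, which is the intended risk-averse behaviour.
print(soft_maximin_aggregate([1.0, 1.0]))  # approx. -0.736
print(soft_maximin_aggregate([2.0, 0.0]))  # approx. -1.135
```

The design point is that concavity makes the aggregate prefer balanced reward profiles over lopsided ones with the same total, which is what "risk averse across objectives" means here.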
Later, with support from a Foresight Institute grant (autumn 2023 to spring 2024), I created a suite of biologically and economically aligned, multi-objective, multi-agent, long-running AI safety benchmark environments, based on the extended gridworlds platform I have been developing; the work was published on arXiv at the end of September 2024. It includes Stable Baselines 3 based RL agents as well as LLM agents. See “From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent AI safety benchmarks”, https://github.com/aintelope/biological-compatibility-benchmarks (contains the agent training framework and a concrete implementation of a benchmark environment), and https://github.com/levitation-opensource/ai-safety-gridworlds (the general underlying framework for building multi-objective multi-agent environments).
Additionally, together with collaborators, I have been testing alignment with the same biological and economic alignment principles on LLMs using simpler, map-free environments, in order to reduce confounding factors and focus on the essentials only. It turns out that in multi-objective setups, LLM agents do not just lose context; much worse, they can systematically flip into a paperclip-maximiser-like mode, which is more extreme than becoming incoherent. You can read more about the results of this work here: “Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format”.
I am a member of an AI ethics expert group. We have published agentic AI guidelines documents, volumes 1 and 2 (September 2024 to March 2025). See https://www.saferagenticai.org/assets/figures/Safer_Agentic_AI_Foundations_Vol2_I1_March2025.pdf . Over the years I have also contributed to several other governance-related publications.
My resume: https://bit.ly/rp_ea_2018
Hello! I have posted a new document brainstorming the methodology for further research on the current runaway-LLM findings:
Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs
I hope this post will also serve as a generally interesting brainstorming collection and discussion ground for black-box LLM interpretability methodology, as well as for failure-mitigation ideas.
Hope you find it relevant and interesting!