
Roland Pihlakas

Karma: 142

Independent researcher. I hold an MSc in psychology. I work as an AI software engineer specialising in combinatorial optimisation, machine learning, graph search, natural language processing, and data compression.

I have been researching and working on multi-objective value problems for almost 20 years. I have followed discussions on AI safety since 2006 and have participated more actively since about 2017.

My thesis was in cognitive psychology, on computational modelling of innate learning and planning mechanisms (classical and operant conditioning, plus insight learning), which eventually form a foundation for culturally acquired, language-based thought processes.
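
As a flavour of the model family involved (an illustrative sketch only; I am not claiming this is the specific model from the thesis), the classic Rescorla-Wagner rule updates the associative strength of a stimulus after each classical conditioning trial:

```python
def rescorla_wagner(v: float, alpha: float, beta: float, lam: float) -> float:
    """One Rescorla-Wagner trial for classical conditioning:
    associative strength v moves towards the asymptote lam at a rate
    set by stimulus salience alpha and learning rate beta.
    Illustrative example only, not necessarily the thesis model."""
    return v + alpha * beta * (lam - v)

v = 0.0
for trial in range(5):
    v = rescorla_wagner(v, alpha=0.5, beta=0.5, lam=1.0)
    print(trial + 1, round(v, 3))  # rises towards 1.0 in diminishing steps
```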

With co-authors I have published a full research paper at AAMAS (Autonomous Agents and Multi-Agent Systems) on utility functions for risk-averse multi-objective decision making, which are relevant for balancing the plurality of human values. For that purpose we apply concave utility functions to the per-objective rewards before aggregating them. See “Using soft maximin for risk averse multi-objective decision-making”. An interested reader may also want to take a look at my AISC V project proposal, which inspired the aforementioned paper: https://bit.ly/aisc5-pluralistic-utility .
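
To give a flavour of the idea (a minimal sketch; the paper's exact utility functions and aggregation differ in detail): aggregating per-objective rewards through a soft minimum makes losses on any single objective weigh more than equal-sized gains on another, pushing the agent towards balancing objectives instead of trading one off to extremes.

```python
import numpy as np

def soft_maximin(rewards, k=2.0):
    """Smooth approximation of min() over per-objective rewards,
    implemented as a log-sum-exp of the negated rewards. Larger k
    approaches strict maximin; k -> 0 approaches the plain average.
    Illustrative only; the paper's exact formulation may differ."""
    rewards = np.asarray(rewards, dtype=float)
    return -np.log(np.mean(np.exp(-k * rewards))) / k

# A balanced outcome scores higher than an unbalanced one with the
# same total, reflecting risk aversion across objectives:
print(soft_maximin([5.0, 5.0]))  # ~5.0
print(soft_maximin([9.0, 1.0]))  # ~1.35, dominated by the worst objective
```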

Later, with support from a Foresight Institute grant (autumn 2023 – spring 2024), I created a suite of biologically and economically aligned multi-objective multi-agent long-running AI safety benchmark environments, based on the extended gridworlds platform I have been developing; this work was published on arXiv at the end of September 2024. It includes Stable Baselines 3 based RL agents as well as LLM agents. See “From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent AI safety benchmarks”, https://github.com/aintelope/biological-compatibility-benchmarks (contains the agent training framework and a concrete implementation of a benchmark environment), and https://github.com/levitation-opensource/ai-safety-gridworlds (the general underlying framework for building multi-objective multi-agent environments).
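
The core interface idea in such environments, shown here as a hypothetical sketch (the StepResult type and the objective names are made up for illustration and are not the actual API of the linked repositories), is that each agent receives a vector of per-objective rewards rather than a single scalar:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    # Hypothetical illustration, not the actual API of the linked repos:
    # per-agent observations and per-agent, per-objective reward vectors.
    observations: dict
    rewards: dict
    done: bool

def example_step() -> StepResult:
    # Biologically inspired objectives (food and water homeostasis)
    # alongside an economic objective (resource sharing).
    return StepResult(
        observations={"agent_0": "...", "agent_1": "..."},
        rewards={
            "agent_0": {"food": 0.5, "water": -0.2, "sharing": 0.1},
            "agent_1": {"food": -0.1, "water": 0.3, "sharing": 0.1},
        },
        done=False,
    )
```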

Additionally, together with collaborators, I have been testing alignment with the same biological and economic alignment principles on LLMs, using simpler map-free environments in order to reduce confounding factors and focus on the essentials. It turns out that in multi-objective setups, LLM agents do not just lose context; much worse, they can systematically flip into a paperclip-maximiser-like mode, which is more extreme than becoming incoherent. You can read more about the results of this work here: “Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format”.
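
The distinction these benchmarks hinge on can be stated in a few lines (a minimal sketch, not the benchmarks' actual scoring code): a homeostatic objective rewards staying near a setpoint, so pushing the variable ever higher, as a runaway optimiser does, makes the score worse rather than better.

```python
def homeostatic_reward(value: float, target: float) -> float:
    """Reward peaks at the setpoint and falls off on both sides,
    so 'more' is not better, unlike an unbounded objective."""
    return -abs(value - target)

# An agent that keeps acquiring past the setpoint (runaway-optimiser
# behaviour) sees its homeostatic reward keep dropping:
for consumed in [5, 10, 15, 20]:
    print(consumed, homeostatic_reward(consumed, target=10))
# -> 5: -5 | 10: 0 | 15: -5 | 20: -10
```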

I am a member of an AI ethics expert group. We have published the Safer Agentic AI guideline documents, volumes 1 and 2 (published September 2024 – March 2025). See https://www.saferagenticai.org/assets/figures/Safer_Agentic_AI_Foundations_Vol2_I1_March2025.pdf . Over the years I have also contributed to a few other governance-related publications.

My resume: https://bit.ly/rp_ea_2018

Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs

Roland Pihlakas · 22 Jun 2025 18:16 UTC · 17 points · 0 comments · 7 min read · LW link

Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

16 Mar 2025 23:23 UTC · 45 points · 8 comments · 11 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges.

Roland Pihlakas · 12 Jan 2025 3:37 UTC · 47 points · 7 comments · 12 min read · LW link

Building AI safety benchmark environments on themes of universal human values

Roland Pihlakas · 3 Jan 2025 4:24 UTC · 18 points · 3 comments · 8 min read · LW link (docs.google.com)

Sets of objectives for a multi-objective RL agent to optimize

23 Nov 2022 6:49 UTC · 13 points · 0 comments · 8 min read · LW link