Well, hello everyone, I hope you liked my post; I put quite a lot of effort into it. While reading other posts here, you sometimes see people claim that an advanced agent, if it doesn't start out as an expected utility maximiser, would realise that its behaviour is exploitable or otherwise deficient and, following the theorems, change how it acts (into being an EU maximiser) so that this stops being the case. Those people can now instead simply point here to show that something close to that claim is a beautiful theorem. I expect most people to disagree, but I think this is better than the VNM theorem and can serve as an alternative to it (which is why I called the definitions axioms).
There are many complaints you could have about it. Maybe you think that my definition of reflective stability is a bit contrived, or that some other decision theory is superior, or that the interpretation of probability is not completely clear and I should have derived it instead of taking it for granted, etc. This is not intended to be the be-all and end-all of the never-ending debate on the topic, but I think the assumptions here are good enough to have some bearing on the behaviour of real-life agents, and aren't they nice assumptions to break?
Anyway, while it ended up being more interesting than I had expected, this post is an ad. The main reason I wrote it is to somehow get funding. For a long time I've been trying to get a job as a researcher, and I've had exactly zero success at it. I don't know what "the reason" for that is, since it's not like funders bother to explain their rationale, but I can guess:
- For a field where basically nothing happens (except some results in mechanistic interpretability), there sure is a lot of competition.
- I just haven't applied to many funders and potential employers, and there aren't that many to apply to.
- Maybe they think I'm incompetent.
- Maybe they think my ideas just aren't good enough.
After talking to some people who are probably somewhat well informed about the funding situation, I concluded that there was basically no hope for me and that my only realistic option left was to become known. That sounds horrible, but if it's what it takes to have a chance at success, I was willing to do it. My plan was to write a LW post under my actual name with some work on a topic that might interest funders who read the site. I chose expected utility maximisation because it's a popular topic, working on it won't advance capabilities, and it seemed plausible that I could produce some clean mathematical result. So here it is.
If you’re a potential funder or employer, I’m sorry to say that I don’t have much evidence of competence. It was never something I cared much about. I studied computer science (though they call it what in English would be “information engineering”) in Spain (next to that supercomputer in a chapel, though I never got to visit it) and then got a job (remotely, I currently live in the UK) at a software and consulting company, first as an intern, and then when the contract expired they offered to extend it by a year as an independent contractor. The new contract lasts until October 31st. I can leave early with a notice of 30 days, but in that case I plan to finish whatever projects I’m working on, to avoid causing them problems, which might take longer than that. I work as a programmer and consultant, with software that mostly does simulation (ODEs) and approximate Bayesian inference with time series data (in a very nice language called Julia), and I have learned a lot. I have never published any papers, but a few might be coming from the work I do there. The only other thing I have on LW under my name (because, if I remember correctly, it was required in order to get the money) that might be evidence I’m not incompetent are the results of a little online contest.
I don’t have much interest in working on LLMs or even deep learning. Those algorithms (the ones defined by the weights) are titanic pieces of spaghetti code, well beyond salvation. What they do is impressive, but they should never be the foundation of anything that matters. We should reverse-engineer what we can, and then throw them away and create proper software from scratch based on the insights we gain. I want to work on whatever theory is needed to create scalable robust artificial altruism, on designing AIs that aren’t unholy abominations, and possibly on mechanistic interpretability to support those two paths. I have little idea of how I’ll achieve that, the project will almost certainly fail, and if it does succeed it will take many years. No matter, it’s what must be done.
If you are interested, you can DM me or send an email to research♻️greatninja⚫mozmail⚫com (replacing the three emojis), and then we will probably take it to some private messaging app. If I don't reply within a week at the very most, something went wrong, so please try the other method, or leave a comment below the post.
And if this fails, because nobody is interested or it can't work out for some reason, I might write another post, but you should know that's an extremely inefficient way of getting work out of me.
(By the way, it seems that new users are rate-limited to three comments per day. I'm not sure whether this applies to comments on my own posts, but if it does, I might have issues replying here.)