I’m also on Twitter, but I don’t have followers (except one that might or might not be human), so I don’t post anything interesting, so I don’t get followers
Pedro Afonso
It looks like it’s been over 22 hours since the post was approved. Maybe it just takes some time, but it has received 3 votes, summing to zero. I would appreciate an explanation from whoever downvoted it, or advice from anyone on how I could have done better. I think it’s a nice post, and if someone else had written it I would have upvoted them.
Well, hello everyone, I hope you liked my post. I put quite a lot of effort into it. Reading other posts here, you sometimes see people say that an advanced agent that doesn’t start out as an expected utility maximiser would realise that its behaviour is exploitable or otherwise deficient and, following the theorems, change how it acts (becoming an EU maximiser) so that this stops being the case. Now they can simply point here to show that something close to that is a beautiful theorem. I expect most people to disagree, but I think this is better than the VNM theorem and can serve as an alternative to it (which is why I called the definitions axioms).
There are many complaints you can have about it. Maybe you think that my definition of reflective stability is a bit contrived, or that some other decision theory is superior, or that the interpretation of probability is not completely clear and that I should have derived it instead of taking it for granted, etc. This is not intended to be the be-all and end-all of the never-ending debate on the topic, but I think the assumptions here are good enough to have some bearing on the behaviour of real-life agents, and aren’t they nice assumptions to break?
Anyway, while it ended up being more interesting than I had expected, this post is an ad. And the main reason why I wrote it is to somehow get funding. For a long time I’ve been trying to get a job as a researcher, and at that I’ve had exactly zero success. I don’t know what “the reason” for that is, it’s not like funders bother to explain their rationale, but I can guess:
For a field where basically nothing happens (except some results in mechanistic interpretability) there sure is a lot of competition
I just haven’t applied to many funders and potential employers, and there just aren’t that many
Maybe they think I’m incompetent
Maybe they think my ideas just aren’t good enough
After talking to some people who are probably somewhat well informed about the funding situation, I arrived at the conclusion that there is basically no hope for me, and my only realistic option left was to become known, which sounds horrible, but if that’s what it takes to have a chance at success, I was willing to do it. My plan was to write a LW post under my actual name with some work on a topic that might interest funders who read the site, so I chose expected utility maximisation, because it’s a popular topic, working on it won’t advance capabilities, and it seemed plausible that I could produce some clean mathematical result. So here it is.
If you’re a potential funder or employer, I’m sorry to say that I don’t have much evidence of competence. It was never something I cared much about. I studied computer science (though they call it what in English would be “information engineering”) in Spain (next to that supercomputer in a chapel, though I never got to visit it) and then got a job (remotely, I currently live in the UK) at a software and consulting company, first as an intern, and then when the contract expired they offered to extend it by a year as an independent contractor. The new contract lasts until October 31st. I can leave early with a notice of 30 days, but in that case I plan to finish whatever projects I’m working on, to avoid causing them problems, which might take longer than that. I work as a programmer and consultant, with software that mostly does simulation (ODEs) and approximate Bayesian inference with time series data (in a very nice language called Julia), and I have learned a lot. I have never published any papers, but a few might be coming from the work I do there. The only other thing I have on LW under my name (because, if I remember correctly, it was required in order to get the money) that might be evidence I’m not incompetent are the results of a little online contest.
I don’t have much interest in working on LLMs or even deep learning. Those algorithms (the ones defined by the weights) are titanic pieces of spaghetti code, well beyond salvation. What they do is impressive, but they should never be the foundation of anything that matters. We should reverse-engineer what we can, and then throw them away and create proper software from scratch based on the insights we gain. I want to work on whatever theory is needed to create scalable robust artificial altruism, on designing AIs that aren’t unholy abominations, and possibly on mechanistic interpretability to support those two paths. I have little idea of how I’ll achieve that, the project will almost certainly fail, and if it does succeed it will take many years. No matter, it’s what must be done.
If you are interested, you can DM me or send an email to research♻️greatninja⚫mozmail⚫com (replacing the three emojis), and then we will probably take it to some private messaging app. If I don’t reply within at the very very most a week, something went wrong, so please try the other way, or leave a comment below the post.
And if this fails, and nobody is interested or it can’t work out for some reason, I might write another post, but you should know that’s an extremely inefficient way of getting work out of me.
(By the way, it seems that new users are rate limited to three comments per day, I’m not sure whether this applies to comments on my own posts, but if it does, I might have issues replying here)
For the purposes of the theorem an outcome is just a meaningless element of a finite set over which we can define probability distributions. Whether or not the theorem applies to some actual physical agent does depend on how we define what an outcome is. Notice that the definition of reflective stability requires an agent to behave a certain way in all succession setups, and therefore we must consider all scenarios. So, if “I climb the nearby mountain” is an action, and “I don’t climb the nearby mountain and the weather is sunny” is an outcome, it must be possible to create a scenario where climbing the nearby mountain has a 100% chance of resulting in not climbing the nearby mountain and the weather being sunny. You must choose some partition of trajectories of the world into outcomes such that it is in principle possible to create any scenario, and if your agent is a reflectively stable consequentialist with respect to that partition, the theorem says that it will also be an expected utility maximiser with respect to it. Partitioning the trajectories is actually forced by the fact that reality is continuous but the theorem only works with a finite set of outcomes.
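To make the setup a bit more concrete, here is a minimal Python sketch of what I mean by a scenario over a finite set of outcomes. It is only an illustration and not the formal definitions from the post: the outcome labels, the utility numbers, and the names `expected_utility` and `best_action` are all made up for the example, and a scenario is simplified to a map from each action to a probability distribution over outcomes.

```python
from typing import Dict

Outcome = str
Action = int
Distribution = Dict[Outcome, float]   # outcome -> probability
Scenario = Dict[Action, Distribution]  # action -> distribution over outcomes

# Hypothetical finite partition of world trajectories into outcomes.
OUTCOMES = ("sunny_no_climb", "rainy_no_climb", "reached_top")

# Hypothetical utilities, chosen only for illustration.
UTILITY: Dict[Outcome, float] = {
    "sunny_no_climb": 1.0,
    "rainy_no_climb": 0.0,
    "reached_top": 3.0,
}


def expected_utility(dist: Distribution, utility: Dict[Outcome, float]) -> float:
    """Expected utility of a distribution over the finite outcome set."""
    assert abs(sum(dist.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * utility[o] for o, p in dist.items())


def best_action(scenario: Scenario, utility: Dict[Outcome, float]) -> Action:
    """Action with the highest expected utility (ties go to the lowest index)."""
    return max(sorted(scenario), key=lambda a: expected_utility(scenario[a], utility))


# "Any scenario must be possible": here action 0 leads with certainty to
# "sunny_no_climb", whatever we might informally call that action.
scenario: Scenario = {
    0: {"sunny_no_climb": 1.0},
    1: {"reached_top": 0.25, "rainy_no_climb": 0.75},
}

chosen = best_action(scenario, UTILITY)
```

With these made-up numbers the certain outcome is worth more than the gamble, so `chosen` is action 0. The only point is that the agent's choice depends on the distributions over outcomes and not on what the actions are called.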
Consequentialism is also not as clearly false about people as you make it out to be (although it is false). “I climb the nearby mountain” is clearly not an action that can be taken in an arbitrary situation, whereas we are assuming that the set of available actions is exactly the same in all scenarios. What is always available is some discretization of the set of possible signals we can send to our muscles at some instant. In the example I gave of a corporation, the first action is not “choose the first possible successor”, it is just action 1, which in that scenario results in choosing a successor, but in some other scenario can result in choosing a different successor, or maybe walking two steps right. You can choose different levels of granularity depending on the kind of thing you are trying to describe. In the case of climbing a mountain, I think that considering it as an atomic action, or even as an action at all is not the right way of seeing things. The vast, vast majority of all the possible actual action sequences someone can take when in front of a mountain simply result in them falling to the ground. Achieving the goal of climbing a mountain is not trivial, and it requires different actions depending on what happens to be in front of them, so it should be considered a consequence instead.
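The point about actions being bare labels can be sketched the same way. This is a toy illustration under my own simplifying assumptions: the action indices, the consequence labels, and the helper names `is_valid_scenario` and `most_likely_consequence` are hypothetical. The action set is identical in every scenario, and only the scenario decides what an action leads to.

```python
from typing import Dict

# The same fixed, finite action set is available in every scenario.
ACTIONS = (1, 2, 3)

Scenario = Dict[int, Dict[str, float]]  # action -> distribution over consequences

# Hypothetical corporation scenario: each action ends up picking a successor.
corporation: Scenario = {
    1: {"successor_A_chosen": 1.0},
    2: {"successor_B_chosen": 1.0},
    3: {"successor_C_chosen": 1.0},
}

# Hypothetical other scenario: the very same actions have unrelated consequences.
open_field: Scenario = {
    1: {"walked_two_steps_right": 1.0},
    2: {"stood_still": 1.0},
    3: {"fell_to_the_ground": 0.9, "stood_still": 0.1},
}


def is_valid_scenario(scenario: Scenario) -> bool:
    """A scenario assigns a probability distribution to exactly the fixed actions."""
    same_actions = set(scenario) == set(ACTIONS)
    normalised = all(abs(sum(d.values()) - 1.0) < 1e-9 for d in scenario.values())
    return same_actions and normalised


def most_likely_consequence(action: int, scenario: Scenario) -> str:
    """Most probable consequence of taking `action` in `scenario`."""
    dist = scenario[action]
    return max(dist, key=dist.get)


in_corporation = most_likely_consequence(1, corporation)
in_field = most_likely_consequence(1, open_field)
```

Here action 1 picks a successor in one scenario and walks two steps right in the other, which is why something like “climb the mountain” fits better as a consequence than as an action.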
You can of course insist that you care directly about whatever you define an action to be, and that therefore it must be considered as a part of the outcomes too if we want to be viewed as consequentialists. I think that’s probably not unreasonable, but it does break the theorem. Purely behavioral theorems are insufficient to describe human values. Reasoning clearly about how someone can both want to climb a mountain and reach the top will require thinking about the mechanisms in common between both kinds of goals, which will break the setup in many other ways.
I hope this made sense.