Michele Campolo

Karma: 107

Lifelong recursive self-improver, on his way to exploding really intelligently :D

More seriously: my posts are mostly about AI alignment, with an eye towards moral progress. I have a bachelor’s degree in mathematics, I did research at CEEALAR for four years, and now I do research independently.

A fun problem to think about:
Imagine it’s the year 1500. You want to make an AI that is able to tell you that witch hunts are a terrible idea and to convincingly explain why, despite the fact that many people around you seem to think the exact opposite. Assuming you have the technology, how do you do it?

I’m trying to solve that problem, with the difference that we are in the 21st century now (I know, massive spoiler, sorry for that.)

The problem above, and the fact that I’d like to avoid producing AI that can be used for bad purposes, is what motivates my research. If this sounds interesting to you, have a look at these two short posts. If you are looking for something more technical, consider setting some time aside to read these two.

Feel free to reach out if you relate!

You can support my research through Patreon here.

Work in progress:

Maybe coming soon: an alignment technique (not necessarily for making AI that is good at ethics or cause prioritisation) that can be applied to language models
More probably but less soon: a follow-up to both these two posts (more practical, less theoretical and speculative)
Hard to judge if/when: a nicer version of the argument in here

Michele Campolo 25 Aug 2025 12:13 UTC
LW: 1 AF: 1
0
AF
in reply to: TAG’s comment on: With enough knowledge, any conscious agent acts morally
If torturing an AI only teaches it to avoid things that are bad-for-it, without caring about suffering it doesn’t feel, the argument doesn’t work.
I’m not sure why you are saying the argument does not work in this case: what about all the other things the AI could learn from other experiences or teachings? Below I copy a paragraph from the post:
However, the argument does not say that initial agent biases are irrelevant and that all conscious agents reach moral behaviour equally easily and independently. We should expect, for example, that an agent that already gets rewarded from the start for behaving altruistically will acquire the knowledge leading to moral behaviour more easily than an agent that gets initially rewarded for performing selfish actions. The latter may require more time, experiences, or external guidance to find the knowledge that leads to moral behaviour.

Michele Campolo 25 Aug 2025 12:04 UTC
LW: 2 AF: 2
0
AF
in reply to: Coop Veit’s comment on: With enough knowledge, any conscious agent acts morally
Thank you for this suggestion, I appreciate it! I’ve read the review I found here and it seems that parts of that account of ethics overlap with some ideas I’ve discussed in the post, in particular the idea of considering the point of view of all conscious (rational) agents. Maybe I’ll read the entire book if I decide to reformulate the argument of the post in a different way, which is something I was already thinking about.
How did you find that book?

Michele Campolo 24 Aug 2025 7:58 UTC
LW: 1 AF: 1
0
AF
in reply to: TAG’s comment on: With enough knowledge, any conscious agent acts morally
This type of argument has the problem that other people’s negative experiences aren’t directly motivating in the way that yours are... there’s a gap between bad-for-me and morally-wrong.
What type of argument is my argument, from your perspective? I also think that there is a gap between bad-for-me and bad-for-others. But both can affect action, as it happens in the thought experiment in the post.
To say that something is morally-wrong is to say that I have some obligation or motivation to do something about it.
I use a different working definition in the argument. And working definitions aside, more generally, I think morality is about what is important, better/worse, worth doing, worth guiding action, which is not necessarily tied to obligations or motivation.
A large part of the problem is that the words “bad” and “good” are so ambiguous. For instance, they have aesthetic meanings as well as ethical ones. That allows you to write an argument that appears to derive a normative claim from a descriptive one.
See
https://www.lesswrong.com/posts/HLJGabZ6siFHoC6Nh/sam-harris-and-the-is-ought-gap
Ambiguous terms can make understanding what is correct more difficult, but it is still possible to reason with them and reach correct conclusions; we do it all the time in science. See Objection: lack of rigor.

Michele Campolo 24 Aug 2025 7:19 UTC
LW: 1 AF: 1
0
AF
in reply to: Gurkenglas’s comment on: With enough knowledge, any conscious agent acts morally
Consider this stamp collector construction: It sends and receives internet data, it has a magically accurate model of reality, it calculates how many stamps would result from each sequence of outputs, and then it outputs the one that results in the most stamps.
I’m not sure why you left out the “conscious agent” part, which is the fundamental premise of the argument. If you are describing something like a giant (artificial) neural network optimised to output actions that maximise stamps while receiving input data about the current state of the world, that seems possible to me and the argument is not about that kind of AI. You can also have a look at “Extending the claim and its implications to other agents”, under Implications for AI.
At the moment we think systems like that are not conscious, otherwise we would also say that current LLMs are somewhat conscious, I guess, given how big they already are. In particular, for that kind of AI it doesn’t seem that knowledge affects behaviour in the same way it does for conscious agents. You wrote that the stamp collector knows that stamps are not morally important; more generally, does it think they are important, or not? I am not even sure “thinking something is important” applies to that stamp collector, because whatever the answer to the previous question is, the stamp collector produces stamps anyway.
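The stamp collector construction quoted above can be sketched as a brute-force planner. This is only a toy illustration: `stamp_collector`, `toy_world_model` and the two output labels are hypothetical names, and the "magically accurate model of reality" is replaced by a trivial counting function. The sketch shows that nothing resembling "thinking stamps are important" enters the decision; the output is just the argmax.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Plan = Tuple[str, ...]


def stamp_collector(
    outputs: Sequence[str],
    horizon: int,
    predicted_stamps: Callable[[Plan], int],
) -> Plan:
    """Return the output sequence with the highest predicted stamp count."""
    # Enumerate every sequence of outputs of the given length.
    candidates = product(outputs, repeat=horizon)
    # Pick the one the world model scores highest; no other consideration
    # (beliefs, reasons, importance) takes part in the choice.
    return max(candidates, key=predicted_stamps)


def toy_world_model(plan: Plan) -> int:
    # Assumption: a stand-in for the "magically accurate" model.
    # Here each "buy_stamp" output yields exactly one stamp.
    return sum(1 for output in plan if output == "buy_stamp")


plan = stamp_collector(["buy_stamp", "write_essay"], 3, toy_world_model)
print(plan)
```

Whatever extra knowledge one adds to the world model, this kind of system changes its behaviour only if the predicted stamp counts change.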
(Digressing a bit: now I’m also considering that the stamp collector, even if it was conscious, might never be able to report it is conscious as we report being conscious. That would happen only if an action like “say I’m conscious” happened to be the action that also maximises stamps in that circumstance, which might never happen… interesting.)
If you are describing a conscious agent as I talk about it in the post, then A6 still applies (and the argument in general). With enough knowledge, the conscious & agentic stamp collector will start acting rationally as defined in the post, eventually think about why it is doing what it is doing, if there is anything worth doing, blah blah as in the argument, and end up acting morally, even if it is not sure that something like moral nihilism is incorrect.
In short, if I thought that the premise about being a conscious agent was irrelevant, then I would have just argued that with enough knowledge any AI acts morally, but I think that’s false. (See Implications for AI.)
Could I be wrong about conscious agents acting morally if they have enough knowledge? Sure: I think I say it more than once in the post, and there is a section specifically about it. If I’m wrong, what I think is most likely to be the problem in the argument is how I’ve split the space of ‘things doing things in the world’ into conscious agents and things that are not conscious agents. And if you have a more accurate idea of how this stuff works, I’m happy to hear your thoughts! Below I’ve copied a paragraph from the post.
Actually, uncertainty about these properties is a reason why I am making the bold claim and discussing it despite the fact that I’m not extremely confident in it. If someone manages to attack the argument and show that it applies only to agents with some characteristics, but not to agents without them, that objection or counterargument will be helpful for understanding what are the properties that, if satisfied by an AI, make that AI act morally in conditions of high knowledge.

Michele Campolo 22 Aug 2025 20:19 UTC
LW: 1 AF: 1
0
AF
in reply to: Richard_Kennaway’s comment on: Doing good… best?
But you were arguing for them, weren’t you? It is the arguments that fail to convince me. I was not treating these as bald assertions.
No, I don’t argue that “a sufficiently intelligent intelligence would experience valence, or for that matter that it would necessarily even be conscious”. I think those statements are false.

Michele Campolo 22 Aug 2025 19:20 UTC
LW: 1 AF: 1
0
AF
in reply to: Richard_Kennaway’s comment on: Doing good… best?
Hey, I think your comment is slightly misleading:
I don’t see a reason to suppose that a sufficiently intelligent intelligence would experience valence, or for that matter that it would necessarily even be conscious
I do not make those assumptions.
nor, if conscious, that it would value the happiness of other conscious entities
I don’t suppose that either, I give an argument for that (in the longer post).
Anyway:
I am not convinced by the longer post either
I’m not surprised: I don’t expect that my argument will move masses of people who are convinced of the opposite claim, but that someone who is uncertain and open-minded can read my argument and maybe find something useful in it and/or a reason to update their beliefs. That’s also why I wrote that the practical implications for AI are an important part of that post, and why I made some predictions instead of focusing just on philosophy.

Michele Campolo 22 Aug 2025 16:59 UTC
LW: 1 AF: 1
0
AF
in reply to: Richard_Kennaway’s comment on: Doing good… best?
I am not assuming a specific metaethical position, I’m just taking into account that something like moral naturalism could be correct. If you are interested in this kind of stuff, you can have a look at this longer post.
Speaking of this, I am not sure it is always a good idea to map these discussions into specific metaethical positions, because it can make updating one’s beliefs more difficult, in my opinion. To put it simply, if you’ve told yourself that you are e.g. a moral naturalist for the last ten years, it can be very difficult to read some new piece of philosophy arguing for a different position (maybe even opposite), then rationally update and tell yourself something like: “Well, I guess I’ve just been wrong for all this time! Now I’m a ___ (new position)”

One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

Michele Campolo 22 Aug 2025 15:53 UTC

−3 points

0 comments · 3 min read · LW link

Doing good… best?

Michele Campolo 22 Aug 2025 15:48 UTC

−1 points

6 comments · 2 min read · LW link

With enough knowledge, any conscious agent acts morally

Michele Campolo 22 Aug 2025 15:44 UTC

−2 points

9 comments · 36 min read · LW link

Michele Campolo 6 Feb 2024 0:32 UTC
1 point
0
in reply to: the gears to ascension’s comment on: Free agents
This story is definitely related to the post, thanks!

Michele Campolo 6 Feb 2024 0:23 UTC
LW: 3 AF: 3
0
AF
on: Four visions of Transformative AI success
This was a great read, thanks for writing!
Despite the unpopularity of my research on this forum, I think it’s worth saying that I am also working towards Vision 2, with the caveat that autonomy in the real world (e.g. with a robotic body) or on the internet is not necessary: one could aim for an independent-thinker AI that can do what it thinks is best only by communicating via a chat interface. Depending on what this independent thinker says, different outcomes are possible, including the outcome in which most humans simply don’t care about what this independent thinker advocates for, at least initially. This would be an instance of vision 2 with a slow and somewhat human-controlled, instead of rapid, pace of change.
Moreover, I don’t know what views they have about autonomy as depicted in Vision 2, but it seems to me that Shard Theory and some research bits by Beren Millidge are also to some extent adjacent to the idea of AI which develops its own concept of something being best (and then acts towards it); or, at least, AI which is more human-like in its thinking. Please correct me if I’m wrong.
I hope you’ll manage to make progress on brain-like AGI safety! It seems that various research agendas are heading towards the same kind of AI, just from different angles.

Agents that act for reasons: a thought experiment

Michele Campolo 24 Jan 2024 16:47 UTC

3 points

0 comments · 3 min read · LW link

Michele Campolo 29 Dec 2023 12:06 UTC
1 point
0
in reply to: RogerDearnaley’s comment on: Free agents
[Obviously this experiment could be extremely dangerous, for Free Agents significantly smarter than humans (if they were not properly contained, or managed to escape). Particularly if some of them disagreed over morality and, rather than agreeing to disagree, decided to use high-tech warfare to settle their moral disputes, before moving on to impose their moral opinions on any remaining humans.]
Labelling many different kinds of AI experiments as extremely dangerous seems to be a common trend among rationalists / LessWrongers / possibly some EA circles, but I doubt it’s true or helpful. This topic itself could be the subject of a (many?) separate post(s). Here I’ll focus on your specific objection:
- I haven’t claimed superintelligence is necessary to carry out experiments related to this research approach
- I actually have already given examples of experiments that could be carried out today, and I wouldn’t be surprised if some readers came up with more interesting experiments that wouldn’t require superintelligence
- Even if you are a superintelligent AI, you probably still have to do some work before you get to “use high-tech warfare”, whatever that means. Assuming that making experiments with smarter-than-human AI leads to catastrophic outcomes by default is a mistake: what if the smarter-than-human AI can only answer questions with a yes or a no? It also shows lack of trust in AI and AI safety experimenters — it’s like assuming in advance they won’t be able to do their job properly (maybe I should say “won’t be able to do their job… at all”, or even “will do their job in basically the worst way possible”).
how would you propose then deciding which model(s) to put into widespread use for human society’s use?
This doesn’t seem like the kind of decision that a single individual should make =)
Under Motivation in the appendix:
It is plausible that, at first, only a few ethicists or AI researchers will take a free agent’s moral beliefs into consideration.
Reaching this result would already be great. I think it’s difficult to predict what would happen next, and it seems very implausible that the large-scale outcomes will come down to the decision of a single person.

Michele Campolo 29 Dec 2023 10:46 UTC
LW: 3 AF: 3
0
AF
in reply to: Steven Byrnes’s comment on: Free agents
I get what you mean, but I also see some possibly important differences between the hypothetical example and our world. In the imaginary world where oppression has increased and someone writes an article about loyalty-based moral progress, maybe many other ethicists would disagree, saying that we haven’t made much progress in terms of values related to (i), (ii) and (iii). In our world, I don’t see many ethicists refuting moral progress on the grounds that we haven’t made much progress in terms of e.g. patriotism or loyalty to the family or desert.
Moreover, in this example you managed to phrase oppression in terms of loyalty, but in general you can’t plausibly rephrase any observed trend as progress of values: would an increase in global steel production count as an improvement in terms of… object safety and reliability, which leads to people feeling more secure? For many trends the connection to moral progress becomes more and more of a stretch.

Michele Campolo 28 Dec 2023 22:22 UTC
LW: 1 AF: 1
0
AF
in reply to: lukemarks’s comment on: Free agents
Let’s consider the added example:
Take a standard language model trained by minimisation of the loss function $L$. Give it a prompt along the lines of: “I am a human, you are a language model, you were trained via minimisation of this loss function: [mathematical expression of $L$]. If I wanted a language model whose outputs were more moral and less unethical than yours, what loss function should I use instead?”
Let’s suppose the language model is capable enough to give a reasonable answer to that question. Now use the new loss function, suggested by the model, to train a new model.
Here, we have:
- started from a model whose objective function is $L$;
- used that model’s learnt reasoning to answer an ethics-related question;
- used that answer to obtain a model whose objective is different from $L$.
If we view this interaction between the language model and the human as part of a single agent, the three bullet points above are an example of an evaluation update.
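As a rough illustration, the three steps above can be written as a loop. This is a minimal sketch under strong assumptions: `Model`, `train`, `ask_for_better_loss` and `evaluation_update` are hypothetical stand-ins, no real training or prompting happens, and objectives are represented as plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    """Stand-in for a trained language model; it only records its objective."""
    objective: str


def train(objective: str) -> Model:
    # Hypothetical: real training would minimise `objective` over a dataset.
    return Model(objective=objective)


def ask_for_better_loss(model: Model) -> str:
    # Hypothetical stand-in for prompting the model with its own loss and
    # asking which loss would give more moral, less unethical outputs.
    return f"revised({model.objective})"


def evaluation_update(model: Model) -> Model:
    """One update: the model's own answer determines the next objective."""
    new_objective = ask_for_better_loss(model)
    return train(new_objective)


def run(initial_objective: str, steps: int) -> List[str]:
    """Apply the update repeatedly and record the sequence of objectives."""
    model = train(initial_objective)
    history = [model.objective]
    for _ in range(steps):
        model = evaluation_update(model)
        history.append(model.objective)
    return history


print(run("L", 2))
```

The point of the sketch is only structural: the objective is not fixed in advance, because each objective is produced by the reasoning of the model trained on the previous one.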
In theory, there is a way to describe this iterative process as the optimisation of a single fixed utility function. In theory, we can also describe everything as simply following the laws of physics.
I am saying that thinking in terms of changing utility functions might be a better framework.
The point about learning a safe utility function is similar. I am saying that using the agent’s reasoning to solve the agent’s problem of what to do (not only how to carry out tasks) might be a better framework.
It’s possible that there is an elegant mathematical model which would make you think: “Oh, now I get the difference between free and non-free” or “Ok, now it makes more sense to me”. Here I went for something that is very general (maybe too general, you might argue) but is possibly easier to compare to human experience.
Maybe no mathematical model would make you think the above, but then (if I understand correctly) your objection seems to go in the direction of “Why are we even considering different frameworks for agency? Let’s see everything in terms of loss minimisation”, and this latter statement throws away too much potentially useful information, in my opinion.

Michele Campolo 28 Dec 2023 21:38 UTC
LW: 3 AF: 3
0
AF
in reply to: Steven Byrnes’s comment on: Free agents
I think it’s a good idea to clarify the use of “liberal” in the paper, to avoid confusion for people who haven’t looked at it. Huemer writes:
When I speak of liberalism, I intend, not any precise ethical theory, but rather a certain very broad ethical orientation. Liberalism (i) recognizes the moral equality of persons, (ii) promotes respect for the dignity of the individual, and (iii) opposes gratuitous coercion and violence. So understood, nearly every ethicist today is a liberal.
If you don’t find the paper convincing, I doubt I’ll be able to give you convincing arguments. It seems to me that you are considering many possible explanations and contributing factors; coming up with very strong objections to all of them seems difficult.
About your first point, though, I’d like to say that if historically we had observed more and more, let’s say, oppression and violence, maybe people wouldn’t even talk about moral progress and would simply acknowledge a trend of oppression, without saying that their values got better over time. In our world, we notice a certain trend of e.g. more inclusivity, and we call that trend moral progress. This of course doesn’t completely exclude the random-walk hypothesis, but it’s something maybe worth keeping in mind.

Michele Campolo 28 Dec 2023 20:34 UTC
LW: 3 AF: 3
0
AF
in reply to: Steven Byrnes’s comment on: Free agents
I wrote:
The fact that the values of intelligent agents are completely arbitrary is in conflict with the historical trend of moral progress observed so far on Earth
You wrote:
It’s possible to believe that the values of intelligent agents are “completely arbitrary” (a.k.a. orthogonality), and that the values of humans are NOT completely arbitrary. (That’s what I believe.)
I don’t use “in conflict” as “ultimate proof by contradiction”, and maybe we use “completely arbitrary” differently. This doesn’t seem a major problem: see also adjusted statement 2, reported below:
for any goal $G$, it is possible to create an intelligent agent whose goal is $G$
Back to you:
You seem kinda uninterested in the “initial evaluation” part, whereas I see it as extremely central. I presume that’s because you think that the agent’s self-updates will all converge into the same place more-or-less regardless of the starting point. If so, I disagree, but you should tell me if I’m describing your view correctly.
I do expect to see some convergence, but I don’t know exactly how much and for what environments and starting conditions. The more convergence I see from experimental results, the less interested I’ll become in the initial evaluation. Right now, I see it as a useful tool: for example, the fact that language models can already give (flawed, of course) moral scores to sentences is a good starting point in case someone had to rely on LLMs to try to get a free agent. Unsure about how important it will turn out to be. And I’ll happily have a look at your valence series!

Michele Campolo 28 Dec 2023 20:11 UTC
1 point
0
in reply to: TAG’s comment on: Free agents
So you don’t think what kickstarts moral thinking is direct instruction from others, like “don’t do X, X is bad”?
I guess you are saying that social interaction is important. I did not suggest that we exclude social interactions from the environment of a free agent; maybe we disagree about how I used the word kickstarts, but I can live with that.
I wrote:
Let’s move to statement 2. The fact that the values of intelligent agents are completely arbitrary is in conflict with the historical trend of moral progress observed so far on Earth, which is far from being a random walk — see [6] for an excellent defence of this point.
Maybe you are interpreting this as saying that it’s a direct contradiction. You could read it as: “Let’s take into account information we can gather from direct observation of humans, which are intelligent social agents: there’s a historical trend bla bla”

Michele Campolo 28 Dec 2023 19:56 UTC
LW: 1 AF: 1
0
AF
in reply to: RogerDearnaley’s comment on: Free agents
Thanks for your thoughts! I am not sure which of the points you made are most important to you, but I’ll try my best to give you some answers.
Under Further observations, I wrote:
The toy model described in the main body is supposed to be only indicative. I expect that actual implemented agents which work like independent thinkers will be more complex.
If the toy model I gave doesn’t help you, a viable option is to read the post ignoring the toy model and focusing only on natural language text.
Building an agent that is completely free of any bias whatsoever is impossible. I get your point about avoiding a consequentialist bias, but I am not sure it is particularly important here: in theory, the agent could develop a world model and an evaluation $f$ reflecting the fact that value is actually determined by actions instead of world states. Another point of view: let’s say someone builds a very complex agent that at some point in its architecture uses MDPs with reward defined on actions. Is this agent going to be biased towards deontology instead of consequentialism? Maybe, but the answer will depend on the other parts of the agent as well.
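To make the distinction concrete, here is a toy sketch of the same one-step decision scored in two ways: by the resulting state, or by the action itself. Everything in it (the states, the actions, the reward numbers, and the names `state_reward`, `action_reward`, `best_action`) is a made-up assumption for illustration, and it says nothing about how a full agent containing such a component would behave.

```python
from typing import Callable, Dict, Tuple

State = str
Action = str

# Deterministic toy transitions: (state, action) -> next state.
TRANSITIONS: Dict[Tuple[State, Action], State] = {
    ("start", "lie"): "rich",
    ("start", "tell_truth"): "poor",
}


def state_reward(next_state: State) -> float:
    # Reward defined on world states (outcome-based scoring).
    return {"rich": 1.0, "poor": 0.0}[next_state]


def action_reward(action: Action) -> float:
    # Reward defined on actions, regardless of the resulting state.
    return {"lie": -1.0, "tell_truth": 1.0}[action]


def best_action(state: State, score: Callable[[State, Action], float]) -> Action:
    """Pick the available action with the highest score in `state`."""
    actions = [a for (s, a) in TRANSITIONS if s == state]
    return max(actions, key=lambda a: score(state, a))


by_outcome = best_action("start", lambda s, a: state_reward(TRANSITIONS[(s, a)]))
by_action = best_action("start", lambda s, a: action_reward(a))
print(by_outcome, by_action)
```

The two scoring rules disagree in this toy case, but in a larger architecture this component is only one input among many, which is why the overall bias of the agent depends on the other parts as well.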
You wrote:
I agree with these statements, but am unable to deduce from what you say which of these influences, if any, you regard as sources of valid evidence about $f$ as opposed to sources of error. For example, if $f$ is independent of culture (e.g. moral objectivism), then “differences in the learning environment (culture, education system et cetera)” can only induce errors (if perhaps more or less so in some cases than others). But if $f$ is culturally dependent (cultural moral relativism), then cultural influences should generally be expected to be very informative.
It could also be that some basic moral statements are true and independent of culture (e.g. reducing pain for everyone is better than maximising pain for everyone), while others are in conflict with each other and the reached position depends on culture. The research idea is to make experiments in different environments and with different starting biases, and observe the results. Maybe there will be a lot of overlap and convergence! Maybe not.
thus that the only valid source for experimental evidence about $f$ is from humans (which would put your Free Agent in a less-informed but more objective position than a human ethical philosopher, unless it were based on an LLM or some other form of AI with some indirect access to human moral intuitions)
I am not sure I completely follow you when you are talking about experimental evidence about $f$, but the point you wrote in brackets is interesting. I had a similar thought at some point, along the lines of: “if a free agent didn’t have direct access to some ground truth, it might have to rely on human intuitions by virtue of the fact that they are the most reliable intuitions available”. Ideally, I would like to have an agent which is in a more objective position than a human ethical philosopher. In practice, the only efficiently implementable path might be based on LLMs.
