I do not believe in “human values.” That is Platonism. I only believe in practical single-to-single alignment and I only advocate single-to-single alignment.
A bit-perfect upload would require an extremely fine-grained scan of the brain, potentially down to the atomic scale. It would be lossless and perfectly aligned but computationally intractable even for one individual.
However, as envisioned in your post, one of the most promising approaches to achieving a reasonably effective emulation (a form of lossy compression) of a human mind would be reinforcement learning applied to a neural network.
I am quite convinced that, given a sufficiently large volume of conversations across a wide range of topics, along with access to resources such as an autobiography or at least diaries, photo albums, and similar personal documents, present frontier LLMs equipped with a well-crafted prompt could already emulate you or me to a certain degree of accuracy.
A dedicated network specifically trained for this purpose would likely perform better still, and could be seen as a form of lossy mind uploading.
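To make the prompt-based version concrete, here is a minimal sketch; the directory layout and the `complete` function are assumptions standing in for whatever LLM client one actually uses, not a tested recipe.

```python
# Minimal sketch of prompt-based persona emulation.
# Assumptions: `personal_docs/` holds diaries, letters, and conversation
# transcripts as .txt files; `complete(prompt)` wraps some frontier-LLM API.
from pathlib import Path

def build_persona_prompt(name: str, doc_dir: str, question: str) -> str:
    # Concatenate the personal corpus; a real system would retrieve only
    # the most relevant excerpts to fit the context window.
    corpus = "\n\n".join(
        p.read_text(encoding="utf-8") for p in sorted(Path(doc_dir).glob("*.txt"))
    )
    return (
        f"You are {name}. The documents below are {name}'s own writing.\n"
        f"Answer in {name}'s voice, consistent with their stated views.\n\n"
        f"--- DOCUMENTS ---\n{corpus}\n--- END DOCUMENTS ---\n\n"
        f"Question: {question}\nAnswer:"
    )

# answer = complete(build_persona_prompt("Alice", "personal_docs/", "What matters most to you?"))
```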
Yet if one can train a network to emulate a single individual, nothing prevents us from training a model to emulate multiple individuals. In theory, one could extend this to the entire human population, resulting in a neural network that emulates humanity as a whole and thereby achieves a form of alignment with human values. Such a system would effectively encode a lossy compression of human values, without anything Platonic about it. Or perhaps the closest thing to an “ideal form” would simply be the model’s internal representation in its vector space.
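One concrete way to picture the multi-person version, purely as a sketch: a single model conditioned on a person identifier, so one set of weights serves every individual. The architecture below is an illustrative assumption, not a proposal.

```python
# Sketch: one shared model conditioned on a person ID.
import torch
import torch.nn as nn

class PersonConditionedLM(nn.Module):
    def __init__(self, vocab_size: int, n_people: int, d_model: int = 512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.person_emb = nn.Embedding(n_people, d_model)  # one vector per individual
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, person_id: torch.Tensor):
        # Adding the person vector to every token embedding lets the same
        # weights generate text "as" different individuals.
        x = self.token_emb(tokens) + self.person_emb(person_id).unsqueeze(1)
        h, _ = self.backbone(x)
        return self.head(h)  # next-token logits
```

The person-embedding table is where the lossy compression would live: individuals who talk and write similarly would end up with nearby vectors.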
A simulation of all humans does not automatically have “human values.” It doesn’t really have values at all. You have to extract consensus values somehow, and to do that you need to specify something like a voting mechanism. But humans don’t form values in a vacuum, so such a simulation probably also needs interaction protocols and governance protocols, and whatever you end up with seems quite path-dependent and arbitrary.
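To make the voting-mechanism point concrete, here is a toy sketch of the extra machinery the objection refers to; the `simulate` interface is an assumption.

```python
# Toy sketch: extracting "consensus values" from simulated individuals
# forces a choice of aggregation rule, and the choice matters.
from collections import Counter

def ask_everyone(simulate, person_ids, question, options):
    # `simulate(person_id, question, options)` is an assumed interface
    # returning the option a simulated individual would pick.
    return [simulate(pid, question, options) for pid in person_ids]

def plurality(votes):
    return Counter(votes).most_common(1)[0][0]

def borda(rankings, options):
    # A different rule over the same electorate: top choice in each
    # ranking earns len(options) - 1 points, last place earns 0.
    scores = Counter()
    for ranking in rankings:
        for points, option in enumerate(reversed(ranking)):
            scores[option] += points
    return max(options, key=lambda o: scores[o])
```

Plurality and Borda can crown different winners on the same electorate, and Arrow’s impossibility theorem says no ranked rule can satisfy every fairness criterion one would want at once, so the arbitrariness here is not incidental.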
Why not just align AIs to each individual human and let them work it out?
I don’t have any certainty here, but I would say that the representation in the neural network is compressed according to a logic that emerges from training. There is something holistic in the process, maybe a little like the notion of the general will in Rousseau’s social contract: a combination of vectors.
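As a very loose illustration of that intuition, suppose each individual’s values were summarized by a unit vector; the “general will” reading would be their normalized combination. The data below is fake and the whole setup is a toy assumption.

```python
# Toy illustration: the "general will" as a combination of value vectors.
import numpy as np

rng = np.random.default_rng(0)
people = rng.normal(size=(1000, 64))      # one fake value vector per person
people /= np.linalg.norm(people, axis=1, keepdims=True)

general_will = people.mean(axis=0)        # the holistic combination
general_will /= np.linalg.norm(general_will)

agreement = people @ general_will         # cosine similarity per person
print(f"mean agreement {agreement.mean():.3f}, worst {agreement.min():.3f}")
```

With highly diverse vectors, the mean represents almost nobody well; how much real human values actually overlap is exactly what is at issue here.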
But if you create as many different networks as there are humans, you rely on the confrontation of all these systems, at the risk that some of them take over, just like the dictators we often get in real life. Whether that would be better, I don’t know. One thing is certain: it would need more compute, because the redundancy across networks means less global compression. Billions of separate models, each redundantly encoding everything humans have in common, would dwarf a single shared model that amortizes it.