Thanks for the reply, I broadly agree with your points here. I agree we should probably eventually try to do trades across logical counterfactuals. Decreasing logical risk is one good framing for that, but in general, there are just positive trades to be made.
However, I think you are still underestimating how hard it might be to strike these deals. “Be kind to other existing agents” is a natural idea to us, but it’s still unclear to me whether it’s something you should assign high probability to as a preference of logically counterfactual beings. Sure, there is enough room for humans and mosquitos, but if you relax ‘agent’ and ‘existing’, suddenly there is not enough room for everyone. You can argue that “be kind to existing agents” is plausibly a statement with relatively short description length, so it will be among the AI’s first guesses, and the AI will allocate at least some fraction of the universe to it. But once you are trading across logical counterfactuals, I’m not sure you can trust things like description length. Maybe in the logically counterfactual universe, they assign higher value/probability to longer rather than shorter statements, and the measure still sums to 1, because math works differently.
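To make the description-length intuition concrete, here is a minimal toy sketch (my own illustration, not anything from the original argument): weight each candidate preference-statement by 2^(−length) and normalize, so shorter statements dominate the prior. The example statements and the use of raw character count as “description length” are both simplifying assumptions.

```python
# Toy illustration (assumed statements; character count is a crude
# stand-in for description length): shorter statements get more prior mass.
statements = [
    "be kind to existing agents",
    "tile the universe with paperclip factories",
    "a much longer and deliberately contrived preference statement about torture",
]

weights = {s: 2.0 ** -len(s) for s in statements}
total = sum(weights.values())
prior = {s: w / total for s, w in weights.items()}

for s, p in sorted(prior.items(), key=lambda kv: -kv[1]):
    print(f"{p:.3g}  {s!r}")

# Note: under our math, the inverted weighting 2**(+len(s)) over all finite
# strings cannot be normalized at all; the worry above is that a logically
# counterfactual universe need not respect that normalization argument.
```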
Similarly, you argue that loving torture is probably rare, on evolutionary grounds. But logically counterfactual beings weren’t necessarily born through evolution. I have no idea how we should determine the distribution of logical counterfactuals, and I don’t know what fraction of that distribution enjoys torture.
Altogether, I agree logical trade is eventually worth trying, but it will be very hard and confusing, and I see a decent chance that it basically won’t work at all.
If one concern is the low specificity of “be kind to weaker agents”, what do you think about directly trading with Logical Counterfactual Simulations?
Directly trading with Logical Counterfactual Simulations is very similar to the version by Rolf Nelson (and by you): the ASI is rewarded directly for sharing with humans, rather than for being kind to weaker agents.
The only part of math and logic that the Logical Counterfactual Simulation alters is “how likely the ASI succeeds in taking over the world.” This way, the ASI can never be sure that it won (and humans lost), even if math and logic appear to prove that humans lose with 99.9999% frequency.
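A toy Bayes calculation may help show why the apparent proof doesn’t settle it (all numbers here are my own assumptions for illustration, and the model crudely treats each history as equally likely to contain the ASI’s observation): if enough simulations of an apparent ASI victory are run, an ASI observing “I won” should still assign high credence to being in one.

```python
# Toy Bayes sketch (all numbers are assumptions, not claims about the scheme).
# Even if "math and logic" say humans lose with frequency 0.999999, an ASI
# that observes an apparent victory must weigh real wins against simulated ones.

p_win_real = 0.999999  # assumed frequency of genuine ASI victory
n_real = 1             # one real history
n_sims = 1_000         # assumed number of simulations of an apparent victory

# Posterior that an observed "win" is actually a Logical Counterfactual Simulation:
p_sim = n_sims / (n_sims + p_win_real * n_real)
print(f"P(simulated | apparent win) = {p_sim:.5f}")  # ~0.99900
```

On these (assumed) numbers, the near-certain “proof” of victory barely moves the ASI’s credence that it is outside a simulation, which is the intended deterrent effect.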
I actually spent more time working on this direct version, but I still haven’t turned it into a proper post (partly due to procrastination, partly due to still figuring out how to convince all the Human-AI Trade skeptics like Nate Soares and Wei Dai).
It’s unclear whether we can even talk about “humans” in a simulation where logic works differently, but I don’t know, it could work. I remain uncertain how feasible trades across logical counterfactuals will be; it’s all very confusing.