(Note: I’ve only read a few pages so far, so perhaps this is already in the background)
I agree that if the parent comment's scenario holds, then it is a case of the upload being improper.
However, I also disagree that most humans naturally generalize our values out of distribution. I think it is very easy for many humans to get sucked into attractors (ideologies that are simplifications of what they truly want; easy lies; the sheer amount of effort ahead stalling out their focus, even when the gargantuan task would be worth it) that damage their ability to properly generalize and, importantly, apply their values.
That is, humans have predictable flaws. Then when you add in self-modification, you open up whole new regimes.
My view is that a very important element of our values is that we do not necessarily endorse all of our behaviors!
I think a smart and self-aware human could sidestep and weaken these issues, but I do think they’re still hard problems. Which is why I’m a fan of (if we get uploads) going “Upload, figure out AI alignment, then have the AI think long and hard about it” as that further sidesteps problems of a human staring too long at the sun.
That is, I think it is very hard for a human to directly implement something like CEV themselves, but that a designed mind doesn’t necessarily have the same issues.
As an example: the power-seeking instinct. I don't endorse seeking power in that way, especially if uploaded to try to solve alignment for humanity in general, but given my status as an upload and lots of time spent realizing that I have a lot of influence over the world, I think it is plausible that that instinct would affect me more and more. I would try to plan around this, but likely do so imperfectly.
However, I also disagree that most humans naturally generalize our values out of distribution
That’s somewhere between wholly true and wholly false. People don’t have a unique set of values, or a fixed set of values, so they can update, but not without error.
However, I also disagree that most humans naturally generalize our values out of distribution
Yes, updating is generational, as @jdp already said.