Dagon comments on Richard Ngo’s Shortform

Dagon 25 May 2022 19:35 UTC
Fundamentally, humans aren’t VNM-rational and don’t actually have utility functions, which makes the thought experiment much less fun. If you recast it as “what if a human brain’s reinforcement mechanisms were reversed?”, I suspect it’s also boring: simple early death.
The interesting fictional cases are when some subset of a person’s legible motivations is reversed but the mass of other drives remains. This very loosely maps to reversing terminal goals and then recalculating instrumental goals, which may reverse, stay the same, or change in weird ways.
The indirection case is solved (or rather, never asked) by inserting “perceived” into the calculation chain. Your goals don’t depend on similarity to you; they depend on your perception (or projection) of similarity to you.