I have a vague intuition that something interesting could happen with self-modifying AIs whose creator and successor states are knowably running on error-prone hardware while having pseudo-universal hypothesis generators that will, of course, notice the possibility of value corruption. I guess I’m still rooting for the ‘infinite reflection = contextually perfect morality’ deus ex machina. Utility functions as they’re normally idealized for imagining superintelligence behavior, like in Omohundro’s Basic AI Drives, look an awful lot like self-protecting beliefs, which feels more and more decision-theoretically wrong as time goes on. I trust the applicability of the symbols of expected utility theory less over time, and I trust common beliefs about the automatic implications of putting those symbols in a seed AI even less than that. Am I alone here?
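To unpack ‘self-protecting’: the goal-preservation drive in Basic AI Drives falls out of the fact that an agent scores the option of changing its utility function with that same utility function. A minimal sketch in standard expected-utility notation (the formalization here is mine, not the paper’s):

```latex
% An agent holding utility function U compares two futures: keep U, or
% adopt a replacement U'. Both futures are scored by the current U, so
% for almost any U' the agent prefers to keep U:
\mathbb{E}\left[\, U \mid \text{keep } U \,\right] \;\ge\; \mathbb{E}\left[\, U \mid \text{adopt } U' \,\right]
% The utility function grades its own replacement and nearly always
% votes for itself -- structurally the same shape as a belief that
% assigns low probability to any evidence that would revise it.
```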
The reason I am not attempting to tackle those problems is that I hang out with Steve Rayhawk and assume that to make any progress I’d have to be roughly as smart and knowledgeable as Steve Rayhawk: if he hasn’t solved something yet, that means I’d have to be smarter than him to solve it. I subconsciously intuit that as impossible, so I try to specialize in pulling on less mathy yarns instead, which has turned out to be a lot more possible than I’d anticipated, though it took me a long time to get passable at it.
The current theory is all fine until you want to calculate utility based on something other than expected sensory input data; then it doesn’t work very well at all. The problem is that we don’t yet know how to encode ‘how the world really is, as opposed to what you are seeing’ in a machine-readable format.
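Concretely, the gap looks something like this. A minimal sketch in standard expected-utility notation (my framing; the names U_obs and U_world are hypothetical labels, not anyone’s formalism):

```latex
% What current theory handles: utility scored directly on the agent's
% predicted observations o. Note that a wireheading agent, which simply
% arranges to see high-scoring data, is perfectly coherent under this.
a^{*} = \arg\max_{a} \sum_{o} P(o \mid a)\, U_{\mathrm{obs}}(o)

% What we actually want: utility scored on hypothesized world states s,
% with observations serving only as evidence about s. Writing U_world
% down presupposes a machine-readable ontology of world states -- which
% is exactly the piece we do not yet know how to code.
a^{*} = \arg\max_{a} \sum_{s} P(s \mid a)\, U_{\mathrm{world}}(s)
```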