Separately from Scott’s answer, if people reason:
1. “Smart entities will be coherent relative to what they care about”,
2. “Coherent entities can be seen as optimizing expected utility for some utility function”,
3. “EU maximizers are dangerous.”
I think both (1) and (3) are sketchy/wrong/weird.
(1) There’s a step like “Don’t you want to save as many lives as possible? Then you have to coherently trade off opportunities by assigning a value to each life,” followed by the idea that this kind of reasoning pins down “you now maximize, or approximately maximize, or want to maximize, some utility function over all universe-histories.” This is just a huge leap IMO.
(3) We don’t know what the entities care about, or even that what they care about cleanly maps onto tileable, mass-producible, space-time additive quantities like “# of diamonds produced.”
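As a toy illustration of what “tileable, space-time additive” means here, the sketch below contrasts two utility functions over a tiny world-history. Everything in it is an assumption made for illustration: the event-tuple representation and the names `additive_utility` and `non_additive_utility` are hypothetical, and neither function is a claim about what any actual entity cares about.

```python
# Hypothetical toy setup: a "history" is a list of (time, place, thing) events.

def additive_utility(history):
    """Space-time additive: each event contributes independently (+1 per diamond)."""
    return sum(1 for (_t, _place, thing) in history if thing == "diamond")

def non_additive_utility(history):
    """Not additive: counts distinct kinds of thing, so extra copies add nothing."""
    return len({thing for (_t, _place, thing) in history})

one_region = [(0, "A", "diamond"), (1, "A", "garden")]

# "Tile" the same contents into a second region of space.
tiled = one_region + [(t, "B", thing) for (t, _place, thing) in one_region]

print(additive_utility(one_region), additive_utility(tiled))          # 1 2
print(non_additive_utility(one_region), non_additive_utility(tiled))  # 2 2
```

Under the additive function, copying the same contents into more regions keeps paying off linearly, which is the shape the “# of diamonds produced” example has. Under the second function, tiling gains nothing. The point is only that the additive shape is one option among many.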
Also, I think that people mostly just imagine specific kinds of EU maximizers (e.g. over action-observation histories) with simple utility functions (e.g. one we could program into a simple Turing machine, and then hand to AIXI). And people remember all the scary hypotheticals where AIXI wireheads, or Eliezer’s (hypothetical) example of an outcome-pump. I think that people think “it’ll be an EU maximizer” and remember AIXI and conclude “unalignable” or “squeezes the future into a tiny weird contorted shape unless the utility function is perfectly aligned with what we care about.” My imagined person acknowledges “mesa optimizers won’t be just like AIXI, but I don’t see a reason to think they’ll be fundamentally differently structured in the limit.”
From what I perceive of common reasoning about these issues, I think there is just an enormous number of invalid reasoning steps. I can tell people about a few of them, but even if I make myself understood, there usually don’t seem to be internal errors thrown that lead to a desperate effort to recheck other conclusions and ideas drawn from the invalid steps. EU-maxing and its assumptions seep into a range of alignment concepts (including exhaustive search as a plausible idealization of agency). As far as I can tell, even if someone agrees that a specific concept (like exhaustive search) is inappropriate, they don’t seem to roll back the belief-updates they made on the basis of that concept.
My current stance is “IDK what the AI cognition will look like in the end”, and I’m trying not to collapse my uncertainty prematurely.