I was hoping someone would go ahead and try this. Great work, love it.
I think Eliezer Yudkowsky’s argument still has some merit, even if some people actually enjoy bear fat with honey and salt flakes more than ice cream.
Hm, I think that specific argument falls apart in that case. Suppose humans indeed like BFWHASF (bear fat with honey and salt flakes) more than ice cream, but mostly eat the latter due to practical constraints. That means that, once we become more powerful and those constraints fall away, we would switch over to BFWHASF. But that’s actually the regime in which we’re not supposed to be able to predict what an agent would do!
As the argument goes, in a constrained environment with limited options (such as when a relatively stupid AGI is still trapped in our civilization’s fabric), an agent might appear to pursue values it was naively shaped to have. But once it grows more powerful, and gets the ability to take what it really wants, it would go for some weird unpredictable edge-instantiation thing. The suggested “BFWHASF vs. ice cream” case would actually be the inverse of that.
The core argument should still hold:
For one, BFWHASF is surely not the optimal food. Once we have the ability to engineer arbitrary food items, and also modify our taste buds however we see fit, we would likely go for something much weirder.
The quintessential example would of course be us getting rid of the physical implementation of food altogether, and instead focusing on optimizing substrate-independent (e. g., simulated) food-eating experiences (ones not involving even simulated biology). That maps to “humans are killed and replaced with chatbots (not even uploaded humans) babbling nice things” or whatever.
Even people who prefer “natural” food would likely go for e. g. the meat of beasts from carefully designed environments with very fast evolutionary loops set up to make their meat structured for maximum tastiness.[1] Translated to ASI and humans, this suggests an All Tomorrows kind of future (except without Qu going away).
But I do think BFWHASF doesn’t end up as a very good illustration of this point, if humans indeed like it a lot.
Just tune the implementation details of all of this such that the pipeline still meets those people’s aesthetic preferences for “natural-ness”. E. g., note that people who want “real meat” today are perfectly fine with eating the meat of selectively bred beasts.
The quintessential example would of course be us getting rid of the physical implementation of food altogether, and instead focusing on optimizing substrate-independent (e. g., simulated) food-eating experiences (ones not involving even simulated biology).
Rejecting such things as this based on coherent principles is a core part of my post-rationalist optimizing-my-actual-life principles.
Ways to think of it are (1) “grounding one’s Loebian Risks in agent shapes that are closer to being well founded” or (2) “optimizing for the tastes of children under a no-superstimulus constraint” or (3) “don’t do drugs; drugs are bad; m’kay?” or MAYBE (4) “apply your virtue ethics such as to be the ancestor whose psycho-evolutionary spandrels have the most potential to generate interestingly valuable hard-patches in later better minds”.
More tenuously, maybe (5) “reformulating subconscious neurological values as semantic claims and grounding the semantics of one’s feelings in engineering concerns so as to avoid accusations of wire-heading and/or lotus-eating and/or mere hedonism”? Like consider the Stoic approach to preference and goodness in general. They reserve “good” for things deemed preferable as a well-formed choice, and then say that the only universally safe choice is to choose “wisdom”, and so only wisdom is Good to them. But then, for ALL THE OTHER STUFF that is “naturally” and “naively” called “good”, a lot of it is objectively “oikeion”. (This word has the same root as “ecology” (oikology?) and “economics” (oikonomics?).)
Like vitamin C is oikeion (naturally familiarly helpful in almost all cases) to humans because otherwise: scurvy. And a wise person can easily see that scurvy is convergently unhelpful to most goals that a human might wisely choose to pursue. NOT ALL GOALS. At least, the Stoics could only find ONE thing that was ALWAYS helpful (and deserved to be called “Good” instead of being called “Oikeion”), which was Wisdom Itself.
If vitamin C consumption is oikeion, then it might help and probably wouldn’t hurt to make the consumption of vitamin C pleasant to the human palate. But a Stoic sage would eat it whether it was pleasant or not, and (given transhuman self-modification powers) would make it subjectively pleasant to eat only upon careful and wise consideration (taking other options into account, perhaps, such as simply adding vitamin C synthesis back into our genome via copypasta from other mammals, or perhaps by repairing the broken primate GULO pseudogene and seeing what happens (the link is to a creationist, but I kinda love their writing because they really dig DEEP into details precisely so they can try to creatively explain all the details away as an elaborate performance of faithful intellectual obeisance to a literal interpretation of their ancient religion (the collection of true details is great even if the mythic literary analysis and scientific summaries are weak))).
...
From my perspective, there is a semantic vector here that all of these ways of saying “don’t wirehead” are attempting to point at.
It links to math and myth and evolution and science fiction and child psychology, and a non-trivial chunk of moral psychology/philosophy talk from before 2015 or so can barely talk about it, but ASSUMES that it won’t even be a problem.
You see awareness of the semantic vector sometimes in life advice that resonates with creative rationalist types… It includes trying to “go from 0 to 1” while in contact with real/new/interesting constraints to generate novel processes or concepts that are worthy of repetition. Also “playing in hard mode”. Also Eliezer’s entire concept-network bundled under the Project Lawful concept, based on the seeds one can find in the Pathfinder Universe God Irori.
It also links to the grue/bleen problem and attempts to “solve” the problem of “semantics” … where, like, in some sense you would simply want the entire instruction to an ASI to simply be “DO GOOD” (but with the DWIM instruction correctly implemented somehow). Likewise, using the same software, you might wish that a mind simply felt better when things were “MORE GOOD” and felt sadder when things were “LESS GOOD”, after the mind had fully subconsciously solved the entire semantic challenge of defining “GOODNESS” once and for all <3
Even people who prefer “natural” food would likely go for e. g. the meat of beasts from carefully designed environments with very fast evolutionary loops set up to make their meat structured for maximum tastiness.[1]
I don’t think there are many people who “prefer natural food” who would consider “you evolved them very carefully” to really satisfy their cruxes. (Which is not to claim their cruxes are coherent.)