I mean, that makes sense—perhaps more so than it does for Hells, if we allow arbitrarily smart deceptive adversaries—but now I’m wondering if your first sentence is a strawman.
I’m glad Jacob agrees that empowerment could theoretically help arbitrary entities achieve arbitrary goals. (I recall someone who was supposedly great at board games recommending it as a fairly general strategy.) I don’t see how, if empowerment is compatible with almost any goal, it could prevent the AI from changing our goals whenever this is convenient.
Perhaps he thinks we can define “empowerment” to exclude this? Quick reaction: that seems likely to be FAI-complete, and somewhat unlikely to be a fruitful approach. My understanding of physics says that pretty much every action has a physical effect on our brains. Therefore, the definition of which changes to our brains “empower” us and which “disempower” us may be doing all of the heavy lifting. How does this become easier to program than CEV?
Jacob responds: The distribution shift from humans born in 0AD to humans born in 2000AD seems fairly inconsequential for human alignment.
I now have additional questions. The above seems likely enough in the context of CEV (again), but otherwise false.
>FDT has bigger problems then that.
Does it? The post you linked does nothing to support that claim, and I don’t think you’ve presented any actual problem which definitively wouldn’t be solved by logical counterfactuals. (Would this problem also apply to real people killing terrorists instead of giving in to their demands? Because zero percent of the people obeying FDT in that regard are doing so because they think they might not be real.) This post is actually about TDT, but it’s unclear to me why the ideas couldn’t be transferred.
I also note that 100% of responses in this thread, so far, appear to assume that your ghosts would need to have qualia in order for the argument to make sense. I think your predictions were bad. I think you should stop doing that, and concentrate on the object-level ideas.
Again, it isn’t more resilient, and thinking you doubt a concept you call “qualia” doesn’t mean you can doubt your own qualia. Perhaps the more important point here is that you are typically more uncertain of mathematical statements, which is why you haven’t removed and cannot remove the need for logical counterfactuals.
Real humans have some degree of uncertainty about most mathematical theorems. There may be exceptions, like 0+1=1, or the halting problem and its application to God, but typically we have enough uncertainty when it comes to mathematics that we might need to consider counterfactuals. Indeed, this seems to be required by the theorem alluded to at the above link—logical omniscience seems logically impossible.
For a concrete (though unimportant) example of how regular people might use such counterfactuals in everyday life, consider P=NP. That statement is likely false. Yet, we can ask meaningful-sounding questions about what its truth would mean, and even say that the episode of ‘Elementary’ which dealt with that question made unjustified leaps. “Even if someone did prove P=NP,” I find myself reasoning, “that wouldn’t automatically entail what they’re claiming.”
Tell me if I’ve misunderstood, but it sounds like you’re claiming we can’t do something which we plainly do all the time. That is unconvincing. It doesn’t get any more convincing when you add that maybe my experience of doing so isn’t real. I am very confident that you will convince zero average people by telling them that they might not actually be conscious. I’m skeptical that even a philosopher would swallow that.
If you think you might not have qualia, then by definition you don’t have qualia. This just seems like a restatement of the idea that we should act as if we were choosing the output of a computation. On its face, this is at least as likely to be coherent as ‘What if the claim we have the most certainty of were false,’ because the whole point of counterfactuals in general is to screen off potential contradictions.
The problem arises because, for some reason, you’ve assumed the ghosts have qualia. Now, that might be a necessary assumption if you require us to be uncertain about our degree of ghostliness. Necessary or not, though, it seems both dubious and potentially fatal to the whole argument.
That is indeed somewhat similar to the “Hansonian adjustment” approach to solving the Mugging when larger numbers come into play. Hanson originally suggested that, conditional on the claim that 3^^^^3 distinct people will come into existence, we should need a lot of evidence to convince us we’re the one with a unique opportunity to determine almost all of their fates. It seems like such claims should be penalized by a factor of 1/3^^^^3. We can perhaps extend this so it applies to causal nodes as well as people. That idea seems more promising to me than bounded utility, which implies that even a selfish agent would be unable to share many goals with its future self (and technically, even a simple expected-value calculation takes time).
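To illustrate the arithmetic (my own toy sketch, not anything from Hanson; the function name and numbers are made up), the leverage penalty works because the claimed stakes cancel out of the expected value exactly, no matter how large they are. Working in log-space avoids overflow, since numbers like 3^^^^3 dwarf any float:

```python
import math

def log_ev_of_paying(log_stakes, log_credence):
    """Log expected utility of paying the mugger, under a leverage penalty.

    The "Hansonian adjustment" multiplies the prior by 1/N when the claim
    involves uniquely determining the fates of N people, so the log-prior
    gets -log_stakes added. The claimed stakes then cancel exactly.
    """
    log_prior = log_credence - log_stakes   # penalized prior
    return log_prior + log_stakes           # prior * stakes, in log space

# Raising the claimed stakes from 10^6 to 10^100 changes nothing:
small = log_ev_of_paying(math.log(10.0**6), math.log(1e-9))
huge = log_ev_of_paying(math.log(10.0**100), math.log(1e-9))
assert abs(small - huge) < 1e-9
```

The mugger cannot win by naming a bigger number: every increase in claimed stakes is matched by an equal decrease in prior probability.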
Your numbers above are, at least, more credible than saying there’s a 1/512 chance someone will offer you a chance to pick between a billion US dollars and one hundred million.
I may reply to this more fully, but first I’d like you to acknowledge that you cannot in fact point to a false prediction by EY here, and in the exact post you seemed to be referring to, he says that his view is compatible with this sort of AI producing realistic sculptures of human faces!
Maybe I don’t understand the point of this example in which AI creates non-conscious images of smiling faces. Are you really arguing that, based on evidence like this, a generalization of modern AI wouldn’t automatically produce horrific or deadly results when asked to copy human values?
Peripherally: that video contains simulacra of a lot more than faces, and I may have other minor objections in that vein.
ETA, I may want to say more about the actual human analysis which I think informed the AI’s “success,” but first let me go back to what I said about linking EY’s actual words. Here is 2008-Eliezer:
Now you, finally presented with a tiny molecular smiley—or perhaps a very realistic tiny sculpture of a human face—know at once that this is not what you want to count as a smile. But that judgment reflects an unnatural category, one whose classification boundary depends sensitively on your complicated values. It is your own plans and desires that are at work when you say “No!”
Hibbard knows instinctively that a tiny molecular smileyface isn’t a “smile”, because he knows that’s not what he wants his putative AI to do. If someone else were presented with a different task, like classifying artworks, they might feel that the Mona Lisa was obviously smiling—as opposed to frowning, say—even though it’s only paint.
>without inevitably failing by instead only producing superficial simulacra of faces
That’s clearly exactly what it does today? It seems I disagree with your point on a more basic level than expected.
See, MIRI in the past has sounded dangerously optimistic to me on that score. While I thought EY sounded more sensible than the people pushing genetic enhancement of humans, it’s only now that I find his presence reassuring, thanks in part to the ongoing story he’s been writing. Otherwise I might be yelling at MIRI to be more pessimistic about fragility of value, especially with regard to people who might wind up in possession of a corrigible ‘Tool AI’.
>DL did not fail in the way EY predicted,
Where’s the link for that prediction? I think there’s more than one example of critics putting words in his mouth, and then citing a place where he says something manifestly different.
Here’s a post from 2008, where he says the following:
As a matter of fact, if you use the right kind of neural network units, this “neural network” ends up exactly, mathematically equivalent to Naive Bayes. The central unit just needs a logistic threshold—an S-curve response—and the weights of the inputs just need to match the logarithms of the likelihood ratios, etcetera. In fact, it’s a good guess that this is one of the reasons why logistic response often works so well in neural networks—it lets the algorithm sneak in a little Bayesian reasoning while the designers aren’t looking.
Just because someone is presenting you with an algorithm that they call a “neural network” with buzzwords like “scruffy” and “emergent” plastered all over it, disclaiming proudly that they have no idea how the learned network works—well, don’t assume that their little AI algorithm really is Beyond the Realms of Logic. For this paradigm of adhockery, if it works, will turn out to have Bayesian structure; it may even be exactly equivalent to an algorithm of the sort called “Bayesian”.
In a discussion from 2010, he’s offered the chance to say that he doesn’t think the machine learning of the time could produce AGI even with a smarter approach, and he appears to pull back from saying that:
But if we’re asking about works that are sort of billing themselves as ‘I am Artificial General Intelligence’, then I would say that most of that does indeed fail immediately and indeed I cannot think of a counterexample which fails to fail immediately, but that’s a sort of extreme selection effect, and it’s because if you’ve got a good partial solution, or solution to a piece of the problem, and you’re an academic working in AI, and you’re anything like sane, you’re just going to bill it as plain old AI, and not take the reputational hit from AGI. The people who are bannering themselves around as AGI tend to be people who think they’ve solved the whole problem, and of course they’re mistaken. So to me it really seems like to say that all the things I’ve read on AGI immediately fundamentally fail is not even so much a critique of AI as rather a comment on what sort of work tends to bill itself as Artificial General Intelligence.
If we somehow produced the sort of AI which EY wants, then I think you’d have radically underestimated the chance of being reconstructed from cryonically preserved data.
On the other side, you appear comfortable talking about 4 more decades of potential life, which is rather longer than the maximum I can believe in for myself in the absence of a positive singularity. I may also disagree with your take on selfishness, but that isn’t even the crux here! Set aside the fact that, in my view, AGI is likely to kill everyone in much less than 40 years from today. Even ignoring that, you would have to be overstating the case when you dismiss “the downfall of society,” because obviously that kills you in less than 4 decades with certainty. Nor is AGI the only X-risk we have to worry about.
What constitutes pessimism about morality, and why do you think that one fits Eliezer? He certainly appears more pessimistic across a broad area, and has hinted at concrete arguments for being so.
>The American Maritime Partnership — a coalition that represents operators of U.S.-flagged vessels and unions covered by the Jones Act —
So, not a union. The guy in charge, who is quoted here, appears to still be a VP at Matson Navigation—a 2.38 billion dollar business which had 4149 employees in 2020. The next named officer at AMP, if you look them up, is the CEO of American Waterways Operators. “Organized in Washington, D.C. in 1944, AWO now has over 300 member companies that serve the diverse needs of U.S. shippers and consumers.” The third and final named officer is a VP at the Transportation Institute, which is devoted to preserving the Jones Act. TI appears to be run entirely by CEOs and corporate chairs or presidents.
None of that is secret, because nothing is ever secret anymore.
>What I mean, instead, is that the unions should, for selfish reasons, not want this.
You don’t say.
There is at least one union strongly opposed to the waiver. They’re called the American Maritime Officers (meaning all their members have spiffy titles). Whereas the largest union in the AFL-CIO has 1.7 million members, the AMO has fewer than 4,000.
I read it and didn’t know what to make of it, since you sketch out some of the reasons why we obviously don’t live in a simulation. One man’s modus ponens is another man’s modus tollens.
Creating a universe like our own would be a crime unprecedented in history. If I thought you could do it, I’m not saying I’d do whatever it took to prevent you—but if someone else killed you for it, and if I were inexplicably placed on the jury, I’d prevent a conviction. Hopefully enough other beings think the same way—and again, you present an argument that they would—to rule out the possibility of such an abomination.
Yeah, that concept is literally just “harmful info,” which takes no more syllables to say than “infohazard,” and barely takes more letters to write. Please do not use the specialized term if your actual meaning is captured by the English term, the one which most people would understand immediately.
I will say that not everything which ends is a mistake, but that should not be taken to endorse having children—you’re already pregnant, aren’t you.
Isn’t that just a conflation of training data with fundamental program design? I’m no expert, but my impression is that you could train GPT-1 all you want and it would never become GPT-3.
Addendum: I don’t think we should be able to prove that Life gliders lack values merely because they have none. That might sound credible, but it may also violate the von Neumann–Morgenstern utility theorem. Or did you mean we should be able to prove it by analyzing their actual causal structure, not just by looking at behavior?
Even then, while the fact that gliders appear to lack values does happen to be connected to their lack of qualia or “internal experience,” those look like logically distinct concepts. I’m not sure where you’re going with this.
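For concreteness, a glider’s entire causal structure fits in a few lines, which is part of what makes the question crisp. Here is a minimal Life implementation (my own sketch, just to illustrate the point) showing that a glider’s behavior is nothing but periodic translation:

```python
from collections import Counter

def step(cells):
    """One generation of Conway's Life over a set of live (x, y) cells."""
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {c for c, n in neighbor_counts.items()
            if n == 3 or (n == 2 and c in cells)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)
# After its 4-generation period, the glider has simply moved one cell
# diagonally; its behavior is exhaustively describable without any
# mention of goals, values, or experience.
assert state == {(x + 1, y + 1) for (x, y) in glider}
```

Whether inspecting those few lines counts as “analyzing causal structure” in the sense you intend, rather than just a compact description of behavior, is exactly the distinction I’m unsure you’ve drawn.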