I can’t resist giving this pair of rather incongruous quotes from the paper
Could you spell out what makes the quotes incongruous with each other? It’s not jumping out at me.
1 billion per year per W/m^2 of reduced forcing
For others who weren’t sure what “reduced forcing” refers to: https://en.wikipedia.org/wiki/Radiative_forcing
And to put that number in context, the “net anthropogenic component” of radiative forcing appears to be about 1.5 W/m^2 (according to an image in the Wikipedia article), so canceling out the anthropogenic component would have an ongoing cost of about 1.5 billion per year.
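Spelling out the unit cancellation behind that figure (both numbers are rough, and I’m reading the paper’s “1 billion” as an annual dollar cost):

$$1.5\ \mathrm{W/m^2} \times \frac{1\ \text{billion per year}}{\mathrm{W/m^2}} \approx 1.5\ \text{billion per year}$$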
Or you could imagine writing for a smarter but less knowledgeable person. E.g. 10 y.o. Feynman.
Okay, that is probably not that good a characterization.
I appreciate the caveat, but I’m actually not seeing the connection at all. What is the relationship you see between common sense and surprisingly simple solutions to problems?
Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?
This seems very related to the question of whether uploads would be safer than some other kind of AGI. Offhand, I remember a comment from Eliezer suggesting that he thought uploads would be safer (but that they would be unlikely to happen first).
Not sure how common that view is though.
Acquiring data: put a group of people in a house with a computer. Show them things (images, videos, audio files, etc.) and give them a chance to respond at the keyboard. Their keyboard actions are the actions, and everything between actions is an observation. Then learn the policy of the group of humans.
Wouldn’t this take an enormous amount of observation time to generate enough data to learn a human-imitating policy?
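For concreteness, here is roughly the kind of supervised imitation setup I imagine “learn the policy of the group of humans” cashing out as (the feature encoding, the softmax-regression model, and all the sizes are my own placeholder assumptions, not part of the proposal):

```python
# Toy sketch: learn p(keyboard action | observation) from logged human data.
# Everything here is illustrative; a real system would need a far richer
# observation encoding and far more data.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dataset: each row pairs an encoded observation (what the group saw
# since its last keypress) with the keyboard action it then took.
n_samples, obs_dim, n_actions = 10_000, 64, 50
observations = rng.normal(size=(n_samples, obs_dim))
actions = rng.integers(0, n_actions, size=n_samples)

# Minimal softmax-regression policy trained by gradient descent on cross-entropy.
W = np.zeros((obs_dim, n_actions))
lr = 0.1
for _ in range(200):
    logits = observations @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = probs.copy()
    grad[np.arange(n_samples), actions] -= 1.0    # d(loss)/d(logits)
    W -= lr * observations.T @ grad / n_samples

def imitate(observation: np.ndarray) -> int:
    """Predict the action the humans would most likely have taken."""
    return int(np.argmax(observation @ W))
```

Even this toy version wants thousands of (observation, action) pairs just to fit a linear policy over a 64-dimensional encoding; a policy that actually imitates humans responding to images, video, and audio would presumably need orders of magnitude more data, which is why the observation-time question seems important.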
Just want to note that I like your distinctions between Algorithm Land and the Real World and also between Level-1 optimization and Level-2 optimization.
I think some discussion of AI safety hasn’t been clear enough on what kind of optimization we expect in which domains. At least, it wasn’t clear to me.
But a couple things fell into place for me about 6 months ago, which very much rhyme with your two distinctions:
1) Inexploitability only makes sense relative to a utility function, and if the AI’s utility function is orthogonal to yours (e.g. because it is operating in Algorithm Land), then it may be exploitable relative to your utility function, even though it’s inexploitable relative to its own utility function. See this comment (and thanks to Rohin for the post that prompted the thought).
2) While some process that’s optimizing super-hard for an outcome in Algorithm Land may bleed out into affecting the Real World, this would sort of be by accident, and seems much easier to mitigate than a process that’s trying to affect the Real World on purpose. See this comment.
Putting them together, a randomly selected superintelligence doesn’t care about atoms, or about macroscopic events unfolding through time (roughly the domain of what we care about). And just because we run it on a computer that from our perspective is embedded in this macroscopic world, and that uses macroscopic resources (compute time, energy), doesn’t mean it’s going to start caring about macroscopic Real World events, or start fighting with us for those resources. (At least, not in a Level-2 way.)
On the other hand, powerful computing systems we build are not going to be randomly selected from the space of possible programs. We’ll have economic incentives to create systems that do consider and operate on the Real World.
So it seems to me that a randomly selected superintelligence may not actually be dangerous (because it doesn’t care about being unplugged—that’s a macroscopic concept that seems simple and natural from our perspective, but would not actually correspond to something in most utility functions), but that the superintelligent systems anyone is likely to actually build will be much more likely to be dangerous (because they will model and/or act on the Real World).
I see two links in your comment that are both linking to the same place—did you mean for the first one (with the text: “the criticism that the usage of “scam” in the title was an instance of the noncentral fallacy”) to link to something else?
The way I read it, Gwern’s tool-AI article is mostly about self-improvement.
I’m not sure I understand what you mean here. I linked Gwern’s post because your proposal sounded very similar to me to Holden’s Tool AI concept, and Gwern’s post is one of the more comprehensive responses I can remember coming across.
Is it your impression that what you’re proposing is substantially different from Holden’s Tool AI?
When I say that your idea sounded similar, I’m thinking of passages like this (from Holden):
Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter—to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive….This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode. In fact, if developing “Friendly AI” is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on “Friendliness theory” moot.
Compared to this (from you):
Finally, we query the system in a way that is compatible with its self-unawareness. For example, if we want to cure cancer, one nice approach would be to program it to search through its generative model and output the least improbable scenario wherein a cure for cancer is discovered somewhere in the world in the next 10 years. Maybe it would output: “A scientist at a university will be testing immune therapy X, and they will combine it with blood therapy Y, and they’ll find that the two together cure all cancers”. Then, we go combine therapies X and Y ourselves.
Your “Then, we go combine therapies X and Y ourselves” sounds to me a lot like Holden’s separation of (1) calculating the best action vs. (2) either explaining (in the case of Tool AI) or executing (in the case of Agent AI) the action. In both cases the idea seems to be that we can reap the rewards of superintelligence but retain control by treating the AI as an advisor rather than as an agent who acts on our behalf.
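To make that (1)/(2) separation concrete, here’s a toy sketch (entirely my own illustration; none of the names or candidate actions come from either post):

```python
# Toy illustration of Holden's tool/agent distinction. The candidate actions,
# the scoring function, and the "execute" step are all made-up stand-ins.

CANDIDATE_ACTIONS = ["combine therapies X and Y", "run trial Z", "do nothing"]

def score(action: str) -> float:
    # Stand-in for "parameter P"; a real system would compute expected value here.
    return float(len(action))

def best_action() -> str:
    # Step (1): calculate which action maximizes P. Shared by both modes.
    return max(CANDIDATE_ACTIONS, key=score)

def run_as_tool() -> str:
    # Step (2), tool mode: report the recommendation; humans decide what to do with it.
    action = best_action()
    return f"Recommended: {action} (score {score(action)})"

def run_as_agent() -> None:
    # Step (2), agent mode: the system takes the action itself.
    action = best_action()
    print(f"Executing: {action}")  # stand-in for acting on the world

if __name__ == "__main__":
    print(run_as_tool())
```

The point of the sketch is just that the search in step (1) is identical in both modes; the tool/agent question is entirely about what happens in step (2), which is also how I’m reading your “we go combine therapies X and Y ourselves.”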
Am I right that what you’re proposing is pretty much along the same lines as Holden’s Tool AI—or is there some key difference that I’m missing?
Also see these discussions of Drexler’s Comprehensive AI Services proposal, which also emphasizes non-agency:
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
Comments on CAIS
If you haven’t already seen it, you might want to check out: https://www.gwern.net/Tool-AI
The Dive in! link in the last paragraph appears to be broken. It’s taking me to: https://www.lesswrong.com/posts/DbZDdupuffc4Xgm7H/%E2%81%A0http://mindingourway.com/dive-in/
Scoring your predictions: it looks like you got all three “not see” predictions right, as well as #1 and #3 from “will see”, with only #2 from “will see” missing (though you had merely predicted we’d see something “closer to” your “will see” list, so missing one doesn’t necessarily mean you were wrong).
But while it may have been sensible to start (fully 10 years ago, now!)
Correction: CFAR was started in 2012 (though I believe some of the founders ran rationality camps the previous summer, in 2011), so it’s been 7 (or 8) years, not 10.
Got it, that makes sense.
But being in a position of power filters for competence, and competence filters for accurate beliefs.
If the quoted bit had instead said:
This means that highly competent people in positions of power often have less accurate beliefs than highly competent people who are not in positions of power.
I wouldn’t necessarily have disagreed. But as is, I’m pretty skeptical of the claim (again depending on what is meant by “often”).
This means that highly competent people in positions of power often have less accurate beliefs than much less competent people who are not in positions of power.
Not sure how strong you intend this statement to be (due to the ambiguity of “often”), but I would think that, all else equal, a randomly selected competent person with some measure of power has more accurate beliefs than a less competent person without power, even after controlling for e.g. IQ.
Would you disagree with that?
I’d grant that the people with the very most accurate beliefs are probably not the same as the people who are the very most competent, but that’s mostly just because the tails come apart.
I’d also grant that having power subjects one to new biases. But being competent and successful is a strong filter for your beliefs matching reality (at least in some domains, and to the extent that your behavior is determined by your beliefs), while incompetence often seems to go hand-in-hand with various kinds of self-deception (making excuses, blaming others, having unrealistic expectations of what will work or not).
So overall I’d expect the competent person’s beliefs to be more accurate.
I notice I wanted to put ‘dexterous motor control’ on both lists, so I’m somehow confused; it seems like we already have prostheses that perform pretty well based on external nerve sites (like reading off what you wanted to do with your missing hand from nerves in your arm) but I somehow don’t expect us to have the spatial precision or filtering capacity to do that in the brain.
We’ve had prostheses that let people control computer cursors via a connection directly to the brain at least since 2001. Would you not count that as dexterous motor control?
Sure if you just call it “honest reporting”. But that was not the full phrase used. The full phrase used was “honest reporting of unconsciously biased reasoning”.
I would not call trimming that down to “honest reporting” a case of honest reporting! ;-)
If I claim, “Joe says X, and I think he honestly believes that, though his reasoning is likely unconsciously biased here”, then that does not at all seem to me like an endorsement of X, and certainly not a clear endorsement.
Nitpick: “code” (in the computer programming sense) is a mass noun, so you don’t say “codes” to refer to programs or snippets of computer code.