CEO of Convergence, an x-risk research and impact organization.
David_Kristoffersson
Mark: So you think human-level intelligence in principle does not combine with goal stability. Aren’t you simply disagreeing with the orthogonality thesis, “that an artificial intelligence can have any combination of intelligence level and goal”?
There’s no guarantee that boxing will ensure the safety of a soft takeoff. When your boxed AI starts to become drastically smarter than a human (10 times, 1,000 times, 1,000,000 times), the sheer enormity of the mind may slip beyond human ability to understand. All the while, a seemingly small dissonance between the AI’s goals and human values, or a small misunderstanding on our part of what goals we’ve imbued, could magnify into catastrophe as the power differential between humanity and the AI explodes post-transition.
If an AI goes through the intelligence explosion, its goals will be what orchestrates all resources (as Omohundro’s point 6 implies). If the goals of this AI do not align with human values, all we value will be lost.
So you disagree with the premise of the orthogonality thesis. Then you know a central concept to probe in order to understand the arguments put forth here. For example, check out Stuart Armstrong’s paper: General purpose intelligence: arguing the Orthogonality thesis
And boxing, by the way, means giving the AI zero power.
No, hairyfigment’s answer was entirely appropriate. Zero power would mean zero effect. Any kind of interaction with the universe means some level of power. Perhaps in the future you should say “nearly zero power” instead, so as to avoid misunderstanding on the part of others, since taking you literally on the “zero” is apparently “legalistic”.
As to the issues with nearly zero power:
A superintelligence with nearly zero power could turn out to be a heck of a lot more powerful than you expect.
The incentives to tap more perceived utility by unboxing the AI or building other unboxed AIs will be huge.
Mind, I’m not arguing that there is anything wrong with boxing. What I’m arguing is that it’s wrong to rely only on boxing. I recommend you read some more material on AI boxing and Oracle AI. Don’t miss out on the references.
First, appreciation: I love that calculated modification of self. These, and similar techniques, can be very useful if put to use in the right way. I recognize myself here and there. You did well to abstract it all out this clearly.
Second, a note: You’ve described your techniques from the perspective of how they deviate from epistemic rationality: “Changing your Terminal Goals”, “Intentional Compartmentalization”, “Willful inconsistency”. I would’ve been more inclined to describe them from the perspective of their central effect, e.g., something in the style of “Subgoal ascension”, “Channeling”, “Embodying”. Perhaps not as marketable to the LessWrong crowd. Multiple perspectives could be used as well.
Third, a question: How did you create that gut feeling of urgency?
Hello.
I’m currently working through the MIRI research guide in order to contribute to one of the open problems, starting from Basics. I’m emulating many of Nate’s techniques. I’ll post reviews of material in the research guide on LessWrong as I work through it.
I’m mostly posting here now just to note this. I can be terse at times.
See you there.
You guys must be right. And Wikipedia corroborates. I’ll edit the post. Thanks.
My two main sources of confusion in that sentence are:
He says “distinct elements onto distinct elements”, which suggests both injection and surjection.
He says “is called one-to-one (usually a one-to-one correspondence)”, which might suggest that “one-to-one” and “one-to-one correspondence” are synonyms, since that is what he usually uses the parentheses for when naming concepts.
I find Halmos somewhat contradictory here.
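For reference, here are the standard modern definitions as I understand them (standard terminology, which may not match Halmos’s usage):

```latex
% For a function f : A \to B:
% Injective (one-to-one): distinct elements map to distinct elements.
\forall x, y \in A,\quad x \neq y \implies f(x) \neq f(y)
% Surjective (onto): every element of B is the image of some element of A.
\forall b \in B,\ \exists a \in A,\quad f(a) = b
% Bijective (one-to-one correspondence): both injective and surjective.
```

Read this way, “distinct elements onto distinct elements” really does mix the two: “distinct elements to distinct elements” describes injectivity, while “onto” is the usual word for surjectivity.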
But I’m convinced you’re right. I’ve edited the post. Thanks.
The author of the Teach Yourself Logic study guide agrees with you about reading multiple sources:
I very strongly recommend tackling an area of logic (or indeed any new area of mathematics) by reading a series of books which overlap in level (with the next one covering some of the same ground and then pushing on from the previous one), rather than trying to proceed by big leaps.
In fact, I probably can’t stress this advice too much, which is why I am highlighting it here. For this approach will really help to reinforce and deepen understanding as you re-encounter the same material from different angles, with different emphases.
Thanks for the tip. Two other books on the subject that seem to be appreciated are Introduction to Set Theory by Karel Hrbacek and Classic Set Theory: For Guided Independent Study by Derek Goldrei.
Edit: math.se weighs in: http://math.stackexchange.com/a/264277/255573
Interesting. I might show up.
Counterpoint: Sometimes, not moving means moving, because everyone else is moving away from you. Movement—change—is relative. And on the Internet, change is rapid.
It’s bleen, without a moment’s doubt.
Here’s the Less Wrong post for the AI Safety Camp!
Your relationship with other people is a macrocosm of your relationship with yourself.
I think there’s something to that, but it’s not that general. For example, some people can be very kind to others but harsh with themselves. Some people can be cruel to others but lenient to themselves.
If you can’t get something nice, you can at least get something predictable
The desire for the predictable is what Autism Spectrum Disorder is all about, I hear.
Yes, the plan is to have these on an ongoing basis. I’m writing this just after the deadline passed for the one planned for April.
Here’s the website: https://aisafetycamp.com/
The Facebook group is also a good place to keep tabs on it: https://www.facebook.com/groups/348759885529601/
I’m very confused why you think that such research should be done publicly, and why you seem to think it’s not being done privately.
I don’t think the article implies this:
Research should be done publicly
The article states: “We especially encourage researchers to share their strategic insights and considerations in write ups and blog posts, unless they pose information hazards.”
Which means: share more, but don’t share if you think there are possible negative consequences of it.
Though I guess you could mean that it’s very hard to tell what might lead to negative outcomes. This is a good point. This is why we (Convergence) are prioritizing research on information hazard handling and research shaping considerations.

it’s not being done privately
The article isn’t saying strategy research isn’t being done privately. What it is saying is that we need more strategy research and should increase investment into it.
Given the first sentence, I’m confused as to why you think that “strategy research” (writ large) is going to be valuable, given our fundamental lack of predictive ability in most of the domains where existential risk is a concern.
We’d argue that to get better predictive ability, we need to do strategy research. Maybe you’re saying the article makes it look like we are recommending any research that looks like strategy research? That isn’t our intention.
Nice work, Wei Dai! I hope to read more of your posts soon.
However I haven’t gotten much engagement from people who work on strategy professionally. I’m not sure if they just aren’t following LW/AF, or don’t feel comfortable discussing strategically relevant issues in public.
A bit of both, presumably. I would guess a lot of it comes down to incentives, perceived gain, and habits. There’s no particular pressure to discuss on LessWrong or the EA Forum. LessWrong isn’t perceived as your main peer group. And if you’re at FHI or OpenAI, you’ll have plenty of contact with people who can provide quick feedback already.
My guess is that the ideal is to have semi-independent teams doing research. Independence in order to better explore the space of questions, and some degree of plugging in to each other in order to learn from each other and to coordinate.
Are there serious info hazards, and if so can we avoid them while still having a public discussion about the non-hazardous parts of strategy?
There are info hazards. But I think if we can discuss Superintelligence publicly, then yes; we can have a public discussion about the non-hazardous parts of strategy.
Are there enough people and funding to sustain a parallel public strategy research effort and discussion?
I think you could get a pretty lively discussion even with just 10 people, if they were active enough. I think you’d need a core of active posters and commenters, and there needs to be enough reason for them to assemble.
http://intelligenceexplosion.com/en/2012/ai-the-problem-with-solutions/ links to http://lukeprog.com/SaveTheWorld.html, which redirects to http://lukemuehlhauser.comsavetheworld.html/, which isn’t there anymore.