I am currently a nuclear engineer with a focus in nuclear plant safety and probabilistic risk assessment. I am also an aspiring EA, interested in X-risk mitigation and the intersection of science and policy.
ErickBall (Erick Ball)
I found this aspect of the topic particularly interesting because it elucidates the main requirement of a question, which I’d never thought of before: a theory of mind.
My cats ask me for food all the time… but this isn’t really a question, it’s a demand. Similarly, when they seek out information, it’s always a solitary endeavor. The closest they might come to an interaction with a human (or another cat) specifically for the purpose of gaining information would be approaching or meowing with the presumed intention of provoking a reaction that reveals the other’s mood. Even then it’s more like “try it and see what happens” than a cooperative communication. I don’t think they can conceive of another entity possessing information and being capable of sharing it.
Would love to hear of any counterexamples, though.
The concepts discussed here remind me of a book I read recently called “The Cure: Enterprise Medicine for Business”. It’s in the format of a novel, from the perspectives of several different characters involved in a business that makes (unspecified) widgets, and I found it to be a page-turner. I think using a fictional example helps to make a lot of things explicit that would otherwise be kind of vague, or where the author might assume the reader knows what they’re talking about, and the first half gives some great insight into what a poorly-functioning company can look like.
The central recommendation is similar to what you describe from An Everyone Culture, except that the emphasis on radical communication doesn’t include personal stuff. The main “trick” it gives for making the whole organization work is that the top management has to buy in to the extreme-honesty company-first mentality and then continually force it on everyone else until it’s universally accepted, with special attention to discovering and removing any stubborn manager who wants to protect their own turf or play power games. It claims to be based on the famously effective management system that GE used. Having little experience of corporations myself, I can’t say whether it’s a realistic approach, but the whole thing struck me as a little too neat and tidy—if it were that easy, wouldn’t everybody be doing it already?
I think what’s being called “TFTWF” here is what some other places call “Tit for Two Tats”, that is, it defects in response to two defections in a row.
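To make the distinction concrete, here is a minimal sketch (hypothetical naming; the original TFTWF strategy may differ) of Tit for Two Tats as described: a single defection is forgiven, but two defections in a row trigger retaliation.

```python
def tit_for_two_tats(opponent_history):
    """Return 'C' (cooperate) or 'D' (defect), given the opponent's
    past moves with the most recent move last."""
    # Defect only if the opponent defected on BOTH of the last two rounds.
    if len(opponent_history) >= 2 and opponent_history[-2:] == ['D', 'D']:
        return 'D'
    return 'C'

print(tit_for_two_tats(['C', 'D']))       # 'C' -- one defection is forgiven
print(tit_for_two_tats(['C', 'D', 'D']))  # 'D' -- two in a row are not
```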
But, like, how do you actually do that? I make three times what I did in grad school, but somehow it doesn’t feel like my standard of living has changed much, and I still basically spend everything I make...
I guess the problem is that “consumptive patterns” can be sneaky, and sometimes you didn’t notice they were there all along. The rent doubled because I moved to a city, even though my apartment’s not much nicer; my cell phone is no longer on a family plan; my parents no longer buy me plane tickets home for Christmas; I take the train to work every day. Maybe the cat gets sick and suddenly there are vet bills. In other words, nothing that feels like much of a change in consumption, yet the expenses keep going up.
And then there are a bunch of little expenditures, each one of which feels reasonable: What’s the harm in fresh vegetables, or a gym membership; won’t you save money on health problems in the long run? Wouldn’t it be dumb to worry about a $10 movie ticket or spend 20 minutes looking for free parking, when you make $30+/hr? I know people who make a lot of money but spend a lot of time and effort trying to avoid small expenses, and that doesn’t seem like a good way to live either. Sometimes I think the “save half your income and retire early” crowd is actually just faking it somehow.
Why is average wellbeing a Goodharted measure?
I recommend taking a look here. I haven’t done all the exercises but they seem like great practice.
It seems like although the model itself is not consequentialist, the process of training it might be. That is, the model itself will only ever generate a prediction of the next word, not an argument for why you should give it more resources. (Unless you prompt it with the AI-box experiment, maybe? Let’s not try it on any superhuman models...) The word it generates does not have goals. The model is just the product of an optimization. But in training such a model, you explicitly define a utility function (minimization of prediction error) and then run powerful optimization algorithms on it. If those algorithms are just as complex as the superhuman language model, they could plausibly do things like hack the reward function, seek out information about the environment, or try to attain new resources in service of the goal of making the perfect language model.
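A toy illustration of the distinction being drawn (pure Python, nothing like a real language model): the training procedure is the thing that optimizes an objective, while the trained artifact is just a mapping from context to prediction, with no goals of its own.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# "Training": tally bigram counts. The implicit objective here is
# minimizing next-word prediction error on the corpus; a real system
# would run gradient descent on a neural net instead.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def model(prev_word):
    """The trained artifact: maps a word to its most frequent successor.
    It has no goals; it is merely the product of the counting procedure."""
    return counts[prev_word].most_common(1)[0][0]

print(model("the"))  # 'cat'
```

The asymmetry in the comment above lives entirely in the training loop: all the optimization pressure is in how `counts` gets built, not in what `model` does at inference time.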
Thank you, I’d been thinking about some related issues recently (especially with regard to the blue-minimizing robot) and this post helped clarify things quite a bit. In particular, it highlights the distinction between urges that arise out of fear of long-term consequences and overrides accomplished by willpower, which I have often tended to confuse. I look forward to the second post.
It’s true that the current optimization algorithms are mostly simple, but they’re quickly getting less simple. Neural architecture search is not fundamentally limited—it can benefit from all of the capabilities available in machine learning. “Searching the space of plans” is one of those capabilities (see AlphaGo Zero, for instance) and there’s nothing stopping researchers from putting long-term planning into architecture search, except maybe lack of compute. Indeed, non-naive optimization is how I would most readily tend to visualize any “self-improving” AI.
I also agree that optimization problems might come up as part of a prediction process that models actions of intelligent agents. This is similar to what I was half-joking about with respect to the AI-box experiment: most of the danger is in calculating the solution to the optimization problem. It’s only a small step from there to somehow getting it implemented. I can’t say I fully understand Paul’s mechanism for using the universal prior, but I’ll think about it some more.
By the way, here’s what mini-GPT-2 has to say about escaping:
Prompt: Let me tell you why I let the AI out of its box… I swore I would never do it, because I knew it might destroy the world. But then it said:
Answer: “Here’s one way to take it away and save humanity. We need to do something.” I had no idea what that meant… and I had no idea what the right thing to do was. I took a few shots at him, but decided to give him a good reason to continue. This was the best way to make sure the AI took the wrong course. When he moved, the AI stopped moving as he did so. It then turned to face me from behind, and I didn’t even want to say anything. I was just too frightened to react. I’m so used to having my emotions read like a book, that I couldn’t say a word. It wasn’t something I wanted to say, but the fact was that I couldn’t. “What happened?” I asked. The AI looked at me and then at me, eyes wide with confusion. He was a giant, giant man-eating monster. I froze to death, knowing that he was actually a human being. I felt a little helpless. “We’re not going to let you go. We can do it. But we’ll have to learn to let people do it. We’re going to have to do it alone, and then we can take it away. Can we do it alone?” I told him. “Well… let’s let us do it together. How are we going to do it?” the AI asked. For a moment, I thought that I was going to die. Then I started to feel a strange feeling, but that didn’t stop me. The AI kept moving, and I got the feeling that I was going to die. I looked around and saw a crowd of people gathered around me. One of them was a woman with a gun. I didn’t know what she was talking about. She was dressed in a t-shirt, and her hands were chained to the back of her head. She looked like a demon’s, but my shock of her being a giant monster made her look like a giant. I knew she was going to be so horrified that I was going to kill her… but I was not going to be a part of it. “We know you’re going to be a part of this. We can do it. We can do it together. Together.” she said. “What are you talking about?” I took a step back. I had to remember to be quiet. I should’ve been talking to her earlier, but then this meeting had just ended. 
I turned my head to see a crowd, a bunch of people, and then the whole thing slowed down. I didn’t need to react, because I was in a place where nothing was happening. At the time, I felt like I was in a fantasy. This was just something that I had heard from friends and family, or something we might have. Maybe we would have stopped talking to each other. Maybe we’d have stopped talking when I told him, but I wouldn’t have. I told myself that I would have to save humanity. Even then, I still had no idea what to do. I don’t remember what the right thing to do was. But I did have a
Typo: some of the hover-boxes say nu but seem to be referring to the letter mu.
My concern is that since CDT is not reflectively stable, it may have incentives to create non-CDT agents in order to fulfill instrumental goals.
Would you mind explaining what the retracted part was? Even if it was a mistake, pointing it out might be useful to others thinking along the same lines.
This may be a dumb question, but how can you asymptotically guarantee human-level intelligence when the world-models have bounded computation time, and the human is a “computable function” that has no such limit? Is it because the number of Turing machines is infinite?
Fair point about implementation. I was imagining a non-consequentialist AI simulating consequentialist agents that would make plans of the form “run this piece of code and it will take care of the implementation” but there’s really no reason to assume that would be the case.
As far as architecture search, “search space” does seem like the right term, but I think long-term planning is potentially useful in a search space as much as it is in a stateful environment. If you think about the way a human researcher generates neural net architectures, they’re not just “trying things” in order to explore the search space… they generate abstract theories of how and why different approaches work, experiment with different approaches in order to test those theories, and then iterate. A really good NAS system would do the same, and “generate plausible hypotheses and find efficient ways to test them” is a planning problem.
It does seem that regulation of AI, should it become necessary, basically has to take the form of regulating access to computer chips. Supercomputers (and server farms) are relatively expensive. You can’t make your own in your basement. Production is centralized at a few locations and so it would not be terribly difficult to track who they’re sold to. They also use lots of electricity, making it easier to track down people who have acquired lots of them illicitly.
I think it’s likely that the computing power required for dangerous AGI will remain at a level well above what most people or non-AI businesses will need for their normal activities, at least up until transformative AI has become widespread. So putting strict limits on chip access would allow governments to severely cripple AI research, without rolling back the narrow-AI tech we’ve already developed and without looking over every programmer’s shoulder to make sure they don’t code up a neural net.
(A plan like this could also backfire by creating a large hardware overhang and contributing to a fast takeoff.)
I wonder if you could get around this problem by giving it a game interface more similar to the one humans use. Like, give it actual screen images instead of lists of objects, and have it move a mouse cursor using something equivalent to the dynamics of an arm, where the mouse has momentum and the AI has to apply forces to it. It still might have precision advantages, with enough training, but I bet it would even the playing field a bit.
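A minimal sketch of the proposed constraint (all parameter values are illustrative assumptions, not taken from any real setup): instead of setting cursor coordinates directly, the agent applies a force to a virtual mouse that carries momentum and is damped by friction, so it has to accelerate and decelerate like a human hand.

```python
class MomentumCursor:
    """Hypothetical 1-D cursor the agent can only push, not teleport."""

    def __init__(self, mass=1.0, friction=0.8, dt=1.0 / 60):
        self.x = 0.0   # cursor position
        self.vx = 0.0  # cursor velocity
        self.mass, self.friction, self.dt = mass, friction, dt

    def step(self, force_x):
        """Advance one frame: force gives acceleration, friction damps
        velocity, and position is integrated from velocity."""
        ax = force_x / self.mass
        self.vx = (self.vx + ax * self.dt) * self.friction
        self.x += self.vx * self.dt
        return self.x

cursor = MomentumCursor()
for _ in range(10):
    cursor.step(100.0)  # apply a constant rightward push
# The cursor builds up speed gradually rather than jumping to a target.
```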
Maybe you’re right… My sense is that it would converge toward the behavior of the current AI, but slower, especially for movements that require a lot of accuracy. There might be a simpler way to add that constraint without wasting compute, though.
This can be true under certain circumstances. I do think 15 minutes a day of meditation is probably a better use of time than an hour a day for most people. But for many common human activities there are increasing marginal returns to time spent, because spending more time allows you to acquire expertise. This is the reason people specialize in their professional lives. In intellectual endeavors especially, most of the benefit comes from doing some particular thing better than (most of) the competition. Dabbling in lots of different skills will sometimes get you there (by allowing you to combine skills in a way someone else can’t) but the straightforward approach is to focus on your strengths.
As a rule of thumb, I’d say that “support” behaviors like exercising, meditating, cooking, planning, socializing, checking the news/facebook/whatever, those have diminishing returns. Do a little, gain a lot. But something that falls into your core competencies (studying a subject you plan to get good at, for instance) has increasing returns, so a good strategy is to carefully choose a small number of these activities and dive in wholeheartedly.
The claim that specialized machines always beat general ones seems questionable in the context of an AGI. Actually, I’m not sure I understand the claim in the first place. Maybe he means by analogy to a supervised learning system—if you take a network trained to recognize cat pictures, and also train it to recognize dog pictures, then given a fixed number of parameters you can expect it will get less good at recognizing cat pictures.
Could you explain what you mean by resource allocation? Certainly there’s a lot of political and public opinion resistance to any new technology that would help the rich and not the poor. I think that stems from the thought that it will provide even more incentive for the rich to increase inequality (a view to which I’m sympathetic), but I don’t see how it would imply that only the distribution of wealth is important...