[Mod note: This comment was previously deleted by mods, and on reflection returned.]
From my perspective, Eliezer comes across as the AI safety equivalent of a militant vegan or smug atheist in this post. I’m not aware of any social science on the topic of whether people like this tend to be useful to their cause or not, but my personal impression has always been that they aren’t. Even though I agree with his core thesis, I think posts like this plausibly make it harder for someone like me to have conversations with AI people about safety.
The post leans hard on the idea that security mindset is something innate you either have or don’t have, which, as I complained previously, is not well-supported. The plentiful sneering at people who are assumed to lack it is unhelpful.
The post also leans hard on the password requirements example which I critiqued previously here. This feels like an example of the Copenhagen Interpretation of Ethics. Some companies take a basic step to help old people choose better passwords, and they get sneered at… meanwhile the many companies that do nothing to help users choose better passwords get off scot-free. Here, Coral reminds me of a militant vegan who specifically targets companies that engage in humane slaughter practices.
The analogy itself is weak. PayPal is an example of a company that became successful largely due to its ability to combat fraud, despite having no idea at the beginning that fraud was something it would have to deal with. (You can read the book Founders at Work to learn more about this—PayPal’s chapter is the very first one. Question for Eliezer: After reading the chapter, can you tell me whether you think Max Levchin is someone who has security mindset? Again, I would argue that the evidence for Eliezer’s view of security mindset is very shaky. If I had to bet on either (a) a developer with “ordinary paranoia”, using tools they are very familiar with, who is given security as a design goal, vs (b) a developer with “security mindset”, using tools they aren’t familiar with, who isn’t given security as a design goal, I’d bet on (a). More broadly, Eliezer’s use of “security mindset” looks kinda like an attempt to sneak in the connotation that anyone who doesn’t realize that security needs to be a design goal is incapable of writing secure software. Note: It’s often not rational for software companies to have security as a design goal.) And Paul Graham writes (granted, this was 2005, so it’s possible his opinion has changed in the intervening 12 years):
...we advise groups to ignore issues like scalability, internationalization, and heavy-duty security at first. [1] I can imagine an advocate of “best practices” saying these ought to be considered from the start. And he’d be right, except that they interfere with the primary function of software in a startup: to be a vehicle for experimenting with its own design. Having to retrofit internationalization or scalability is a pain, certainly. The only bigger pain is not needing to, because your initial version was too big and rigid to evolve into something users wanted.
To translate Graham’s statement back to the FAI problem: In Eliezer’s alignment talk, he discusses the value of solving a relaxed constraint version of the FAI problem by granting oneself unlimited computing power. Well, in the same way, the AGI problem can be seen as a relaxed constraint version of the FAI problem. One could argue that it’s a waste of time to try to make a secure version of AGI Approach X if we don’t even know if it’s possible to build an AGI using AGI Approach X. (I don’t agree with this view, but I don’t think it’s entirely unreasonable.)
As far as I can tell, Coral’s OpenBSD analogy is flat out wrong. Coral doesn’t appear to be familiar with the concept of a “trusted computing base” (see these lecture notes that I linked from the comments of his previous post). The most exciting project I know of in the OS security space is Qubes, which, in a certain sense, is doing exactly what Coral says can’t be done. This is a decent blog post on the philosophy behind Qubes.
Again, translating this info back to the FAI problem: Andrew Wiles once said:
Perhaps I could best describe my experience of doing mathematics in terms of entering a dark mansion. One goes into the first room, and it’s dark, completely dark. One stumbles around bumping into the furniture, and gradually, you learn where each piece of furniture is, and finally, after six months or so, you find the light switch. You turn it on, and suddenly, it’s all illuminated. You can see exactly where you were.
My interpretation of this statement is that the key to solving difficult problems is to find a way to frame them that makes them seem easy. As someone who works for an AI safety nonprofit, Eliezer has a financial interest in making AI safety seem as difficult as possible. Unfortunately, while that is probably a great strategy for gathering donations, it might not be a good strategy for actually solving the problem. The Qubes project is interesting because someone thought of a way to reframe the OS security problem to make it much more tractable. (Instead of needing a 100% correct OS, now to a first approximation you need a 100% correct hypervisor.) I’m not necessarily saying MIRI is overfunded. But I think MIRI researchers, when they are trying to actually make progress on FAI, need to recognize and push back against institutional incentives to frame FAI in a way that makes it seem as hard as possible to solve. Eliezer’s admonition to not “try to solve the whole problem at once” seems like the wrong thing to say, from my perspective.
Frankly, insofar as security mindset is a thing, this post comes across to me as being written by someone who doesn’t have it. I don’t get the sense that the author has even “ordinary paranoia” about the possibility that a post like this could be harmful, despite the fact that Nick Bostrom’s non-confrontational advocacy approach seems to have done a lot more to expand the AI safety Overton window, and despite the possibility that a post like this might increase already-existing politicization of AI safety. (I’m not even sure Eliezer has a solid story for how this post could end up being helpful!)
Similarly, when I think of the people I would trust the most to write secure software, Coral’s 1,000,000:1 odds estimate does not seem like the kind of thing they would say—I’d sooner trust someone who is much more self-skeptical and spends a lot more time accounting for model uncertainty. (Model uncertainty and self-skepticism are both big parts of security mindset!) Does Coral think she can make a million predictions like this and be incorrect on only one of them? How much time did Coral spend looking in the dark for stories like Max Levchin’s which don’t fit her model?
(An even more straightforward argument for why Eliezer lacks security mindset: Eliezer says security mindset is an innate characteristic, which means it’s present from birth. SIAI was founded as an organization to develop seed AI without regard for safety. The fact that Eliezer didn’t instantly realize the importance of friendliness when presented with the notion of seed AI means he lacks security mindset, and since security mindset is present from birth, he doesn’t have it now either.)
The fact that I consider myself a non-expert on both software security and AI, yet I’m able to come up with these obvious-seeming counterarguments, does not bode well. (Note: I keep harping on my non-expert status because I’m afraid someone will take my comments as literal truth, but I’m acutely aware that I’m just sharing facts I remember reading and ideas I’ve randomly had. I want people to feel comfortable giving me pushback if they have better information, the way people don’t seem to do with Eliezer. In fact, this is the main reason why I write so many comments and so few toplevel posts—people seem more willing to accept toplevel posts as settled fact, even if the research behind them is actually pretty poor. How’s that for security mindset? :P)
As a side note, I read & agreed with Eliezer’s Inadequate Equilibria book. In fact, I’m planning to buy it as a Christmas present for a friend. So if there’s a deeper disagreement here, that’s not it. A quick shot at identifying the deeper disagreement: From my perspective, Eliezer frequently seems to fall prey to the “What You See Is All There Is” phenomenon that Daniel Kahneman describes in Thinking, Fast and Slow. For example, in this dialogue, the protagonist says that for a secure operating system, “Everything exposed to an attacker, and everything those subsystems interact with, and everything those parts interact with! You have to build all of it robustly!” But this doesn’t appear to actually be true (see Qubes). Just because Eliezer is unable to see a way to do it, doesn’t mean it can’t be done.
The post leans hard on the idea that security mindset is something innate you either have or don’t have, which, as I complained previously, is not well-supported.
Agreed Eliezer hasn’t given much evidence for the claim that security mindset is often untrainable; it’s good that you’re flagging this explicitly. I think his goal was to promote the “not just anyone can be trained to think like Bruce Schneier” hypothesis to readers’ attention, and to say what his own current model is, not to defend the model in any detail.
I found the change in focus useful myself because Inadequate Equilibria talks so little about innate competence and individual skill gaps, even though it’s clearly one of the central puzzle pieces.
This feels like an example of the Copenhagen Interpretation of Ethics. Some companies take a basic step to help old people choose better passwords, and they get sneered at… meanwhile the many companies that do nothing to help users choose better passwords get off scot-free.
It might not be fair, but I think that’s fine here. The important question is whether the world is adequate in certain respects (e.g., at converting resources into some level of security and privacy for the average user), and what that implies in domains like AI that we care about. I don’t expect companies with inadequate password systems to suffer any great harm from a blog post criticizing them without spending an equal number of paragraphs criticizing organizations with even worse security practices. The most material question is whether password standards are in fact trivial to improve on in a way that makes users and companies much better off; it’s not clear to me whether we disagree about that, since it might be that you’re just focusing on a lower adequacy threshold.
The Qubes project is interesting because someone thought of a way to reframe the OS security problem to make it much more tractable. (Instead of needing a 100% correct OS, now to a first approximation you need a 100% correct hypervisor.)
I don’t know much about Qubes, but the idea of modularizing the problem, distinguishing trusted and untrusted system components, minimizing reliance on less trusted components, and looking for work-arounds to make things as easy as possible (without assuming they’re easy), sounds like ordinary MIRI research practice. Eliezer’s idea of corrigibility is an example of this approach, and Eliezer’s said that if alignment turns out to be surprisingly easy, one of the likeliest paths is if there turns out to be a good-enough concept of corrigibility that’s easy to train into systems.
Eliezer’s admonition to not “try to solve the whole problem at once” seems like the wrong thing to say, from my perspective.
Ambitious efforts to take a huge chunk out of the problem, or to find some hacky or elegant way to route around a central difficulty, seem good to me. I haven’t seen people make much progress if they “try to solve the whole problem at once” with a few minutes/hours of experience thinking through the problem rather than a few months/years; usually that looks less like corrigibility or ALBA and more like “well we’ll shut it off if it starts doing scary stuff” or “well we’ll just raise it like a human child”.
I don’t get the sense that the author has even “ordinary paranoia” about the possibility that a post like this could be harmful
It sounds like you’re making a prediction here that Eliezer and others didn’t put much thought in advance into the potential risks or costs of this post. Having talked with Eliezer and others about the post beforehand, I can confirm that this prediction is false.
The fact that I consider myself a non-expert on both software security and AI, yet I’m able to come up with these obvious-seeming counterarguments, does not bode well.
I think you’re overestimating how much overlap there is between what different people tend to think are the most obvious counterarguments to this or that AGI alignment argument. This is actually a hard problem. If Alice thinks counter-arguments A and B are obvious, Bob thinks counter-arguments B and C are obvious, and Carol thinks counter-arguments C and D are obvious, and you only have time to cover two counter-arguments before the post gets overly long, then no matter which arguments you choose you’ll end up with most readers thinking that you’ve neglected one or more “obvious” counter-arguments.
At the same time, if you try to address as many counter-arguments as you can given length constraints, you’ll inevitably end up with most readers feeling baffled at why you’re wasting time on trivial or straw counter-arguments that they don’t care about.
This is also made more difficult if you have to reply to all the counter-arguments Alice disagrees with but thinks someone else might agree with: Alice might be wrong about who is (or should be) in the target audience, she might be wrong about the beliefs of this or that potential target audience, or she might just have an impractically long list of counter-arguments to cover (due to not restricting herself to what she thinks is true or even all that probable). I think that group discussions often end up going in unproductive directions when hypothetical disagreements take the place of actual disagreements.
Thanks for the response!
The most material question is whether password standards are in fact trivial to improve on in a way that makes users and companies much better off; it’s not clear to me whether we disagree about that, since it might be that you’re just focusing on a lower adequacy threshold.
If there’s a trivial way to measure password strength, the method has not occurred to me. Suppose my password generation algorithm randomly samples my password from the set of all alphanumeric strings that are between 6 and 20 characters long. That’s 715971350555965203672729120482208448 possible passwords I’m choosing from. Sounds pretty secure, right? Well, two of those alphanumeric strings between 6 and 20 characters are “aaaaaaaaaa” and “password123”. A server that just sees “aaaaaaaaaa” as my password has no a priori way to know what algorithm I used to generate it.
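To make the arithmetic concrete, here is a minimal Python sketch of the point (the alphabet size and example strings are just the ones from the paragraph above; nothing here is from the original post):

```python
import string

# 62-character alphanumeric alphabet: a-z, A-Z, 0-9.
alphabet = string.ascii_letters + string.digits

# Number of alphanumeric strings between 6 and 20 characters long.
space_size = sum(len(alphabet) ** length for length in range(6, 21))
print(space_size)  # ~7.16e35, the figure quoted above

# Both of these weak strings are members of that enormous set, so a server
# that only sees the submitted password cannot tell whether it was sampled
# uniformly at random or chosen lazily by a human.
for pw in ("aaaaaaaaaa", "password123"):
    assert 6 <= len(pw) <= 20 and all(c in alphabet for c in pw)
```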
I don’t expect it is worth the time of your average company to write a specialized module that attempts to reverse-engineer a user’s password in order to determine the algorithm that was likely used to generate it. I expect most companies who attempt to measure password strength this way do so using 3rd party libraries, not algorithms that have been developed in-house. The difficulty of doing this depends on whether there’s a good 3rd party library available for the platform the company is using, and how quickly an engineer can verify that the library isn’t doing anything suspicious with the passwords it analyzes. This article has more info about the difficulty of measuring password strength—it looks like most 3rd party libraries aren’t very good at it.
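For what it’s worth, one well-regarded third-party estimator in this vein is zxcvbn (originally from Dropbox), which rates guessability by modeling how attackers actually guess rather than by computing naive character-set entropy. Here’s a rough sketch of how it might be used, going from my memory of the Python port’s API (`pip install zxcvbn`); the exact field names should be double-checked:

```python
from zxcvbn import zxcvbn  # third-party library: pip install zxcvbn

# Scores passwords 0-4 based on estimated guesses (dictionary words,
# keyboard walks, dates, l33t substitutions), not raw length/charset.
for pw in ("aaaaaaaaaa", "password123", "correct horse battery staple"):
    result = zxcvbn(pw)
    print(pw, result["score"], result["guesses"])
```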
But anyway, as I said, it typically isn’t rational for software companies to invest a lot in software security. If we are trying to approximate a function that takes a company’s financial interest in security as an input, and outputs the degree to which a company’s systems are secure, then the password example gives us a data point where the company’s financial interest is low and the security of their system is also low. Coral argues (correctly IMO) that Merchant Drones Inc. has a strong financial incentive to prevent people from swindling their drones. Extrapolating from the password guessing example the way she does makes the assumption that the function mapping financial interest to security is a constant function. I don’t think that’s a reasonable assumption.
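As a toy illustration of why that single data point underdetermines the function (entirely made-up numbers, just to show the shape of the argument):

```python
# Two hypothetical mappings from "financial stake in security" (0-10)
# to "security actually achieved" (0-10). Both pass through the same
# low-stake, low-security point, so observing only that point cannot
# distinguish a constant function from an increasing one.
def constant(stake):
    return 2.0

def increasing(stake):
    return min(10.0, 2.0 + 0.8 * max(0.0, stake - 1.0))

print(constant(1.0), increasing(1.0))  # 2.0 2.0  (consumer password policy)
print(constant(9.0), increasing(9.0))  # 2.0 8.4  (Merchant Drones Inc.)
```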
I haven’t seen people make much progress if they “try to solve the whole problem at once” with a few minutes/hours of experience thinking through the problem rather than a few months/years; usually that looks less like corrigibility or ALBA and more like “well we’ll shut it off if it starts doing scary stuff” or “well we’ll just raise it like a human child”.
The reason I’m complaining about this is that I sometimes try to have conversations with people in the community about ideas I have related to AI alignment, typically ideas whose gist I can get across in <5 minutes but which aren’t nearly as naive as “we’ll shut it off if it starts doing scary stuff” or “we’ll raise it like a human child”, and yet I have a hard time getting people to engage seriously. My diagnosis is that people in the community have some kind of learned helplessness around AI safety, believing it to be a big serious problem that only big serious people are allowed to think about. Trying to make progress on AI alignment with any idea that can be explained in <5 minutes marks one as uncool and naive. Even worse, in some cases I think people get defensive about the idea that it might be possible to make progress on AI alignment in a 10-minute conversation—the community has a lot invested in the idea that AI alignment is a super difficult problem, and we’d all look like fools if it were possible to make meaningful progress in 10 minutes. I’m reminded of this quote from Richard Hamming:
...if you do some good work you will find yourself on all kinds of committees and unable to do any more work. You may find yourself as I saw Brattain when he got a Nobel Prize. The day the prize was announced we all assembled in Arnold Auditorium; all three winners got up and made speeches. The third one, Brattain, practically with tears in his eyes, said, “I know about this Nobel-Prize effect and I am not going to let it affect me; I am going to remain good old Walter Brattain.” Well I said to myself, “That is nice.” But in a few weeks I saw it was affecting him. Now he could only work on great problems.
When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn’t the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you. In fact I will give you my favorite quotation of many years. The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren’t good afterwards, but they were superb before they got there and were only good afterwards.
See also—if Richard Feynman were working on AI alignment, it sounds to me as though he’d see the naive suggestions of noobs as a source of occasional ideas rather than something to be ridiculed.
It sounds like you’re making a prediction here that Eliezer and others didn’t put much thought in advance into the potential risks or costs of this post. Having talked with Eliezer and others about the post beforehand, I can confirm that this prediction is false.
That’s good to hear.
The best way to deal with possible counterarguments is to rethink your arguments so they’re no longer vulnerable to them. (Example: Eliezer could have had Coral say something like “Why don’t companies just use this free and widely praised password strength measurement library on Github?” or “Why is there no good open source library to measure password strength?” instead of “Why don’t companies just measure entropy?” Random note: Insofar as the numbers and symbols thing is not just security theater, I’d guess it mainly makes it harder for friends/relatives to correctly guess that you used your dog’s name as your password, in order to decrease the volume of password-related support requests.) I’ll confess, when someone writes an article that seems to me like it’s written in an insufferably smug tone, yet I don’t get the sense that they’ve considered counterarguments that seem strong and obvious to me, that really rubs me the wrong way.
(Upvoted but disagree with the conclusions and about a quarter of the assumptions. I should really get around to designing functionality that allows users to more easily make it clear that they both want to reward someone for writing something, and that they still disagree with it.)
Want to make a bet about whether Eliezer will reply to my comment? I’m betting he won’t, despite writing about the importance of this, because I think Eliezer still has the issue Holden identified years ago of being too selective in whose feedback to take. My guess is that he will read over it, figure out some reason why he thinks I’m wrong, and then not engage with me because he hasn’t internalized the WYSIATI phenomenon, and he underrates the possibility that I might have some counterargument to his counterargument that he hasn’t thought of.
My model of Eliezer here (which has a high chance of being inaccurate) is that he is bottlenecked on mental energy, and in general has a lot of people who try to engage with him on the level of depth that your comment goes into, all of whom will probably have further counterarguments and discussions of their original posts. (All significant communication is not simple true/false arguments, but detailed model sharing.) As such he exerts high selection pressure on where to spend his mental energy.
I’ve downvoted this, because I think it is creating a very dangerous pressure on people stating their opinions openly in this community.
As someone who has been active in EA and rationality community building for a few years now, I have repeatedly experienced the pressure of people demanding explanations of me and the people working with me, so much so that at peak times around EA Global responding to those demands took up more than 70% of my time. In retrospect, responding to each argument and comment individually was a bad use of my time. Everyone would have been better served by me taking a step back, keeping a tally of what confusions or misunderstandings people frequently seemed to have, and eventually writing up a more educational post with some real effort put into it that tried to explain my perspective on a deeper level.
I see Eliezer mostly doing exactly that, and want him to continue doing that. I don’t think it’s a good use of his time to respond to every comment, especially if he has good reason to expect that it will cost him significant willpower to do so.
Even more so, people demanding explanations or demanding engagement has in my experience reliably led to exhausting conversations full of defensiveness and hedging, and so I expect comments like this to significantly reduce the probability that people whose time is valuable will engage with the comments on this site.
As someone with time that is relatively valueless compared to Eliezer’s and Oliver’s, I’d like to second this comment. As much as I’d love to respond to every person who has a criticism of me, it would take up a lot of mental energy that I’d rather use for writing. That doesn’t mean that I don’t read criticisms and take them to heart.
That’s fair. I find it stimulating to engage with people, so this isn’t really something I empathize with.
For me, writing for one specific person, who is almost guaranteed to read what I wrote, is also stimulating. When writing an article, I often feel like I’m talking to an empty room. As a result, I don’t write many articles.
Still, writing articles would probably be a better use of my time, because I often find myself repeating the same things in different 1:1 interactions. (Or perhaps I could collect old comments on the same topic and rewrite them as an article afterwards.) I just haven’t found a way to align my emotions with this.
I guess I wanted to say that “articles > comments” regardless of one’s emotions (assuming one can write good articles). Unless the comment can be made short, or responds to something new. But “new” is impossible to judge from outside; we don’t know what kinds of questions Eliezer gets repeatedly e.g. outside LW.
I can’t work out where you’re going with the Qubes thing. Obviously a secure hypervisor wouldn’t imply a secure system, any more than a secure kernel implies a secure system in a non-hypervisor based system.
More deeply, you seem to imply that someone who has made a security error obviously lacks the security mindset. If only the mindset protected us from all errors; sadly it’s not so. But I’ve often been in the situation of trying to explain something security-related to a smart person, and sensing the gap that seemed wider than a mere lack of knowledge.
The point I’m trying to make is that this statement
Everything exposed to an attacker, and everything those subsystems interact with, and everything those parts interact with! You have to build all of it robustly!
seems false to me, if you have good isolation—which is what a project like Qubes tries to accomplish. Kernel vs hypervisor is discussed in this blog post. It’s possible I’m describing Qubes incorrectly; I’m not a systems expert. But I feel pretty confident in the broader point about trusted computing bases.
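To spell out the trusted-computing-base point in the smallest possible terms, here is a deliberately toy Python sketch; none of this is real Qubes or Xen code, and the class names are invented purely for illustration:

```python
# Toy model: only the small trusted component (the hypervisor) has to be
# correct; an attacker who compromises one untrusted domain still can't
# reach another domain's secrets unless the hypervisor itself is broken.

class Hypervisor:
    """The trusted computing base."""
    def __init__(self):
        self.compromised = False

class Domain:
    """An untrusted VM holding some secret (e.g. banking vs. browsing)."""
    def __init__(self, name, secret):
        self.name, self.secret, self.compromised = name, secret, False

def attacker_can_read(target, foothold, hypervisor):
    # The attacker reads the target's secret only from a domain they own,
    # or from anywhere once the isolation layer itself is compromised.
    return foothold.compromised and (foothold is target or hypervisor.compromised)

hv = Hypervisor()
banking = Domain("banking", "account credentials")
browsing = Domain("browsing", "cat pictures")

browsing.compromised = True                      # bug in the browser VM
print(attacker_can_read(banking, browsing, hv))  # False: isolation holds
hv.compromised = True                            # bug in the TCB itself
print(attacker_can_read(banking, browsing, hv))  # True: everything falls
```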
More deeply, you seem to imply that someone who has made a security error obviously lacks the security mindset. If only the mindset protected us from all errors; sadly it’s not so. But I’ve often been in the situation of trying to explain something security-related to a smart person, and sensing the gap that seemed wider than a mere lack of knowledge.
This was the implication I was getting from Eliezer. I attempted a reductio ad absurdum.
Everything exposed to an attacker, and everything those subsystems interact with, and everything those parts interact with! You have to build all of it robustly!
seems false to me, if you have good isolation—which is what a project like Qubes tries to accomplish.
I agree with you here that Qubes is cool; but the fact that it is (performantly) possible was not obvious before it was cooked up. I certainly failed to come up with the idea of Qubes before hearing it (even after bluepill), and I am not ashamed of this: Qubes is brilliant (and IOMMU is cheating).
Also, in some sense Qubes is doing exactly what Coral says. Qubes only has a chance of working because the fundamental design for hardware-assisted security-by-isolation trumps all other considerations in their trade-offs. The UI is fundamentally constrained (to prevent window-redressing), as is performance (3D acceleration) and ease-of-use. All these constraints were known and documented before even a single line of code was written (afaik). Qubes can only work because it has security as one of its main goals, and has brilliant security people as project leads with infinite internal political capital.
That said, going on a tangent about Qubes:
I really want to see painless live-migration of Qubes (migrate an application between different hosts, without interrupting—say, from a lousy netbook to a fat workstation and back); this would be a killer feature for non-security-nerds. Unfortunately Xen cannot do x86 <-> ARM (QEMU?); live-migration smartphone<->workstation would be awesome (just bring my smartphone, plug it in as a boot-drive and continue your work on a fat machine—secure as long as there is no hardware implant).
Re Qubes security: You still have the bad problem of timing side-channels which cross VM borders; you should view Qubes as an awesome mitigation, not a solution (not to speak of the not-so-rare Xen breakouts), and you still need to secure your software. That is, Qubes attempts to prevent privilege escalation, not code exec; if the vulnerability is in the application which handles your valuable data, then Qubes cannot help you.
Also, in some sense Qubes is doing exactly what Coral says. Qubes only has a chance of working because the fundamental design for hardware-assisted security-by-isolation trumps all other considerations in their trade-offs. The UI is fundamentally constrained (to prevent window-redressing), as is performance (3D acceleration) and ease-of-use. All these constraints were known and documented before even a single line of code was written (afaik). Qubes can only work because it has security as one of its main goals, and has brilliant security people as project leads with infinite internal political capital.
It sounds like you’re saying Qubes is a good illustration of Coral’s claim that really secure software needs security as a design goal from the beginning and security DNA in the project leadership. I agree with that claim.
Yep. The counter-example would be Apple iOS.
I never expected it to become as secure as it did. And Apple security are clowns (institutionally, no offense intended for the good people working there), and UI tends to beat security in tradeoffs.
To translate Graham’s statement back to the FAI problem: In Eliezer’s alignment talk, he discusses the value of solving a relaxed constraint version of the FAI problem by granting oneself unlimited computing power. Well, in the same way, the AGI problem can be seen as a relaxed constraint version of the FAI problem. One could argue that it’s a waste of time to try to make a secure version of AGI Approach X if we don’t even know if it’s possible to build an AGI using AGI Approach X. (I don’t agree with this view, but I don’t think it’s entirely unreasonable.)
Isn’t the point exactly that if you can’t solve the whole problem of (AGI + Alignment) then it would be better not even to try solving the relaxed problem (AGI)?
Maybe not, if you can keep your solution to AGI secret and suppress it if it turns out that there’s no way to solve the alignment problem in your framework.