The post leans hard on the idea that security mindset is something innate you either have or don’t have, which, as I complained previously, is not well-supported.
Agreed, Eliezer hasn’t given much evidence for the claim that security mindset is often untrainable; it’s good that you’re flagging this explicitly. I think his goal was to promote the “not just anyone can be trained to think like Bruce Schneier” hypothesis to readers’ attention, and to say what his own current model is, not to defend that model in any detail.
I found the change in focus useful myself because Inadequate Equilibria talks so little about innate competence and individual skill gaps, even though it’s clearly one of the central puzzle pieces.
This feels like an example of the Copenhagen Interpretation of Ethics. Some companies take a basic step to help old people choose better passwords, and they get sneered at… meanwhile the many companies that do nothing to help users choose better passwords get off scot-free.
It might not be fair, but I think that’s fine here. The important question is whether the world is adequate in certain respects (e.g., at converting resources into some level of security and privacy for the average user), and what that implies in domains like AI that we care about. I don’t expect companies with inadequate password systems to suffer any great harm from a blog post criticizing them without spending an equal number of paragraphs criticizing organizations with even worse security practices. The most material question is whether password standards are in fact trivial to improve on in a way that makes users and companies much better off; it’s not clear to me whether we disagree about that, since it might be that you’re just focusing on a lower adequacy threshold.
The Qubes project is interesting because someone thought of a way to reframe the OS security problem to make it much more tractable. (Instead of needing a 100% correct OS, now to a first approximation you need a 100% correct hypervisor.)
I don’t know much about Qubes, but the idea of modularizing the problem, distinguishing trusted and untrusted system components, minimizing reliance on less trusted components, and looking for work-arounds to make things as easy as possible (without assuming they’re easy), sounds like ordinary MIRI research practice. Eliezer’s idea of corrigibility is an example of this approach, and Eliezer’s said that if alignment turns out to be surprisingly easy, one of the likeliest paths is if there turns out to be a good-enough concept of corrigibility that’s easy to train into systems.
Eliezer’s admonition to not “try to solve the whole problem at once” seems like the wrong thing to say, from my perspective.
Ambitious efforts to take a huge chunk out of the problem, or to find some hacky or elegant way to route around a central difficulty, seem good to me. I haven’t seen people make much progress if they “try to solve the whole problem at once” with a few minutes/hours of experience thinking through the problem rather than a few months/years; usually that looks less like corrigibility or ALBA and more like “well we’ll shut it off if it starts doing scary stuff” or “well we’ll just raise it like a human child”.
I don’t get the sense that the author has even “ordinary paranoia” about the possibility that a post like this could be harmful.
It sounds like you’re making a prediction here that Eliezer and others didn’t put much thought in advance into the potential risks or costs of this post. Having talked with Eliezer and others about the post beforehand, I can confirm that this prediction is false.
The fact that I consider myself a non-expert on both software security and AI, yet I’m able to come up with these obvious-seeming counterarguments, does not bode well.
I think you’re overestimating how much overlap there is between what different people tend to think are the most obvious counterarguments to this or that AGI alignment argument. This is actually a hard problem. If Alice thinks counter-arguments A and B are obvious, Bob thinks counter-arguments B and C are obvious, and Carol thinks counter-arguments C and D are obvious, and you only have time to cover two counter-arguments before the post gets overly long, then no matter which arguments you choose you’ll end up with most readers thinking that you’ve neglected one or more “obvious” counter-arguments.
At the same time, if you try to address as many counter-arguments as you can given length constraints, you’ll inevitably end up with most readers feeling baffled at why you’re wasting time on trivial or straw counter-arguments that they don’t care about.
This is also made more difficult if you have to reply to all the counter-arguments Alice disagrees with but thinks someone else might agree with: Alice might be wrong about who is (or should be) in the target audience, she might be wrong about the beliefs of this or that potential target audience, or she might just have an impractically long list of counter-arguments to cover (due to not restricting herself to what she thinks is true or even all that probable). I think that group discussions often end up going in unproductive directions when hypothetical disagreements take the place of actual disagreements.
The most material question is whether password standards are in fact trivial to improve on in a way that makes users and companies much better off; it’s not clear to me whether we disagree about that, since it might be that you’re just focusing on a lower adequacy threshold.
If there’s a trivial way to measure password strength, the method has not occurred to me. Suppose my password generation algorithm samples my password uniformly at random from the set of all alphanumeric strings between 6 and 20 characters long. That’s 715971350555965203672729120482208448 possible passwords I’m choosing from. Sounds pretty secure, right? Well, two of those alphanumeric strings between 6 and 20 characters are “aaaaaaaaaa” and “password123”. A server that sees “aaaaaaaaaa” as my password has no way to know, a priori, what algorithm I used to generate it.
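To make the arithmetic concrete, here’s a minimal Python sketch (standard library only; the sampler is purely illustrative) of the point that entropy is a property of the generation process, not of the string the server observes:

```python
import math
import secrets
import string

# Keyspace: all alphanumeric strings of length 6 through 20.
ALPHABET = string.ascii_letters + string.digits  # 62 symbols

keyspace = sum(len(ALPHABET) ** k for k in range(6, 21))
print(keyspace)             # 715971350555965203672729120482208448
print(math.log2(keyspace))  # ~119 bits -- a fact about the *process*

def random_password() -> str:
    """Sample uniformly from the whole keyspace (longer lengths are
    proportionally more likely, so every string is equally likely)."""
    weights = [len(ALPHABET) ** k for k in range(6, 21)]
    r = secrets.randbelow(sum(weights))
    for k, w in zip(range(6, 21), weights):
        if r < w:
            return "".join(secrets.choice(ALPHABET) for _ in range(k))
        r -= w

# "aaaaaaaaaa" is one of the equally likely outputs. A server that
# sees it can't tell whether it came from this sampler or from a
# lazy user; entropy belongs to the distribution, not the string.
```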
I don’t expect it’s worth the average company’s time to write a specialized module that attempts to reverse-engineer a user’s password to determine the algorithm that was likely used to generate it. I expect most companies that attempt to measure password strength this way use third-party libraries, not algorithms developed in-house. The difficulty of doing this depends on whether there’s a good third-party library available for the platform the company is using, and on how quickly an engineer can verify that the library isn’t doing anything suspicious with the passwords it analyzes. This article has more info about the difficulty of measuring password strength; it looks like most third-party libraries aren’t very good at it.
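As a toy illustration of what “reverse-engineering the generation algorithm” amounts to (a hypothetical sketch, nothing like what a production library does in full): estimate strength as the cost of the cheapest process that could plausibly have produced the password.

```python
import math
import re

# Tiny stand-in for the word lists real estimators ship with.
COMMON_PASSWORDS = {"password", "password123", "letmein", "qwerty", "123456"}

def naive_strength_bits(pw: str) -> float:
    """Toy estimate: cost (in bits) of the cheapest process that
    plausibly generated pw. Real estimators check many more patterns
    (keyboard walks, dates, l33t substitutions, names, ...)."""
    if pw.lower() in COMMON_PASSWORDS:
        # Attacker just tries a dictionary of common passwords.
        return math.log2(len(COMMON_PASSWORDS))
    if re.fullmatch(r"(.)\1+", pw):
        # One character repeated: pick the character, pick the length.
        return math.log2(62) + math.log2(len(pw))
    # Fall back to a brute-force charset estimate. This badly
    # overestimates e.g. a dog's name the attacker can look up.
    charset = 0
    if re.search(r"[a-z]", pw):
        charset += 26
    if re.search(r"[A-Z]", pw):
        charset += 26
    if re.search(r"[0-9]", pw):
        charset += 10
    charset = charset or 33  # rough count of other printable symbols
    return len(pw) * math.log2(charset)

print(naive_strength_bits("aaaaaaaaaa"))   # ~9 bits, not ~60
print(naive_strength_bits("password123"))  # ~2 bits
print(naive_strength_bits("hK8fq2Lw1Zr"))  # ~65 bits
```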
But anyway, as I said, it typically isn’t rational for software companies to invest a lot in software security. If we’re trying to approximate a function that takes a company’s financial interest in security as input and outputs how secure that company’s systems actually are, then the password example gives us a data point where the company’s financial interest is low and the security of its system is also low. Coral argues (correctly, IMO) that Merchant Drones Inc. has a strong financial incentive to prevent people from swindling their drones. Extrapolating from the password-guessing example the way she does assumes that the function mapping financial interest to security is a constant function. I don’t think that’s a reasonable assumption.
I haven’t seen people make much progress if they “try to solve the whole problem at once” with a few minutes/hours of experience thinking through the problem rather than a few months/years; usually that looks less like corrigibility or ALBA and more like “well we’ll shut it off if it starts doing scary stuff” or “well we’ll just raise it like a human child”.
The reason I’m complaining about this is that I sometimes try to have conversations with people in the community about ideas I have related to AI alignment (typically ideas whose gist I can get across in <5 minutes, but which aren’t nearly as naive as “we’ll shut it off if it starts doing scary stuff” or “we’ll raise it like a human child”), and I have a hard time getting people to engage seriously. My diagnosis is that people in the community have some kind of learned helplessness around AI safety, believing it to be a big serious problem that only big serious people are allowed to think about. Trying to make progress on AI alignment with any idea that can be explained in <5 minutes marks one as uncool and naive. Even worse, in some cases I think people get defensive about the very idea that it might be possible to make progress on AI alignment in a 10-minute conversation: the community has a lot invested in the idea that AI alignment is a super difficult problem, and we’d all look like fools if it were possible to make meaningful progress in 10 minutes. I’m reminded of this quote from Richard Hamming:
...if you do some good work you will find yourself on all kinds of committees and unable to do any more work. You may find yourself as I saw Brattain when he got a Nobel Prize. The day the prize was announced we all assembled in Arnold Auditorium; all three winners got up and made speeches. The third one, Brattain, practically with tears in his eyes, said, “I know about this Nobel-Prize effect and I am not going to let it affect me; I am going to remain good old Walter Brattain.” Well I said to myself, “That is nice.” But in a few weeks I saw it was affecting him. Now he could only work on great problems.
When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn’t the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you. In fact I will give you my favorite quotation of many years. The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren’t good afterwards, but they were superb before they got there and were only good afterwards.
See also: if Richard Feynman were working on AI alignment, it sounds to me as though he’d see the naive suggestions of noobs as an occasional source of ideas rather than something to ridicule.
It sounds like you’re making a prediction here that Eliezer and others didn’t put much thought in advance into the potential risks or costs of this post. Having talked with Eliezer and others about the post beforehand, I can confirm that this prediction is false.
That’s good to hear.
The best way to deal with possible counterarguments is to rethink your arguments so they’re no longer vulnerable to them. For example, Eliezer could have had Coral say something like “Why don’t companies just use this free and widely praised password strength measurement library on Github?” or “Why is there no good open source library to measure password strength?” instead of “Why don’t companies just measure entropy?” (Random note: insofar as the numbers-and-symbols requirement is not just security theater, I’d guess its main effect is to make it harder for friends and relatives to correctly guess that you used your dog’s name as your password, thereby reducing the volume of password-related support requests.) I’ll confess: when someone writes an article in what seems to me an insufferably smug tone, and I don’t get the sense that they’ve considered counterarguments that seem strong and obvious to me, that really rubs me the wrong way.
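For what it’s worth, zxcvbn (Dropbox’s open-source strength estimator on GitHub) is roughly the kind of library Coral could have pointed at; it scores a password by the cheapest generation model it can find rather than by naive charset entropy. Here’s a sketch of how the Python port is typically called (I’m going from memory of the interface, so treat the details as an assumption):

```python
# pip install zxcvbn  (Python port of Dropbox's estimator)
from zxcvbn import zxcvbn

for pw in ["aaaaaaaaaa", "password123", "Fluffy2012"]:
    result = zxcvbn(pw)
    # 'score' runs 0-4; 'guesses' estimates attacker effort under
    # the cheapest generation model the library can find.
    print(pw, result["score"], result["guesses"])
```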