Security Mindset and Ordinary Paranoia

Fol­low-up to: AI Align­ment: Why It’s Hard, and Where to Start


(am­ber, a philan­thropist in­ter­ested in a more re­li­able In­ter­net, and coral, a com­puter se­cu­rity pro­fes­sional, are at a con­fer­ence ho­tel to­gether dis­cussing what Co­ral in­sists is a difficult and im­por­tant is­sue: the difficulty of build­ing “se­cure” soft­ware.)

am­ber: So, Co­ral, I un­der­stand that you be­lieve it is very im­por­tant, when cre­at­ing soft­ware, to make that soft­ware be what you call “se­cure”.

coral: Espe­cially if it’s con­nected to the In­ter­net, or if it con­trols money or other valuables. But yes, that’s right.

am­ber: I find it hard to be­lieve that this needs to be a sep­a­rate topic in com­puter sci­ence. In gen­eral, pro­gram­mers need to figure out how to make com­put­ers do what they want. The peo­ple build­ing op­er­at­ing sys­tems surely won’t want them to give ac­cess to unau­tho­rized users, just like they won’t want those com­put­ers to crash. Why is one prob­lem so much more difficult than the other?

coral: That’s a deep ques­tion, but to give a par­tial deep an­swer: When you ex­pose a de­vice to the In­ter­net, you’re po­ten­tially ex­pos­ing it to in­tel­li­gent ad­ver­saries who can find spe­cial, weird in­ter­ac­tions with the sys­tem that make the pieces be­have in weird ways that the pro­gram­mers did not think of. When you’re deal­ing with that kind of prob­lem, you’ll use a differ­ent set of meth­ods and tools.

am­ber: Any sys­tem that crashes is be­hav­ing in a way the pro­gram­mer didn’t ex­pect, and pro­gram­mers already need to stop that from hap­pen­ing. How is this case differ­ent?

coral: Okay, so… imagine that your system is going to take in one kilobyte of input per session. (Although that itself is the sort of assumption we'd question and ask what happens if it gets a megabyte of input instead—but never mind.) If the input is one kilobyte, then there are 2^8,000 possible inputs, or about 10^2,400 or so. Again, for the sake of extending the simple visualization, imagine that a computer gets a billion inputs per second. Suppose that only a googol, 10^100, out of the 10^2,400 possible inputs, cause the system to behave a certain way the original designer didn't intend.

If the sys­tem is get­ting in­puts in a way that’s un­cor­re­lated with whether the in­put is a mis­be­hav­ing one, it won’t hit on a mis­be­hav­ing state be­fore the end of the uni­verse. If there’s an in­tel­li­gent ad­ver­sary who un­der­stands the sys­tem, on the other hand, they may be able to find one of the very rare in­puts that makes the sys­tem mis­be­have. So a piece of the sys­tem that would liter­ally never in a mil­lion years mis­be­have on ran­dom in­puts, may break when an in­tel­li­gent ad­ver­sary tries de­liber­ately to break it.
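
A quick back-of-the-envelope check of Coral's numbers, written out as a sketch in Python. The billion-inputs-per-second rate and the ten-billion-year runtime are the hypotheticals from the dialogue, not claims about any real system:

```python
import math

# One kilobyte of input is 8,000 bits, so there are 2**8000 possible inputs.
log10_inputs = 8000 * math.log10(2)      # ~2408, i.e. roughly 10**2,400 inputs
log10_bad = 100                          # suppose a googol (10**100) of them misbehave

# A machine trying a billion (10**9) random inputs per second for 10**10 years:
seconds = 1e10 * 365.25 * 24 * 3600      # ~3.2e17 seconds
log10_tried = 9 + math.log10(seconds)    # ~26.5, so about 10**26 inputs ever tried

# Chance (in log10) that any of those uncorrelated tries hits a misbehaving input:
log10_hit_chance = log10_tried + log10_bad - log10_inputs
print(round(log10_inputs), round(log10_hit_chance))   # 2408, about -2282
```

With inputs arriving at random, the chance of ever stumbling into one of the googol misbehaving states is around 10^-2282; an adversary searching deliberately is playing an entirely different game.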

am­ber: So you’re say­ing that it’s more difficult be­cause the pro­gram­mer is pit­ting their wits against an ad­ver­sary who may be more in­tel­li­gent than them­selves.

coral: That’s an al­most-right way of putting it. What mat­ters isn’t so much the “ad­ver­sary” part as the op­ti­miza­tion part. There are sys­tem­atic, non­ran­dom forces strongly se­lect­ing for par­tic­u­lar out­comes, caus­ing pieces of the sys­tem to go down weird ex­e­cu­tion paths and oc­cupy un­ex­pected states. If your sys­tem liter­ally has no mis­be­hav­ior modes at all, it doesn’t mat­ter if you have IQ 140 and the en­emy has IQ 160—it’s not an arm-wrestling con­test. It’s just very much harder to build a sys­tem that doesn’t en­ter weird states when the weird states are be­ing se­lected-for in a cor­re­lated way, rather than hap­pen­ing only by ac­ci­dent. The weird­ness-se­lect­ing forces can search through parts of the larger state space that you your­self failed to imag­ine. Beat­ing that does in­deed re­quire new skills and a differ­ent mode of think­ing, what Bruce Sch­neier called “se­cu­rity mind­set”.

am­ber: Ah, and what is this se­cu­rity mind­set?

coral: I can say one or two things about it, but keep in mind we are deal­ing with a qual­ity of think­ing that is not en­tirely ef­fable. If I could give you a hand­ful of plat­i­tudes about se­cu­rity mind­set, and that would ac­tu­ally cause you to be able to de­sign se­cure soft­ware, the In­ter­net would look very differ­ent from how it presently does. That said, it seems to me that what has been called “se­cu­rity mind­set” can be di­vided into two com­po­nents, one of which is much less difficult than the other. And this can fool peo­ple into over­es­ti­mat­ing their own safety, be­cause they can get the eas­ier half of se­cu­rity mind­set and over­look the other half. The less difficult com­po­nent, I will call by the term “or­di­nary para­noia”.

am­ber: Or­di­nary para­noia?

coral: Lots of pro­gram­mers have the abil­ity to imag­ine ad­ver­saries try­ing to threaten them. They imag­ine how likely it is that the ad­ver­saries are able to at­tack them a par­tic­u­lar way, and then they try to block off the ad­ver­saries from threat­en­ing that way. Imag­in­ing at­tacks, in­clud­ing weird or clever at­tacks, and par­ry­ing them with mea­sures you imag­ine will stop the at­tack; that is or­di­nary para­noia.

am­ber: Isn’t that what se­cu­rity is all about? What do you claim is the other half?

coral: To put it as a plat­i­tude, I might say… defend­ing against mis­takes in your own as­sump­tions rather than against ex­ter­nal ad­ver­saries.

am­ber: Can you give me an ex­am­ple of a differ­ence?

coral: An or­di­nary para­noid pro­gram­mer imag­ines that an ad­ver­sary might try to read the file con­tain­ing all the user­names and pass­words. They might try to store the file in a spe­cial, se­cure area of the disk or a spe­cial sub­part of the op­er­at­ing sys­tem that’s sup­posed to be harder to read. Con­versely, some­body with se­cu­rity mind­set thinks, “No mat­ter what kind of spe­cial sys­tem I put around this file, I’m dis­turbed by need­ing to make the as­sump­tion that this file can’t be read. Maybe the spe­cial code I write, be­cause it’s used less of­ten, is more likely to con­tain bugs. Or maybe there’s a way to fish data out of the disk that doesn’t go through the code I wrote.”

am­ber: And they imag­ine more and more ways that the ad­ver­sary might be able to get at the in­for­ma­tion, and block those av­enues off too! Be­cause they have bet­ter imag­i­na­tions.

coral: Well, we kind of do, but that’s not the key differ­ence. What we’ll re­ally want to do is come up with a way for the com­puter to check pass­words that doesn’t rely on the com­puter stor­ing the pass­word at all, any­where.

am­ber: Ah, like en­crypt­ing the pass­word file!

coral: No, that just du­pli­cates the prob­lem at one re­move. If the com­puter can de­crypt the pass­word file to check it, it’s stored the de­cryp­tion key some­where, and the at­tacker may be able to steal that key too.

am­ber: But then the at­tacker has to steal two things in­stead of one; doesn’t that make the sys­tem more se­cure? Espe­cially if you write two differ­ent sec­tions of spe­cial filesys­tem code for hid­ing the en­cryp­tion key and hid­ing the en­crypted pass­word file?

coral: That’s ex­actly what I mean by dis­t­in­guish­ing “or­di­nary para­noia” that doesn’t cap­ture the full se­cu­rity mind­set. So long as the sys­tem is ca­pa­ble of re­con­struct­ing the pass­word, we’ll always worry that the ad­ver­sary might be able to trick the sys­tem into do­ing just that. What some­body with se­cu­rity mind­set will rec­og­nize as a deeper solu­tion is to store a one-way hash of the pass­word, rather than stor­ing the plain­text pass­word. Then even if the at­tacker reads off the pass­word file, they still can’t give what the sys­tem will rec­og­nize as a pass­word.
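
A minimal sketch of the scheme Coral is describing, using Python's standard library. It is deliberately bare (a fast, unsalted hash) to show only the one idea in question; Coral adds salting later in the dialogue, and real systems use a slow, salted password hash:

```python
import hashlib

def store_password(password: str) -> str:
    # Keep only a one-way hash; the plaintext is never written anywhere.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def check_password(stored_hash: str, attempt: str) -> bool:
    # Re-hash the attempt and compare. The system never needs to be able to
    # recover the original password, so reading this record doesn't yield one.
    return hashlib.sha256(attempt.encode("utf-8")).hexdigest() == stored_hash

record = store_password("correct horse battery staple")
print(check_password(record, "hunter2"))                       # False
print(check_password(record, "correct horse battery staple"))  # True
```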

am­ber: Ah, that’s quite clever! But I don’t see what’s so qual­i­ta­tively differ­ent be­tween that mea­sure, and my mea­sure for hid­ing the key and the en­crypted pass­word file sep­a­rately. I agree that your mea­sure is more clever and el­e­gant, but of course you’ll know bet­ter stan­dard solu­tions than I do, since you work in this area pro­fes­sion­ally. I don’t see the qual­i­ta­tive line di­vid­ing your solu­tion from my solu­tion.

coral: Um, it’s hard to say this with­out offend­ing some peo­ple, but… it’s pos­si­ble that even af­ter I try to ex­plain the differ­ence, which I’m about to do, you won’t get it. Like I said, if I could give you some handy plat­i­tudes and trans­form you into some­body ca­pa­ble of do­ing truly good work in com­puter se­cu­rity, the In­ter­net would look very differ­ent from its pre­sent form. I can try to de­scribe one as­pect of the differ­ence, but that may put me in the po­si­tion of a math­e­mat­i­cian try­ing to ex­plain what looks more promis­ing about one proof av­enue than an­other; you can listen to ev­ery­thing they say and nod along and still not be trans­formed into a math­e­mat­i­cian. So I am go­ing to try to ex­plain the differ­ence, but again, I don’t know of any sim­ple in­struc­tion man­u­als for be­com­ing Bruce Sch­neier.

am­ber: I con­fess to feel­ing slightly skep­ti­cal at this sup­pos­edly in­ef­fable abil­ity that some peo­ple pos­sess and oth­ers don’t—

coral: There are things like that in many pro­fes­sions. Some peo­ple pick up pro­gram­ming at age five by glanc­ing through a page of BASIC pro­grams writ­ten for a TRS-80, and some peo­ple strug­gle re­ally hard to grasp ba­sic Python at age twenty-five. That’s not be­cause there’s some mys­te­ri­ous truth the five-year-old knows that you can ver­bally trans­mit to the twenty-five-year-old.

And, yes, the five-year-old will be­come far bet­ter with prac­tice; it’s not like we’re talk­ing about un­train­able ge­nius. And there may be plat­i­tudes you can tell the 25-year-old that will help them strug­gle a lit­tle less. But some­times a pro­fes­sion re­quires think­ing in an un­usual way and some peo­ple’s minds more eas­ily turn side­ways in that par­tic­u­lar di­men­sion.

am­ber: Fine, go on.

coral: Okay, so… you thought of putting the en­crypted pass­word file in one spe­cial place in the filesys­tem, and the key in an­other spe­cial place. Why not en­crypt the key too, write a third spe­cial sec­tion of code, and store the key to the en­crypted key there? Wouldn’t that make the sys­tem even more se­cure? How about seven keys hid­den in differ­ent places, wouldn’t that be ex­tremely se­cure? Prac­ti­cally un­break­able, even?

am­ber: Well, that ver­sion of the idea does feel a lit­tle silly. If you’re try­ing to se­cure a door, a lock that takes two keys might be more se­cure than a lock that only needs one key, but seven keys doesn’t feel like it makes the door that much more se­cure than two.

coral: Why not?

am­ber: It just seems silly. You’d prob­a­bly have a bet­ter way of say­ing it than I would.

coral: Well, a fancy way of de­scribing the silli­ness is that the chance of ob­tain­ing the sev­enth key is not con­di­tion­ally in­de­pen­dent of the chance of ob­tain­ing the first two keys. If I can read the en­crypted pass­word file, and read your en­crypted en­cryp­tion key, then I’ve prob­a­bly come up with some­thing that just by­passes your filesys­tem and reads di­rectly from the disk. And the more com­pli­cated you make your filesys­tem, the more likely it is that I can find a weird sys­tem state that will let me do just that. Maybe the spe­cial sec­tion of filesys­tem code you wrote to hide your fourth key is the one with the bug that lets me read the disk di­rectly.

am­ber: So the differ­ence is that the per­son with a true se­cu­rity mind­set found a defense that makes the sys­tem sim­pler rather than more com­pli­cated.

coral: Again, that’s al­most right. By hash­ing the pass­words, the se­cu­rity pro­fes­sional has made their rea­son­ing about the sys­tem less com­pli­cated. They’ve elimi­nated the need for an as­sump­tion that might be put un­der a lot of pres­sure. If you put the key in one spe­cial place and the en­crypted pass­word file in an­other spe­cial place, the sys­tem as a whole is still able to de­crypt the user’s pass­word. An ad­ver­sary prob­ing the state space might be able to trig­ger that pass­word-de­crypt­ing state be­cause the sys­tem is de­signed to do that on at least some oc­ca­sions. By hash­ing the pass­word file we elimi­nate that whole in­ter­nal de­bate from the rea­son­ing on which the sys­tem’s se­cu­rity rests.

am­ber: But even af­ter you’ve come up with that clever trick, some­thing could still go wrong. You’re still not ab­solutely se­cure. What if some­body uses “pass­word” as their pass­word?

coral: Or what if somebody comes up with a way to read off the password after the user has entered it and while it's still stored in RAM, because something got access to RAM? The point of eliminating the extra assumption from the reasoning about the system's security is not that we are then absolutely secure and safe and can relax. Somebody with security mindset is never going to be that relaxed about the edifice of reasoning saying the system is secure.

For that mat­ter, while there are some nor­mal pro­gram­mers do­ing nor­mal pro­gram­ming who might put in a bunch of de­bug­ging effort and then feel satis­fied, like they’d done all they could rea­son­ably do, pro­gram­mers with de­cent lev­els of or­di­nary para­noia about or­di­nary pro­grams will go on chew­ing ideas in the shower and com­ing up with more func­tion tests for the sys­tem to pass. So the dis­tinc­tion be­tween se­cu­rity mind­set and or­di­nary para­noia isn’t that or­di­nary para­noids will re­lax. It’s that… again to put it as a plat­i­tude, the or­di­nary para­noid is run­ning around putting out fires in the form of ways they imag­ine an ad­ver­sary might at­tack, and some­body with se­cu­rity mind­set is defend­ing against some­thing closer to “what if an el­e­ment of this rea­son­ing is mis­taken”. In­stead of try­ing re­ally hard to en­sure no­body can read a disk, we are go­ing to build a sys­tem that’s se­cure even if some­body does read the disk, and that is our first line of defense. And then we are also go­ing to build a filesys­tem that doesn’t let ad­ver­saries read the pass­word file, as a sec­ond line of defense in case our one-way hash is se­cretly bro­ken, and be­cause there’s no pos­i­tive need to let ad­ver­saries read the disk so why let them. And then we’re go­ing to salt the hash in case some­body snuck a low-en­tropy pass­word through our sys­tem and the ad­ver­sary man­ages to read the pass­word any­way.
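
One way the salted version Coral mentions might look, again as an illustrative sketch rather than a recommendation (PBKDF2 from Python's standard library stands in here for whichever slow password hash you would actually choose):

```python
import hashlib, hmac, os

def store_password(password: str):
    salt = os.urandom(16)        # per-user random salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest          # store both; neither lets you reconstruct the password

def check_password(salt: bytes, digest: bytes, attempt: str) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

salt, digest = store_password("rA1nbow!")
print(check_password(salt, digest, "rainbow"))   # False
print(check_password(salt, digest, "rA1nbow!"))  # True
```

The salt means that two users with the same weak password don't share a recognizable hash, and that an adversary who does manage to read the file can't run it against a precomputed table.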

am­ber: So rather than try­ing to out­wit ad­ver­saries, some­body with true se­cu­rity mind­set tries to make fewer as­sump­tions.

coral: Well, we think in terms of ad­ver­saries too! Ad­ver­sar­ial rea­son­ing is eas­ier to teach than se­cu­rity mind­set, but it’s still (a) manda­tory and (b) hard to teach in an ab­solute sense. A lot of peo­ple can’t mas­ter it, which is why a de­scrip­tion of “se­cu­rity mind­set” of­ten opens with a story about some­body failing at ad­ver­sar­ial rea­son­ing and some­body else launch­ing a clever at­tack to pen­e­trate their defense.

You need to mas­ter two ways of think­ing, and there are a lot of peo­ple go­ing around who have the first way of think­ing but not the sec­ond. One way I’d de­scribe the deeper skill is see­ing a sys­tem’s se­cu­rity as rest­ing on a story about why that sys­tem is safe. We want that safety-story to be as solid as pos­si­ble. One of the im­pli­ca­tions is rest­ing the story on as few as­sump­tions as pos­si­ble; as the say­ing goes, the only gear that never fails is one that has been de­signed out of the ma­chine.

am­ber: But can’t you also get bet­ter se­cu­rity by adding more lines of defense? Wouldn’t that be more com­plex­ity in the story, and also bet­ter se­cu­rity?

coral: There’s also some­thing to be said for prefer­ring dis­junc­tive rea­son­ing over con­junc­tive rea­son­ing in the safety-story. But it’s im­por­tant to re­al­ize that you do want a pri­mary line of defense that is sup­posed to just work and be unas­sailable, not a se­ries of weaker fences that you think might maybe work. Some­body who doesn’t un­der­stand cryp­tog­ra­phy might de­vise twenty clever-seem­ing am­a­teur codes and ap­ply them all in se­quence, think­ing that, even if one of the codes turns out to be break­able, surely they won’t all be break­able. The NSA will as­sign that mighty ed­ifice of am­a­teur en­cryp­tion to an in­tern, and the in­tern will crack it in an af­ter­noon. There’s some­thing to be said for re­dun­dancy, and hav­ing fal­lbacks in case the unas­sailable wall falls; it can be wise to have ad­di­tional lines of defense, so long as the added com­plex­ity does not make the larger sys­tem harder to un­der­stand or in­crease its vuln­er­a­ble sur­faces. But at the core you need a sim­ple, solid story about why the sys­tem is se­cure, and a good se­cu­rity thinker will be try­ing to elimi­nate whole as­sump­tions from that story and strength­en­ing its core pillars, not only scur­ry­ing around par­ry­ing ex­pected at­tacks and putting out risk-fires.

That said, it’s bet­ter to use two true as­sump­tions than one false as­sump­tion, so sim­plic­ity isn’t ev­ery­thing.

am­ber: I won­der if that way of think­ing has ap­pli­ca­tions be­yond com­puter se­cu­rity?

coral: I’d rather think so, as the proverb about gears sug­gests.

For ex­am­ple, step­ping out of char­ac­ter for a mo­ment, the au­thor of this di­alogue has some­times been known to dis­cuss the al­ign­ment prob­lem for Ar­tifi­cial Gen­eral In­tel­li­gence. He was talk­ing at one point about try­ing to mea­sure rates of im­prove­ment in­side a grow­ing AI sys­tem, so that it would not do too much think­ing with hu­mans out of the loop if a break­through oc­curred while the sys­tem was run­ning overnight. The per­son he was talk­ing to replied that, to him, it seemed un­likely that an AGI would gain in power that fast. To which the au­thor replied, more or less:

It shouldn’t be your job to guess how fast the AGI might im­prove! If you write a sys­tem that will hurt you if a cer­tain speed of self-im­prove­ment turns out to be pos­si­ble, then you’ve writ­ten the wrong code. The code should just never hurt you re­gard­less of the true value of that back­ground pa­ram­e­ter.

A bet­ter way to set up the AGI would be to mea­sure how much im­prove­ment is tak­ing place, and if more than X im­prove­ment takes place, sus­pend the sys­tem un­til a pro­gram­mer val­i­dates the progress that’s already oc­curred. That way even if the im­prove­ment takes place over the course of a mil­lisec­ond, you’re still fine, so long as the sys­tem works as in­tended. Maybe the sys­tem doesn’t work as in­tended be­cause of some other mis­take, but that’s a bet­ter prob­lem to worry about than a sys­tem that hurts you even if it works as in­tended.

Similarly, you want to de­sign the sys­tem so that if it dis­cov­ers amaz­ing new ca­pa­bil­ities, it waits for an op­er­a­tor to val­i­date use of those ca­pa­bil­ities—not rely on the op­er­a­tor to watch what’s hap­pen­ing and press a sus­pend but­ton. You shouldn’t rely on the speed of dis­cov­ery or the speed of dis­aster be­ing less than the op­er­a­tor’s re­ac­tion time. There’s no need to bake in an as­sump­tion like that if you can find a de­sign that’s safe re­gard­less. For ex­am­ple, by op­er­at­ing on a paradigm of al­low­ing op­er­a­tor-whitelisted meth­ods rather than avoid­ing op­er­a­tor-black­listed meth­ods; you re­quire the op­er­a­tor to say “Yes” be­fore pro­ceed­ing, rather than as­sum­ing they’re pre­sent and at­ten­tive and can say “No” fast enough.
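
To show the shape of the whitelisting pattern and nothing more, here is a toy sketch. Every name in it is hypothetical; it is not a proposal for how a real AGI system would be structured, only an illustration of "require a Yes before proceeding" versus "hope someone says No in time":

```python
# Toy control flow: suspend and wait for explicit approval, rather than relying
# on an operator to notice a problem and react quickly enough.

APPROVED_CAPABILITIES = {"arithmetic", "file_summarization"}   # operator-whitelisted
IMPROVEMENT_BUDGET = 0.05   # maximum improvement since the last programmer validation

def use_capability(name: str, measured_improvement: float) -> None:
    if measured_improvement > IMPROVEMENT_BUDGET:
        raise SystemExit("Suspending: improvement exceeds the validated budget; "
                         "waiting for programmer review.")
    if name not in APPROVED_CAPABILITIES:
        raise SystemExit(f"Suspending: capability {name!r} is not whitelisted; "
                         "waiting for operator approval.")
    print(f"Proceeding with approved capability: {name}")

use_capability("arithmetic", measured_improvement=0.01)             # proceeds
use_capability("novel_planning_trick", measured_improvement=0.01)   # suspends
```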

am­ber: Well, okay, but if we’re guard­ing against an AI sys­tem dis­cov­er­ing cos­mic pow­ers in a mil­lisec­ond, that does seem to me like an un­rea­son­able thing to worry about. I guess that marks me as a merely or­di­nary para­noid.

coral: In­deed, one of the hal­l­marks of se­cu­rity pro­fes­sion­als is that they spend a lot of time wor­ry­ing about edge cases that would fail to alarm an or­di­nary para­noid be­cause the edge case doesn’t sound like some­thing an ad­ver­sary is likely to do. Here’s an ex­am­ple from the Free­dom to Tinker blog:

This in­ter­est in “harm­less failures” – cases where an ad­ver­sary can cause an anoma­lous but not di­rectly harm­ful out­come – is an­other hal­l­mark of the se­cu­rity mind­set. Not all “harm­less failures” lead to big trou­ble, but it’s sur­pris­ing how of­ten a clever ad­ver­sary can pile up a stack of seem­ingly harm­less failures into a dan­ger­ous tower of trou­ble. Harm­less failures are bad hy­giene. We try to stamp them out when we can…

To see why, con­sider the donotre­ply.com email story that hit the press re­cently. When com­pa­nies send out com­mer­cial email (e.g., an air­line no­tify­ing a pas­sen­ger of a flight de­lay) and they don’t want the re­cip­i­ent to re­ply to the email, they of­ten put in a bo­gus From ad­dress like donotre­ply@donotre­ply.com. A clever guy reg­istered the do­main donotre­ply.com, thereby re­ceiv­ing all email ad­dressed to donotre­ply.com. This in­cluded “bounce” replies to mis­ad­dressed emails, some of which con­tained copies of the origi­nal email, with in­for­ma­tion such as bank ac­count state­ments, site in­for­ma­tion about mil­i­tary bases in Iraq, and so on…

The peo­ple who put donotre­ply.com email ad­dresses into their out­go­ing email must have known that they didn’t con­trol the donotre­ply.com do­main, so they must have thought of any re­ply mes­sages di­rected there as harm­less failures. Hav­ing got­ten that far, there are two ways to avoid trou­ble. The first way is to think care­fully about the traf­fic that might go to donotre­ply.com, and re­al­ize that some of it is ac­tu­ally dan­ger­ous. The sec­ond way is to think, “This looks like a harm­less failure, but we should avoid it any­way. No good can come of this.” The first way pro­tects you if you’re clever; the sec­ond way always pro­tects you.

“The first way pro­tects you if you’re clever; the sec­ond way always pro­tects you.” That’s very much the other half of the se­cu­rity mind­set. It’s what this es­say’s au­thor was do­ing by talk­ing about AGI al­ign­ment that runs on whitelist­ing rather than black­list­ing: you shouldn’t as­sume you’ll be clever about how fast the AGI sys­tem could dis­cover ca­pa­bil­ities, you should have a sys­tem that doesn’t use not-yet-whitelisted ca­pa­bil­ities even if they are dis­cov­ered very sud­denly.

If your AGI would hurt you if it gained to­tal cos­mic pow­ers in one mil­lisec­ond, that means you built a cog­ni­tive pro­cess that is in some sense try­ing to hurt you and failing only due to what you think is a lack of ca­pa­bil­ity. This is very bad and you should be de­sign­ing some other AGI sys­tem in­stead. AGI sys­tems should never be run­ning a search that will hurt you if the search comes up non-empty. You should not be try­ing to fix that by mak­ing sure the search comes up empty thanks to your clever shal­low defenses clos­ing off all the AGI’s clever av­enues for hurt­ing you. You should fix that by mak­ing sure no search like that ever runs. It’s a silly thing to do with com­put­ing power, and you should do some­thing else with com­put­ing power in­stead.

Go­ing back to or­di­nary com­puter se­cu­rity, if you try build­ing a lock with seven keys hid­den in differ­ent places, you are in some di­men­sion pit­ting your clev­er­ness against an ad­ver­sary try­ing to read the keys. The per­son with se­cu­rity mind­set doesn’t want to rely on hav­ing to win the clev­er­ness con­test. An or­di­nary para­noid, some­body who can mas­ter the kind of de­fault para­noia that lots of in­tel­li­gent pro­gram­mers have, will look at the Re­ply-To field say­ing donotre­ply@donotre­ply.com and think about the pos­si­bil­ity of an ad­ver­sary reg­is­ter­ing the donotre­ply.com do­main. Some­body with se­cu­rity mind­set thinks in as­sump­tions rather than ad­ver­saries. “Well, I’m as­sum­ing that this re­ply email goes nowhere,” they’ll think, “but maybe I should de­sign the sys­tem so that I don’t need to fret about whether that as­sump­tion is true.”

am­ber: Be­cause as the truly great para­noid knows, what seems like a ridicu­lously im­prob­a­ble way for the ad­ver­sary to at­tack some­times turns out to not be so ridicu­lous af­ter all.

coral: Again, that’s a not-ex­actly-right way of putting it. When I don’t set up an email to origi­nate from donotre­ply@donotre­ply.com, it’s not just be­cause I’ve ap­pre­ci­ated that an ad­ver­sary reg­is­ter­ing donotre­ply.com is more prob­a­ble than the novice imag­ines. For all I know, when a bounce email is sent to nowhere, there’s all kinds of things that might hap­pen! Maybe the way a bounced email works is that the email gets routed around to weird places look­ing for that ad­dress. I don’t know, and I don’t want to have to study it. In­stead I’ll ask: Can I make it so that a bounced email doesn’t gen­er­ate a re­ply? Can I make it so that a bounced email doesn’t con­tain the text of the origi­nal mes­sage? Maybe I can query the email server to make sure it still has a user by that name be­fore I try send­ing the mes­sage?—though there may still be “va­ca­tion” au­tore­sponses that mean I’d bet­ter con­trol the replied-to ad­dress my­self. If it would be very bad for some­body unau­tho­rized to read this, maybe I shouldn’t be send­ing it in plain­text by email.
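
The "control the replied-to address myself" fix is mundane in practice: point the From and Reply-To headers at a domain you actually own and monitor, instead of at someone else's donotreply.com. A minimal sketch with Python's standard email library; the example.org addresses are placeholders:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Your flight is delayed"
msg["From"] = "notifications@mail.example.org"   # a domain we control
msg["Reply-To"] = "bounces@mail.example.org"     # replies and bounces come back to us,
                                                 # not to whoever registers donotreply.com
msg.set_content("Flight 123 is delayed by 40 minutes.")
print(msg)
```

And, as Coral notes, if the contents would be dangerous in the wrong hands, the deeper fix is not to send them in plaintext email at all.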

am­ber: So the per­son with true se­cu­rity mind­set un­der­stands that where there’s one prob­lem, demon­strated by what seems like a very un­likely thought ex­per­i­ment, there’s likely to be more re­al­is­tic prob­lems that an ad­ver­sary can in fact ex­ploit. What I think of as weird im­prob­a­ble failure sce­nar­ios are ca­naries in the coal mine, that would warn a truly para­noid per­son of big­ger prob­lems on the way.

coral: Again that’s not ex­actly right. The per­son with or­di­nary para­noia hears about donotre­ply@donotre­ply.com and may think some­thing like, “Oh, well, it’s not very likely that an at­tacker will ac­tu­ally try to reg­ister that do­main, I have more ur­gent is­sues to worry about,” be­cause in that mode of think­ing, they’re run­ning around putting out things that might be fires, and they have to pri­ori­tize the things that are most likely to be fires.

If you demon­strate a weird edge-case thought ex­per­i­ment to some­body with se­cu­rity mind­set, they don’t see some­thing that’s more likely to be a fire. They think, “Oh no, my be­lief that those bounce emails go nowhere was FALSE!” The OpenBSD pro­ject to build a se­cure op­er­at­ing sys­tem has also, in pass­ing, built an ex­tremely ro­bust op­er­at­ing sys­tem, be­cause from their per­spec­tive any bug that po­ten­tially crashes the sys­tem is con­sid­ered a crit­i­cal se­cu­rity hole. An or­di­nary para­noid sees an in­put that crashes the sys­tem and thinks, “A crash isn’t as bad as some­body steal­ing my data. Un­til you demon­strate to me that this bug can be used by the ad­ver­sary to steal data, it’s not ex­tremely crit­i­cal.” Some­body with se­cu­rity mind­set thinks, “Noth­ing in­side this sub­sys­tem is sup­posed to be­have in a way that crashes the OS. Some sec­tion of code is be­hav­ing in a way that does not work like my model of that code. Who knows what it might do? The sys­tem isn’t sup­posed to crash, so by mak­ing it crash, you have demon­strated that my be­liefs about how this sys­tem works are false.”

am­ber: I’ll be hon­est: It has some­times struck me that peo­ple who call them­selves se­cu­rity pro­fes­sion­als seem overly con­cerned with what, to me, seem like very im­prob­a­ble sce­nar­ios. Like some­body for­get­ting to check the end of a buffer and an ad­ver­sary throw­ing in a huge string of char­ac­ters that over­write the end of the stack with a re­turn ad­dress that jumps to a sec­tion of code some­where else in the sys­tem that does some­thing the ad­ver­sary wants. How likely is that re­ally to be a prob­lem? I sus­pect that in the real world, what’s more likely is some­body mak­ing their pass­word “pass­word”. Shouldn’t you be mainly guard­ing against that in­stead?

coral: You have to do both. This game is short on con­so­la­tion prizes. If you want your sys­tem to re­sist at­tack by ma­jor gov­ern­ments, you need it to ac­tu­ally be pretty darned se­cure, gosh darn it. The fact that some users may try to make their pass­word be “pass­word” does not change the fact that you also have to pro­tect against buffer overflows.

am­ber: But even when some­body with se­cu­rity mind­set de­signs an op­er­at­ing sys­tem, it of­ten still ends up with suc­cess­ful at­tacks against it, right? So if this deeper para­noia doesn’t elimi­nate all chance of bugs, is it re­ally worth the ex­tra effort?

coral: If you don’t have some­body who thinks this way in charge of build­ing your op­er­at­ing sys­tem, it has no chance of not failing im­me­di­ately. Peo­ple with se­cu­rity mind­set some­times fail to build se­cure sys­tems. Peo­ple with­out se­cu­rity mind­set always fail at se­cu­rity if the sys­tem is at all com­plex. What this way of think­ing buys you is a chance that your sys­tem takes longer than 24 hours to break.

am­ber: That sounds a lit­tle ex­treme.

coral: His­tory shows that re­al­ity has not cared what you con­sider “ex­treme” in this re­gard, and that is why your Wi-Fi-en­abled light­bulb is part of a Rus­sian bot­net.

am­ber: Look, I un­der­stand that you want to get all the fiddly tiny bits of the sys­tem ex­actly right. I like tidy neat things too. But let’s be rea­son­able; we can’t always get ev­ery­thing we want in life.

coral: You think you’re ne­go­ti­at­ing with me, but you’re re­ally ne­go­ti­at­ing with Mur­phy’s Law. I’m afraid that Mr. Mur­phy has his­tor­i­cally been quite un­rea­son­able in his de­mands, and rather un­for­giv­ing of those who re­fuse to meet them. I’m not ad­vo­cat­ing a policy to you, just tel­ling you what hap­pens if you don’t fol­low that policy. Maybe you think it’s not par­tic­u­larly bad if your light­bulb is do­ing de­nial-of-ser­vice at­tacks on a mat­tress store in Es­to­nia. But if you do want a sys­tem to be se­cure, you need to do cer­tain things, and that part is more of a law of na­ture than a ne­go­tiable de­mand.

am­ber: Non-ne­go­tiable, eh? I bet you’d change your tune if some­body offered you twenty thou­sand dol­lars. But any­way, one thing I’m sur­prised you’re not men­tion­ing more is the part where peo­ple with se­cu­rity mind­set always sub­mit their idea to peer scrutiny and then ac­cept what other peo­ple vote about it. I do like the sound of that; it sounds very com­mu­ni­tar­ian and mod­est.

coral: I’d say that’s part of the or­di­nary para­noia that lots of pro­gram­mers have. The point of sub­mit­ting ideas to oth­ers’ scrutiny isn’t that hard to un­der­stand, though cer­tainly there are plenty of peo­ple who don’t even do that. If I had any origi­nal re­marks to con­tribute to that well-worn topic in com­puter se­cu­rity, I’d re­mark that it’s framed as ad­vice to wise para­noids, but of course the peo­ple who need it even more are the happy in­no­cents.

am­ber: Happy in­no­cents?

coral: Peo­ple who lack even or­di­nary para­noia. Happy in­no­cents tend to en­vi­sion ways that their sys­tem works, but not ask at all how their sys­tem might fail, un­til some­body prompts them into that, and even then they can’t do it. Or at least that’s been my ex­pe­rience, and that of many oth­ers in the pro­fes­sion.

There’s a cer­tain in­cred­ibly ter­rible cryp­to­graphic sys­tem, the equiv­a­lent of the Fool’s Mate in chess, which is some­times con­verged on by the most to­tal sort of am­a­teur, namely Fast XOR. That’s pick­ing a pass­word, re­peat­ing the pass­word, and XORing the data with the re­peated pass­word string. The per­son who in­vents this sys­tem may not be able to take the per­spec­tive of an ad­ver­sary at all. He wants his mar­velous ci­pher to be un­break­able, and he is not able to truly en­ter the frame of mind of some­body who wants his ci­pher to be break­able. If you ask him, “Please, try to imag­ine what could pos­si­bly go wrong,” he may say, “Well, if the pass­word is lost, the data will be for­ever un­re­cov­er­able be­cause my en­cryp­tion al­gorithm is too strong; I guess that’s some­thing that could go wrong.” Or, “Maybe some­body sab­o­tages my code,” or, “If you re­ally in­sist that I in­vent far-fetched sce­nar­ios, maybe the com­puter spon­ta­neously de­cides to di­s­obey my pro­gram­ming.” Of course any com­pe­tent or­di­nary para­noid asks the most skil­led peo­ple they can find to look at a bright idea and try to shoot it down, be­cause other minds may come in at a differ­ent an­gle or know other stan­dard tech­niques. But the other rea­son why we say “Don’t roll your own crypto!” and “Have a se­cu­rity ex­pert look at your bright idea!” is in hopes of reach­ing the many peo­ple who can’t at all in­vert the po­lar­ity of their goals—they don’t think that way spon­ta­neously, and if you try to force them to do it, their thoughts go in un­pro­duc­tive di­rec­tions.
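
To make the "Fool's Mate" concrete, here is the Fast XOR scheme Coral describes, together with the attack its inventor fails to imagine: because XOR is its own inverse, any guessable stretch of plaintext hands the adversary the key directly. (The password and message are made up for the demonstration.)

```python
def fast_xor(data: bytes, password: bytes) -> bytes:
    # The amateur cipher: repeat the password and XOR it against the data.
    return bytes(b ^ password[i % len(password)] for i, b in enumerate(data))

secret = b"Meeting moved to 5pm. Bring the documents."
ciphertext = fast_xor(secret, b"hunter2")

# The adversary's move: guess a predictable fragment of plaintext (here, the
# opening "Meeting ") and XOR it against the ciphertext to recover key bytes.
guessed_prefix = b"Meeting "
leaked = bytes(c ^ p for c, p in zip(ciphertext, guessed_prefix))
print(leaked)                              # b'hunter2h' -- the repeating key shows through
print(fast_xor(ciphertext, leaked[:7]))    # and with it, the full plaintext
```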

am­ber: Like… the same way many peo­ple on the Right/​Left seem ut­terly in­ca­pable of step­ping out­side their own trea­sured per­spec­tives to pass the Ide­olog­i­cal Tur­ing Test of the Left/​Right.

coral: I don’t know if it’s ex­actly the same men­tal gear or ca­pa­bil­ity, but there’s a definite similar­ity. Some­body who lacks or­di­nary para­noia can’t take on the view­point of some­body who wants Fast XOR to be break­able, and pass that ad­ver­sary’s Ide­olog­i­cal Tur­ing Test for at­tempts to break Fast XOR.

am­ber: Can’t, or won’t? You seem to be talk­ing like these are in­nate, un­train­able abil­ities.

coral: Well, at the least, there will be differ­ent lev­els of tal­ent, as usual in a pro­fes­sion. And also as usual, tal­ent vastly benefits from train­ing and prac­tice. But yes, it has some­times seemed to me that there is a kind of qual­i­ta­tive step or gear here, where some peo­ple can shift per­spec­tive to imag­ine an ad­ver­sary that truly wants to break their code… or a re­al­ity that isn’t cheer­ing for their plan to work, or aliens who evolved differ­ent emo­tions, or an AI that doesn’t want to con­clude its rea­son­ing with “And there­fore the hu­mans should live hap­pily ever af­ter”, or a fic­tional char­ac­ter who be­lieves in Sith ide­ol­ogy and yet doesn’t be­lieve they’re the bad guy.

It does some­times seem to me like some peo­ple sim­ply can’t shift per­spec­tive in that way. Maybe it’s not that they truly lack the wiring, but that there’s an in­stinc­tive poli­ti­cal off-switch for the abil­ity. Maybe they’re scared to let go of their men­tal an­chors. But from the out­side it looks like the same re­sult: some peo­ple do it, some peo­ple don’t. Some peo­ple spon­ta­neously in­vert the po­lar­ity of their in­ter­nal goals and spon­ta­neously ask how their ci­pher might be bro­ken and come up with pro­duc­tive an­gles of at­tack. Other peo­ple wait un­til prompted to look for flaws in their ci­pher, or they de­mand that you ar­gue with them and wait for you to come up with an ar­gu­ment that satis­fies them. If you ask them to pre­dict them­selves what you might sug­gest as a flaw, they say weird things that don’t be­gin to pass your Ide­olog­i­cal Tur­ing Test.

am­ber: You do seem to like your qual­i­ta­tive dis­tinc­tions. Are there bet­ter or worse or­di­nary para­noids? Like, is there a spec­trum in the space be­tween “happy in­no­cent” and “true deep se­cu­rity mind­set”?

coral: One ob­vi­ous quan­ti­ta­tive tal­ent level within or­di­nary para­noia would be in how far you can twist your per­spec­tive to look side­ways at things—the cre­ativity and work­a­bil­ity of the at­tacks you in­vent. Like these ex­am­ples Bruce Sch­neier gave:

Un­cle Mil­ton In­dus­tries has been sel­l­ing ant farms to chil­dren since 1956. Some years ago, I re­mem­ber open­ing one up with a friend. There were no ac­tual ants in­cluded in the box. In­stead, there was a card that you filled in with your ad­dress, and the com­pany would mail you some ants. My friend ex­pressed sur­prise that you could get ants sent to you in the mail.

I replied: “What’s re­ally in­ter­est­ing is that these peo­ple will send a tube of live ants to any­one you tell them to.”

Se­cu­rity re­quires a par­tic­u­lar mind­set. Se­cu­rity pro­fes­sion­als—at least the good ones—see the world differ­ently. They can’t walk into a store with­out notic­ing how they might shoplift. They can’t use a com­puter with­out won­der­ing about the se­cu­rity vuln­er­a­bil­ities. They can’t vote with­out try­ing to figure out how to vote twice. They just can’t help it.

SmartWater is a liquid with a unique iden­ti­fier linked to a par­tic­u­lar owner. “The idea is for me to paint this stuff on my valuables as proof of own­er­ship,” I wrote when I first learned about the idea. “I think a bet­ter idea would be for me to paint it on your valuables, and then call the po­lice.”

Really, we can’t help it.

This kind of think­ing is not nat­u­ral for most peo­ple. It’s not nat­u­ral for en­g­ineers. Good en­g­ineer­ing in­volves think­ing about how things can be made to work; the se­cu­rity mind­set in­volves think­ing about how things can be made to fail…

I’ve of­ten spec­u­lated about how much of this is in­nate, and how much is teach­able. In gen­eral, I think it’s a par­tic­u­lar way of look­ing at the world, and that it’s far eas­ier to teach some­one do­main ex­per­tise—cryp­tog­ra­phy or soft­ware se­cu­rity or safe­crack­ing or doc­u­ment forgery—than it is to teach some­one a se­cu­rity mind­set.

To be clear, the dis­tinc­tion be­tween “just or­di­nary para­noia” and “all of se­cu­rity mind­set” is my own; I think it’s worth di­vid­ing the spec­trum above the happy in­no­cents into two lev­els rather than one, and say, “This busi­ness of look­ing at the world from weird an­gles is only half of what you need to learn, and it’s the eas­ier half.”

am­ber: Maybe Bruce Sch­neier him­self doesn’t grasp what you mean when you say “se­cu­rity mind­set”, and you’ve sim­ply stolen his term to re­fer to a whole new idea of your own!

coral: No, the thing with not want­ing to have to rea­son about whether some­body might some­day reg­ister “donotre­ply.com” and just fix­ing it re­gard­less—a method­ol­ogy that doesn’t trust you to be clever about which prob­lems will blow up—that’s definitely part of what ex­ist­ing se­cu­rity pro­fes­sion­als mean by “se­cu­rity mind­set”, and it’s definitely part of the sec­ond and deeper half. The only un­con­ven­tional thing in my pre­sen­ta­tion is that I’m fac­tor­ing out an in­ter­me­di­ate skill of “or­di­nary para­noia”, where you try to parry an imag­ined at­tack by en­crypt­ing your pass­word file and hid­ing the en­cryp­tion key in a sep­a­rate sec­tion of filesys­tem code. Com­ing up with the idea of hash­ing the pass­word file is, I sus­pect, a qual­i­ta­tively dis­tinct skill, in­vok­ing a world whose di­men­sions are your own rea­son­ing pro­cesses and not just ob­ject-level sys­tems and at­tack­ers. Though it’s not po­lite to say, and the usual sus­pects will in­ter­pret it as a sta­tus grab, my ex­pe­rience with other re­flec­tivity-laden skills sug­gests this may mean that many peo­ple, pos­si­bly in­clud­ing you, will prove un­able to think in this way.

am­ber: I in­deed find that ter­ribly im­po­lite.

coral: It may in­deed be im­po­lite; I don’t deny that. Whether it’s un­true is a differ­ent ques­tion. The rea­son I say it is be­cause, as much as I want or­di­nary para­noids to try to reach up to a deeper level of para­noia, I want them to be aware that it might not prove to be their thing, in which case they should get help and then listen to that help. They shouldn’t as­sume that be­cause they can no­tice the chance to have ants mailed to peo­ple, they can also pick up on the awful­ness of donotre­ply@donotre­ply.com.

am­ber: Maybe you could call that “deep se­cu­rity” to dis­t­in­guish it from what Bruce Sch­neier and other se­cu­rity pro­fes­sion­als call “se­cu­rity mind­set”.

coral: “Se­cu­rity mind­set” equals “or­di­nary para­noia” plus “deep se­cu­rity”? I’m not sure that’s very good ter­minol­ogy, but I won’t mind if you use the term that way.

am­ber: Sup­pose I take that at face value. Ear­lier, you de­scribed what might go wrong when a happy in­no­cent tries and fails to be an or­di­nary para­noid. What hap­pens when an or­di­nary para­noid tries to do some­thing that re­quires the deep se­cu­rity skill?

coral: They be­lieve they have wisely iden­ti­fied bad pass­words as the real fire in need of putting out, and spend all their time writ­ing more and more clever checks for bad pass­words. They are very im­pressed with how much effort they have put into de­tect­ing bad pass­words, and how much con­cern they have shown for sys­tem se­cu­rity. They fall prey to the stan­dard cog­ni­tive bias whose name I can’t re­mem­ber, where peo­ple want to solve a prob­lem us­ing one big effort or a cou­ple of big efforts and then be done and not try any­more, and that’s why peo­ple don’t put up hur­ri­cane shut­ters once they’re finished buy­ing bot­tled wa­ter. Pay them to “try harder”, and they’ll hide seven en­cryp­tion keys to the pass­word file in seven differ­ent places, or build tow­ers higher and higher in places where a suc­cess­ful ad­ver­sary is ob­vi­ously just walk­ing around the tower if they’ve got­ten through at all. What these ideas have in com­mon is that they are in a cer­tain sense “shal­low”. They are men­tally straight­for­ward as at­tempted par­ries against a par­tic­u­lar kind of en­vi­sioned at­tack. They give you a satis­fy­ing sense of fight­ing hard against the imag­ined prob­lem—and then they fail.

am­ber: Are you say­ing it’s not a good idea to check that the user’s pass­word isn’t “pass­word”?

coral: No, shal­low defenses are of­ten good ideas too! But even there, some­body with the higher skill will try to look at things in a more sys­tem­atic way; they know that there are of­ten deeper ways of look­ing at the prob­lem to be found, and they’ll try to find those deep views. For ex­am­ple, it’s ex­tremely im­por­tant that your pass­word checker does not rule out the pass­word “cor­rect horse bat­tery sta­ple” by de­mand­ing the pass­word con­tain at least one up­per­case let­ter, low­er­case let­ter, num­ber, and punc­tu­a­tion mark. What you re­ally want to do is mea­sure pass­word en­tropy. Not en­vi­sion a failure mode of some­body guess­ing “rain­bow”, which you will clev­erly balk by forc­ing the user to make their pass­word be “rA1nbow!” in­stead.
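
The contrast Coral draws can be made concrete. Below, the composition-rule check is the shallow parry; the entropy estimate is the more general measure. (The pool-size estimate is deliberately crude and still overrates human-chosen passwords; real checkers also penalize dictionary words and common patterns.)

```python
import math, string

def passes_composition_rule(pw: str) -> bool:
    # The shallow check: demand one uppercase, one lowercase, one digit, one symbol.
    return (any(c.isupper() for c in pw) and any(c.islower() for c in pw)
            and any(c.isdigit() for c in pw)
            and any(c in string.punctuation for c in pw))

def rough_entropy_bits(pw: str) -> float:
    # Crude estimate: length * log2(size of the character pool in use).
    pool = 0
    pool += 26 if any(c.islower() for c in pw) else 0
    pool += 26 if any(c.isupper() for c in pw) else 0
    pool += 10 if any(c.isdigit() for c in pw) else 0
    pool += 32 if any(c in string.punctuation for c in pw) else 0
    pool += 1 if any(c.isspace() for c in pw) else 0
    return len(pw) * math.log2(pool) if pool else 0.0

for pw in ["rA1nbow!", "correct horse battery staple"]:
    print(pw, passes_composition_rule(pw), round(rough_entropy_bits(pw)))
# "rA1nbow!" passes the rule yet has at most ~52 bits (and is really a mangled word);
# the long passphrase fails the rule while carrying far more entropy.
```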

You want the pass­word en­try field to have a check­box that al­lows show­ing the typed pass­word in plain­text, be­cause your at­tempt to parry the imag­ined failure mode of some evil­doer read­ing over the user’s shoulder may get in the way of the user en­ter­ing a long or high-en­tropy pass­word. And the user is perfectly ca­pa­ble of typ­ing their pass­word into that con­ve­nient text field in the ad­dress bar above the web page, so they can copy and paste it—thereby send­ing your pass­word to who­ever tries to do smart lookups on the ad­dress bar. If you’re re­ally that wor­ried about some evil­doer read­ing over some­body’s shoulder, maybe you should be send­ing a con­fir­ma­tion text to their phone, rather than forc­ing the user to en­ter their pass­word into a nearby text field that they can ac­tu­ally read. Ob­scur­ing one text field, with no off-switch for the ob­scu­ra­tion, to guard against this one bad thing that you imag­ined hap­pen­ing, while man­ag­ing to step on your own feet in other ways and not even re­ally guard against the bad thing; that’s the peril of shal­low defenses.

An archety­pal char­ac­ter for “or­di­nary para­noid who thinks he’s try­ing re­ally hard but is ac­tu­ally just piling on a lot of shal­low pre­cau­tions” is Mad-Eye Moody from the Harry Pot­ter se­ries, who has a whole room full of Dark De­tec­tors, and who also ends up locked in the bot­tom of some­body’s trunk. It seems Mad-Eye Moody was too busy buy­ing one more Dark De­tec­tor for his ex­ist­ing room full of Dark De­tec­tors, and he didn’t in­vent pre­cau­tions deep enough and gen­eral enough to cover the un­fore­seen at­tack vec­tor “some­body tries to re­place me us­ing Polyjuice”.

And the solu­tion isn’t to add on a spe­cial anti-Polyjuice po­tion. I mean, if you hap­pen to have one, great, but that’s not where most of your trust in the sys­tem should be com­ing from. The first lines of defense should have a sense about them of depth, of gen­er­al­ity. Hash­ing pass­word files, rather than hid­ing keys; think­ing of how to mea­sure pass­word en­tropy, rather than re­quiring at least one up­per­case char­ac­ter.

am­ber: Again this seems to me more like a quan­ti­ta­tive differ­ence in the clev­er­ness of clever ideas, rather than two differ­ent modes of think­ing.

coral: Real-world cat­e­gories are of­ten fuzzy, but to me these seem like the product of two differ­ent kinds of think­ing. My guess is that the per­son who pop­u­larized de­mand­ing a mix­ture of let­ters, cases, and num­bers was rea­son­ing in a differ­ent way than the per­son who thought of mea­sur­ing pass­word en­tropy. But whether you call the dis­tinc­tion qual­i­ta­tive or quan­ti­ta­tive, the dis­tinc­tion re­mains. Deep and gen­eral ideas—the kind that ac­tu­ally sim­plify and strengthen the ed­ifice of rea­son­ing sup­port­ing the sys­tem’s safety—are in­vented more rarely and by rarer peo­ple. To build a sys­tem that can re­sist or even slow down an at­tack by mul­ti­ple ad­ver­saries, some of whom may be smarter or more ex­pe­rienced than our­selves, re­quires a level of pro­fes­sion­ally spe­cial­ized think­ing that isn’t rea­son­able to ex­pect from ev­ery pro­gram­mer—not even those who can shift their minds to take on the per­spec­tive of a sin­gle equally-smart ad­ver­sary. What you should ask from an or­di­nary para­noid is that they ap­pre­ci­ate that deeper ideas ex­ist, and that they try to learn the stan­dard deeper ideas that are already known; that they know their own skill is not the up­per limit of what’s pos­si­ble, and that they ask a pro­fes­sional to come in and check their rea­son­ing. And then ac­tu­ally listen.

am­ber: But if it’s pos­si­ble for peo­ple to think they have higher skills and be mis­taken, how do you know that you are one of these rare peo­ple who truly has a deep se­cu­rity mind­set? Might your high opinion of your­self just be due to the Dun­ning-Kruger effect?

coral: … Okay, that re­minds me to give an­other cau­tion.

Yes, there will be some innocents who can't believe that there's a talent called "paranoia" that they lack, who'll come up with weird imitations of paranoia if you ask them to be more worried about flaws in their brilliant encryption ideas. There will also be some people reading this with severe cases of social anxiety and underconfidence: readers who are capable of ordinary paranoia and even security mindset, but who might not try to develop these talents, because they are terribly worried that they might just be one of the people who only imagine themselves to have talent. Well, if you think you can feel the distinction between deep security ideas and shallow ones, you should at least try now and then to generate your own thoughts that resonate in you the same way.

am­ber: But won’t that at­ti­tude en­courage over­con­fi­dent peo­ple to think they can be para­noid when they ac­tu­ally can’t be, with the re­sult that they end up too im­pressed with their own rea­son­ing and ideas?

coral: I strongly sus­pect that they’ll do that re­gard­less. You’re not ac­tu­ally pro­mot­ing some kind of col­lec­tive good prac­tice that benefits ev­ery­one, just by per­son­ally agree­ing to be mod­est. The over­con­fi­dent don’t care what you de­cide. And if you’re not just as wor­ried about un­der­es­ti­mat­ing your­self as over­es­ti­mat­ing your­self, if your fears about ex­ceed­ing your proper place are asym­met­ric with your fears about lost po­ten­tial and fore­gone op­por­tu­ni­ties, then you’re prob­a­bly deal­ing with an emo­tional is­sue rather than a strict con­cern with good episte­mol­ogy.

am­ber: If some­body does have the tal­ent for deep se­cu­rity, then, how can they train it?

coral: … That’s a hell of a good ques­tion. Some in­ter­est­ing train­ing meth­ods have been de­vel­oped for or­di­nary para­noia, like classes whose stu­dents have to figure out how to at­tack ev­ery­day sys­tems out­side of a com­puter-sci­ence con­text. One pro­fes­sor gave a test in which one of the ques­tions was “What are the first 100 digits of pi?”—the point be­ing that you need to find some way to cheat in or­der to pass the test. You should train that kind of or­di­nary para­noia first, if you haven’t done that already.

am­ber: And then what? How do you grad­u­ate to deep se­cu­rity from or­di­nary para­noia?

coral: … Try to find more gen­eral defenses in­stead of par­ry­ing par­tic­u­lar at­tacks? Ap­pre­ci­ate the ex­tent to which you’re build­ing ever-taller ver­sions of tow­ers that an ad­ver­sary might just walk around? Ugh, no, that’s too much like or­di­nary para­noia—es­pe­cially if you’re start­ing out with just or­di­nary para­noia. Let me think about this.

Okay, I have a screwy piece of ad­vice that’s prob­a­bly not go­ing to work. Write down the safety-story on which your be­lief in a sys­tem’s se­cu­rity rests. Then ask your­self whether you ac­tu­ally in­cluded all the em­piri­cal as­sump­tions. Then ask your­self whether you ac­tu­ally be­lieve those em­piri­cal as­sump­tions.

am­ber: So, like, if I’m build­ing an op­er­at­ing sys­tem, I write down, “Safety as­sump­tion: The lo­gin sys­tem works to keep out at­tack­ers”—

coral: No!

Uh, no, sorry. As usual, it seems that what I think is “ad­vice” has left out all the im­por­tant parts any­one would need to ac­tu­ally do it.

That’s not what I was try­ing to hand­wave at by say­ing “em­piri­cal as­sump­tion”. You don’t want to as­sume that parts of the sys­tem “suc­ceed” or “fail”—that’s not lan­guage that should ap­pear in what you write down. You want the el­e­ments of the story to be strictly fac­tual, not… value-laden, goal-laden? There shouldn’t be rea­son­ing that ex­plic­itly men­tions what you want to have hap­pen or not hap­pen, just lan­guage neu­trally de­scribing the back­ground facts of the uni­verse. For brain­storm­ing pur­poses you might write down “No­body can guess the pass­word of any user with dan­ger­ous priv­ileges”, but that’s just a proto-state­ment which needs to be re­fined into more ba­sic state­ments.

am­ber: I don’t think I un­der­stood.

coral: “No­body can guess the pass­word” says, “I be­lieve the ad­ver­sary will fail to guess the pass­word.” Why do you be­lieve that?

am­ber: I see, so you want me to re­fine com­plex as­sump­tions into sys­tems of sim­pler as­sump­tions. But if you keep ask­ing “why do you be­lieve that” you’ll even­tu­ally end up back at the Big Bang and the laws of physics. How do I know when to stop?

coral: What you’re try­ing to do is re­duce the story past the point where you talk about a goal-laden event, “the ad­ver­sary fails”, and in­stead talk about neu­tral facts un­der­ly­ing that event. For now, just an­swer me: Why do you be­lieve the ad­ver­sary fails to guess the pass­word?

am­ber: Be­cause the pass­word is too hard to guess.

coral: The phrase “too hard” is goal-laden lan­guage; it’s your own de­sires for the sys­tem that de­ter­mine what is “too hard”. Without us­ing con­cepts or lan­guage that re­fer to what you want, what is a neu­tral, fac­tual de­scrip­tion of what makes a pass­word too hard to guess?

am­ber: The pass­word has high-enough en­tropy that the at­tacker can’t try enough at­tempts to guess it.

coral: We’re mak­ing progress, but again, the term “enough” is goal-laden lan­guage. It’s your own wants and de­sires that de­ter­mine what is “enough”. Can you say some­thing else in­stead of “enough”?

am­ber: The pass­word has suffi­cient en­tropy that—

coral: I don’t mean find a syn­onym for “enough”. I mean, use differ­ent con­cepts that aren’t goal-laden. This will in­volve chang­ing the mean­ing of what you write down.

am­ber: I’m sorry, I guess I’m not good enough at this.

coral: Not yet, any­way. Maybe not ever, but that isn’t known, and you shouldn’t as­sume it based on one failure.

Any­way, what I was hop­ing for was a pair of state­ments like, “I be­lieve ev­ery pass­word has at least 50 bits of en­tropy” and “I be­lieve no at­tacker can make more than a trillion tries to­tal at guess­ing any pass­word”. Where the point of writ­ing “I be­lieve” is to make your­self pause and ques­tion whether you ac­tu­ally be­lieve it.

am­ber: Isn’t say­ing no at­tacker “can” make a trillion tries it­self goal-laden lan­guage?

coral: Indeed, that assumption might need to be refined further via why-do-I-believe-that into, "I believe the system rejects password attempts that arrive closer together than 1 second, I believe the attacker keeps this up for less than a month, and I believe the attacker launches fewer than 300,000 simultaneous connections." Where again, the point is that you then look at what you've written and say, "Do I really believe that?" To be clear, sometimes the answer will be "Yes, I sure do believe that!" This isn't a social modesty exercise where you show off your ability to have agonizing doubts and then you go ahead and do the same thing anyway. The point is to find out what you believe, or what you'd need to believe, and check that it's believable.
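
Writing the refined assumptions down also lets you check, with simple arithmetic, whether they actually support the story they're meant to support. A sketch using the numbers Coral just gave:

```python
attempts_per_connection_per_second = 1   # attempts closer together than 1 second are rejected
seconds_in_a_month = 30 * 24 * 3600      # the attacker keeps it up for less than a month
simultaneous_connections = 300_000       # fewer than 300,000 simultaneous connections

total_attempts = (attempts_per_connection_per_second
                  * seconds_in_a_month * simultaneous_connections)
print(f"{total_attempts:.1e}")           # ~7.8e11, under the trillion-try bound

password_space = 2 ** 50                 # every password has at least 50 bits of entropy
print(total_attempts / password_space)   # ~7e-4 chance of guessing any given password
```

And then the same question applies one level up: is a roughly one-in-1,400 chance per password actually acceptable, or does the story need a stronger assumption (more entropy, a lockout) before you believe it?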

am­ber: And this trains a deep se­cu­rity mind­set?

coral: … Maaaybe? I’m wildly guess­ing it might? It may get you to think in terms of sto­ries and rea­son­ing and as­sump­tions alongside pass­words and ad­ver­saries, and that puts your mind into a space that I think is at least part of the skill.

In point of fact, the real rea­son the au­thor is list­ing out this method­ol­ogy is that he’s cur­rently try­ing to do some­thing similar on the prob­lem of al­ign­ing Ar­tifi­cial Gen­eral In­tel­li­gence, and he would like to move past “I be­lieve my AGI won’t want to kill any­one” and into a headspace more like writ­ing down state­ments such as “Although the space of po­ten­tial weight­ings for this re­cur­rent neu­ral net does con­tain weight com­bi­na­tions that would figure out how to kill the pro­gram­mers, I be­lieve that gra­di­ent de­scent on loss func­tion L will only ac­cess a re­sult in­side sub­space Q with prop­er­ties P, and I be­lieve a space with prop­er­ties P does not in­clude any weight com­bi­na­tions that figure out how to kill the pro­gram­mer.”

Though this it­self is not re­ally a re­duced state­ment and still has too much goal-laden lan­guage in it. A re­al­is­tic ex­am­ple would take us right out of the main es­say here. But the au­thor does hope that prac­tic­ing this way of think­ing can help lead peo­ple into build­ing more solid sto­ries about ro­bust sys­tems, if they already have good or­di­nary para­noia and some fairly mys­te­ri­ous in­nate tal­ents.


To be con­tinued in: Se­cu­rity Mind­set and the Lo­gis­tic Suc­cess Curve