[Question] What are concrete examples of potential “lock-in” in AI research?

I had some colleagues watch Ben Garfinkel’s talk, “How sure are we about this AI stuff?”, which, among other things, pointed out that it’s often difficult to change the long-term trajectory of some technology. For instance, electricity, the printing press, and agriculture were all transformative technologies, but even if we had recognized their importance in advance, it’s hard to see what we could really have changed about them in the long term.

In general, when I look at technological development/adoption, I tend to see people following local economic incentives wherever they lead, and it often seems hard to change these gradients without some serious external pressures (forceful governments, cultural taboos, etc.). I don’t see that many “parallel tracks” where a farsighted agent could have set things on a different track by pulling the right lever at the right time. A counterexample is the QWERTY vs. Dvorak keyboard, where someone with enough influence may well have been able to get society to adopt the keyboard that is better from a longtermist perspective.

This leads one to look at cases of “lock-in”: times when we could plausibly have taken any one of multiple paths, and this decision:

a) could have been changed by a relatively small group of farsighted agents

b) had significant effects that lasted decades or more

A lot of the best historical examples of this aren’t technological (the founding of major religions, the writing of the US constitution, the Bretton Woods agreement), which is maybe some small update towards political stuff being important from a longtermist perspective.

But nevertheless, there are examples of lock-in for technological development. In a group discussion after watching Garfinkel’s talk, Lin Eadarmstadt asked what examples of lock-in there might be for AI research. I think this is a really good question, because it may be one decent way of locating things we can actually change in the long term. (Of course, not the only way by any means, but perhaps a fruitful one.)

After brainstorming this, it felt hard to come up with good examples, but here are two sort-of-examples:

Example 1

First, there’s the programming language that ML is done in. Right now, it’s almost entirely Python. In some not-totally-implausible counterfactual, it’s done in OCaml, where the type checking is very strict, and hence certain software errors are less likely to happen. On this metric, Python is pretty much the least safe language for ML.
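
To make the type-safety point slightly more concrete, here’s a minimal sketch (the helper function and the bad call are invented purely for illustration) of the kind of error a statically typed language would reject before anything runs, but which Python only surfaces at runtime:

```python
from typing import List

import numpy as np

def mean_reward(rewards: List[float]) -> float:
    # Hypothetical utility of the sort that litters ML codebases.
    return float(np.mean(rewards))

# OCaml (or even running mypy over these annotations) would reject this call
# before the program starts; plain Python only raises a TypeError at runtime,
# whenever this line happens to be reached.
mean_reward("0.1, 0.5, 0.9")  # a string where a list of floats was expected
```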

Of course, even if we agree the OCaml counterfactual is better in expectation, it’s hard to see how anyone could have nudged ML towards it, even in hindsight. That said, this would have been much easier when ML was a smaller field than it is now, hence we can say Python has been “locked in”. On the other hand, I’ve heard murmurs about Swift attempting to replace it, with the latter having better-than-zero type safety.

Caveats: I don’t take these “murmurs” seriously; it seems very unlikely to me that AGI goes catastrophically wrong due to a lack of type safety, and I don’t think it’s worth the time of anyone here to worry about this. This is mostly just a hopefully illustrative example.

Example 2

Currently, deep reinforcement learning (DRL) is usually done by specifying a reward function upfront, and having the agent figure out how to maximize it. As we know, reward functions are often hard to specify properly in complex domains, and this is one bottleneck on DRL capabilities research. Still, in my naive thinking, I can imagine a plausible scenario where DRL researchers get used to “fudging it”: getting agents to sort-of-learn lots of things in a variety of relatively complex domains where the reward functions are hacked together by grad student descent, and then, after many years of hardware overhang have set in, someone finally figures out a way to stitch these together to get an AGI (or something “close enough” to do some serious damage).
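
To make this concrete, here’s a minimal sketch of the “specify it upfront and fudge it” workflow, written against the classic OpenAI Gym API; the shaping terms and coefficients are invented, and are exactly the kind of thing that gets tuned by grad student descent:

```python
import gym  # assumes the classic (pre-0.26) Gym step/reset API

def shaped_reward(obs, base_reward):
    # Hand-written shaping terms bolted onto the environment's own reward.
    # The magic numbers are illustrative, not from any real project.
    progress_bonus = 0.1 * obs[0]         # e.g. reward the cart drifting right
    wobble_penalty = 0.05 * abs(obs[2])   # e.g. penalize the pole tilting
    return base_reward + progress_bonus - wobble_penalty

env = gym.make("CartPole-v1")
obs, done = env.reset(), False
while not done:
    action = env.action_space.sample()              # stand-in for a learned policy
    obs, base_reward, done, info = env.step(action)
    reward = shaped_reward(obs, base_reward)        # the agent maximizes this, bugs and all
```

Everything the agent actually optimizes lives in that hand-written function, and nothing in the workflow pushes back if it’s subtly wrong.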

The main alternatives to reward specification are imitation learning, inverse RL, and DeepMind’s reward modeling (see section 7 of this paper for a useful comparison). In my estimation, any of these approaches is probably safer than the “AGI via reward specification” path.
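
For contrast, here’s a rough sketch of the reward-modeling interface in PyTorch: a small model is fit to human preference comparisons between trajectory segments, and its output stands in for the hand-specified reward. The architecture, loss, and data source below are placeholders I’ve made up, not DeepMind’s actual setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Predicts a scalar reward from a single observation."""
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def preference_loss(model: RewardModel,
                    traj_a: torch.Tensor,    # (T, obs_dim) trajectory segment
                    traj_b: torch.Tensor,
                    a_preferred: bool) -> torch.Tensor:
    # Bradley-Terry-style objective: the segment the human preferred should
    # receive a higher total predicted reward.
    returns = torch.stack([model(traj_a).sum(), model(traj_b).sum()])
    target = torch.tensor([0 if a_preferred else 1])
    return F.cross_entropy(returns.unsqueeze(0), target)

# During RL training the agent would then maximize model(obs) in place of a
# hand-specified reward, while the model keeps being refit to fresh human
# comparisons.
```

The difference from the previous sketch is just where the objective comes from: human comparisons rather than hand-tuned shaping terms.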

Of course, these don’t clearly form four distinct tech paths, and I rate it > 40% that if AGI largely comes out of DRL, no one technique will claim all the major milestones along the way. So this is a pretty weak example of “lock-in”, because I think, for instance, DRL researchers will flock to reward modeling if DeepMind unambiguously demonstrates its superiority over reward specification.

Still, I think there is an extent to which researchers become “comfortable” with research techniques, and if TensorFlow has extensive libraries for reward specification and every DRL textbook has a chapter on “Heuristics for Fudging It”, while other techniques are viewed as esoteric and carry start-up costs to applying them (and fewer libraries), then this may become a weak form of lock-in.


As I’ve said, those two are fairly weak examples. The former is a lock-in that happened a while ago, that we probably can’t change now, and that doesn’t seem that important even if we could. The latter is a fairly weak form of lock-in, in that it can’t withstand that much in the way of counter-incentives (compare with the QWERTY keyboard).

Still, I found it fun thinking about these, and I’m curious: do people have any other ideas of potential “lock-in” for AI research? (Even if they don’t have any obvious implications for safety.)
