[Question] What if memes are common in highly capable minds?

The meme-theoretic view of humans says: Memes are to humans as sailors are to ships in the age of sail.

If you want to predict where a ship will go, ask: Is it currently crewed by the French or the English? Is it crewed by merchants, pirates, or soldiers? These are the most important questions.

You can also ask e.g. “Does it have a large cargo hold? Is it swift? Does it have many cannon-ports?” But these questions are less predictive of where it will go next. They are useful for explaining how it got the crew it has, but only to a point: while it’s true that a ship built with a large cargo hold is more likely to be a merchant for more of its life, it’s quite common to encounter a ship with a large cargo hold that is crewed by soldiers, or for a ship built in France to be sailed by the English, etc. The main determinants of how a ship got the crew it currently has are its previous interactions with other crews, e.g. the fights it had, the money that changed hands when it was in port, etc.

The meme-theoretic view says: Similarly, the best way to explain human behavior is by reference to the memes in their head, and the best way to explain how those memes got there is to talk about the history of how those memes evolved inside the head in response to other memes they encountered outside the head. Non-memetic properties of the human (their genes, their nutrition, their age, etc.) matter, but not as much, just like how the internal layout of a ship, its size, its age, etc. matter too, but not as much as the sailors inside it.

Anyhow, the meme-theoretic view is an interesting contrast to the highly-capable-agent view. If we apply the meme-theoretic view to AI, we get the following vague implications:

--Mesa-alignment problems are severe. The paper already talks about how there are different ways a system could be pseudo-aligned, e.g. it could have a stable objective that is a proxy of the real objective, or it could have a completely different objective but be instrumentally motivated to pretend, or it could have a completely different objective but have some irrational tic or false belief that makes it behave the way we want for now. Well, on a meme-theoretic view these sorts of issues are the default; they are the most important things for us to be thinking about.

--There may be no stable objective/goal in the system at all. It may have an objective/goal now, but if the objective is a function of the memes it currently hosts, and the memes can change in hard-to-predict ways based on which other memes it encounters... (the toy sketch after this list illustrates this kind of drift).

--Training/evolving an AI to behave a certain way will be very different at each stage of smartness. When it is too dumb to host anything worthy of the name meme, it’ll be one thing. When it is smart enough to host simple memes, it’ll be another thing. When it is smart enough to host complex memes, it’ll be another thing entirely. Progress and success made at one level might not carry over to higher levels.

--There is a massive training vs. deployment problem. The memes our AI encounters in deployment will probably be massively different from those in training, so how do we ensure that it reacts to them appropriately? We have no idea what memes it will encounter when deployed, because we want it to go out into the world and do all sorts of learning and doing on our behalf. (The sketch below shows this shift in miniature.)
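
To make the last two bullets concrete, here is a minimal toy sketch. It is purely illustrative and not from the paper: memes are just numbers in the agent’s head, the agent’s effective objective is their mean, and each encounter sometimes swaps a hosted meme for one drawn from the surrounding pool. The adoption rate, the pool distributions, and the objective function are all made-up assumptions.

```python
# Toy model of meme-dependent objectives. Everything here is an illustrative
# assumption: memes are numbers, the agent's effective objective is just the
# mean of the memes it currently hosts, and each encounter may swap a hosted
# meme for a random meme from the surrounding pool.
import random

random.seed(0)


def effective_objective(memes):
    """The agent's current goal, computed from the memes it hosts."""
    return sum(memes) / len(memes)


def encounter(memes, pool, adoption_rate=0.3):
    """One interaction: with some probability, a random hosted meme is
    displaced by a random meme from the surrounding pool."""
    if random.random() < adoption_rate:
        memes[random.randrange(len(memes))] = random.choice(pool)
    return memes


def run(memes, pool, steps=1000):
    """Let the agent churn through `steps` encounters with a given pool."""
    for _ in range(steps):
        memes = encounter(memes, pool)
    return memes


# Training environment: memes clustered around the objective we selected for.
training_pool = [random.gauss(0.0, 0.2) for _ in range(100)]
# Deployment environment: a meme pool centred somewhere else entirely.
deployment_pool = [random.gauss(0.7, 0.4) for _ in range(100)]

memes = [random.choice(training_pool) for _ in range(20)]
memes = run(memes, training_pool)
print(f"objective at end of training: {effective_objective(memes):+.2f}")

memes = run(memes, deployment_pool)
print(f"objective after deployment:   {effective_objective(memes):+.2f}")
```

The exact numbers don’t matter; the point is that the agent’s objective at the end of training tells you little about its objective after it has marinated in a different meme pool, which is the training vs. deployment worry above in miniature.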

Thanks to Abram Demski for reading a draft and providing some better terminology.