Do the best ideas float to the top?

It may de­pend on what we mean by “best”.

Epistemic sta­tus: I un­der­stand very lit­tle of any­thing.

Spec­u­la­tion about po­ten­tial ap­pli­ca­tions: reg­u­lat­ing a log­i­cal pre­dic­tion mar­ket, e.g. log­i­cal in­duc­tion; con­struct­ing judges or com­peti­tors in e.g. al­ign­ment by de­bate; de­sign­ing com­mu­ni­ca­tion tech­nol­ogy, e.g. to miti­gate harms and risks of in­for­ma­tion war­fare.

The slogan “the best ideas float to the top” is often used in social contexts. The saying goes, “in a free market of ideas, the best ideas float to the top”. Of course, it is not intended as a statement of fact, as in “we have observed that this is the case”; it is instead a statement of values, as in “we would prefer for this to be the case”.

In this es­say, how­ever, we will force an em­piri­cal in­ter­pre­ta­tion, just to see what hap­pens. I will provide three ways to con­sider the den­sity of an idea, or the num­ber as­signed to how float-to-the-top an idea is.

In brief, an idea is a sentence, and you can vary how much of its antecedent graph (as in Bayesian nets or NARS-like architectures) or of the function that prints it (as in compression) you want to consider at a given moment, up to resource allocation. This isn’t an entirely mathematical paper, so don’t worry about WFFs, parsers, etc., which is why I’ll stick with “ideas” instead of “sentences”. I will also be handwaving between “description of some world states” and “belief about how world states relate to each other”.


Sup­pose you ob­serve wear­ers of teal hats ad­vo­cate for policy A, but you don’t know what A is. You’re mind­ing your busi­ness in an ap­ple­bees park­ing lot when a wearer of ma­genta hats gets your at­ten­tion to tell you “A is harm­ful”. There are two cases:

  1. Suppose A is “kicking puppies” (and I don’t mean the wearer of magenta hats is misleadingly compressing A to you; I mean the policy is literally kicking puppies). The inferential gap between you and the magentas can be closed very cheaply, so you’re quickly convinced that A is harmful (unless you believe that kicking puppies is good).

  2. Sup­pose A is “flee­gan at a rate of flar­gen”, where fleega­nomics is a niche tech­ni­cal sub­ject which nev­er­the­less can be learned by any­one of me­dian ed­u­ca­tion in N units[^1] or less. Sup­pose also that you know the value of N, but you’re not in­clined to in­vest that much com­pute in a dumb elec­tion, so you ei­ther a. take them at their word that A is harm­ful; b. search the ap­ple­bees for an au­thor­ity figure who be­lieves that A is harm­ful, but be­lieves it more cred­ibly; or c. leave the park­ing lot with­out up­dat­ing in any di­rec­tion.

“That’s easy, c,” you respond, blindingly fast. You peel out of there, and the whole affair makes not a dent in your epistemic hygiene. But you left behind many others. Will they be as strong, as wise as you?

“In an in­for­ma­tion-rich world, the wealth of in­for­ma­tion means a dearth of some­thing else: a scarcity of what­ever it is that in­for­ma­tion con­sumes. What in­for­ma­tion con­sumes is rather ob­vi­ous: it con­sumes the at­ten­tion of its re­cip­i­ents. Hence a wealth of in­for­ma­tion cre­ates a poverty of at­ten­tion and a need to al­lo­cate that at­ten­tion effi­ciently among the over­abun­dance of in­for­ma­tion sources that might con­sume it.”

– Her­bert Simon

Let’s call case 1 “constant” and case 2 “linear”. We assume that constant refers to negligible cost, and that linear means cost that grows linearly in pedagogical length (where pedagogical cost, or length, is some measure of the resources needed to acquire some sort of understanding).
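To make the linear case a little more concrete, here is a minimal sketch of the footnote's definition of pedagogical cost as a sum over prerequisite capacities. The prerequisite graph, the capacity names, and the per-capacity base costs are all made up for illustration; the only thing taken from the essay is the idea that the cost of a capacity is built from the costs of its prerequisites.

```python
from functools import lru_cache

# Hypothetical prerequisite graph: capacity -> capacities it depends on.
PREREQS = {
    "long_division": ["subtraction", "multiplication"],
    "multiplication": ["addition"],
    "subtraction": ["addition"],
    "addition": [],
}

# Hypothetical base cost of each capacity on its own, in attentional units
# (e.g. median digestion time). These numbers are assumptions, not data.
BASE_COST = {
    "addition": 1.0,
    "subtraction": 1.0,
    "multiplication": 2.0,
    "long_division": 3.0,
}


@lru_cache(maxsize=None)
def pedagogical_cost(capacity: str) -> float:
    """Own cost of a capacity plus the pedagogical cost of each prerequisite."""
    return BASE_COST[capacity] + sum(
        pedagogical_cost(prereq) for prereq in PREREQS[capacity]
    )


print(pedagogical_cost("long_division"))
```

Note that this literal reading counts a shared prerequisite once per path (addition is paid for under both subtraction and multiplication). Summing over the set of distinct prerequisites instead would be an equally defensible choice.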

A reg­u­la­tor, un­like you, isn’t will­ing to leave any­one be­hind for evan­ge­lists and pun­dits to prey on. This is the role I’m as­sum­ing for this es­say. I will ul­ti­mately pro­pose a nega­tive at­ten­tional tax, in which the con­stant cases would be pe­nal­ized to give the lin­ear cases a boost. (It’s like nega­tive in­come tax, re­plac­ing money with at­ten­tion).

If you could understand fleeganomics in N/100000 units, would it be worth it to you then?

Let’s force an em­piri­cal in­ter­pre­ta­tion of “the best ideas float to the top”

Three pos­si­ble mea­sures of den­sity:

  1. the sim­plest ideas float to the top.

  2. the truest ideas float to the top.

  3. the ideas which ad­vance the best val­ues float to the top, where by “ad­vance the best val­ues” we mean ei­ther a. max­i­mize my util­ity func­tion, not yours; or b. max­i­mize the ag­gre­gate/​av­er­age util­ity func­tion of all moral pa­tients, with­out em­pha­sis on zero-sum face­offs be­tween op­po­nent util­ity func­tions.

Each in turn im­plies a sort of world in which it is the sole in­ter­pre­ta­tion, and thus the sole fac­tor over be­liefs of truth-seek­ers.

The intuition given above leans heavily on density_3; however, we must start much lower, at the fundamentals of simplicity and truth. From now on, for brevity’s sake, please ignore density_3 and focus on the first two.

den­sity_1: The Sim­plest Ideas Float to the Top.

If you form a heuris­tic by philoso­phiz­ing the con­junc­tion rule in prob­a­bil­ity the­ory, you get Oc­cam’s ra­zor. In ma­chine learn­ing, we have model se­lec­tion meth­ods that di­rectly pe­nal­ize com­plex­ity. Oc­cam’s ra­zor doesn’t say any­thing about re­cep­tion of ideas in a so­cial sys­tem, be­yond im­ply­ing that in gam­bling the wise bet on shorter sen­tences (in­so­far as the wise are gam­blers).
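For reference, the conjunction rule invoked here says that a conjunction is never more probable than either of its conjuncts (A and B are arbitrary claims, introduced only for illustration):

$$
P(A \land B) = P(A)\,P(B \mid A) \le P(A)
$$

Every extra claim bolted onto an idea can only preserve or lower its probability, which is the sense in which shorter sentences are the safer bet.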

If we assume that the wearer of magenta hats is maximizing something like petition signatures, and by proxy maximizing the number of applebees patrons converted to magenta hat wearing applebees patrons, then in the world of density_1 they ought to persuade only via statements with constant or negligible cost. (Remember, in the world of density_1, statements needn’t have any particular content to be successful. In an idealized setting, this would mean the empty string gets 100% of the vote in every election, or 100% of traders purchase nothing but the empty string, etc.; in a human setting, think of the “smallest recognizable belief”.)

den­sity_2: The Truest Ideas Float to the Top.

If the truest ideas floated to the top, then state­ments with more sub­stan­tial truth val­ues (i.e. with more ev­i­dence, more com­pel­ling ev­i­dence, stronger in­fer­en­tial steps) win out against those with less sub­stan­tial truth val­ues. In a world gov­erned only by den­sity_2, all cost is neg­ligible.

In this world, the wearer of magenta hats is incentivized to teach fleeganomics – to bother themselves (and others) with linear cost ideas – if that’s what leads people to more substantially held beliefs or commitments. This is a sort of oracle world; in a word, logical omniscience.

In a market view, truth only prevails in the long run (i.e. in the same way that price only converges to value but you can’t pinpoint when they’re equal, supply with demand, etc.), which is why the density_2 interpretation is suitable for oracles, or at least the infinite resources of AIXI-like agents. If you tried to populate the world of density_2 with logically uncertain agents, or agents abiding by the Assumption of Insufficient Knowledge and Resources (AIKR, from NARS), the entire appeal of markets would evaporate. “Those who know they are not relevant experts shut up, and those who do not know this eventually lose their money, and then shut up.” (Hanson), but without the “eventually”.

Nega­tive at­ten­tion tax

Now suppose we live in some world where density_1 and density_2 are operating at the same time, with some foggier and handwavier things like density_3 on the margins. In such a world, we say false-complicated ideas are robustly uncompetitive and true-simple ideas are robustly competitive, where “robust” means “resilient to foul play”, and “foul play” means “any misleading compression, fallacious reasoning, etc.”. Without such resilience, we risk that false-simple ideas will succeed and true-complicated ideas will fail.

A reg­u­la­tor isn’t will­ing to leave any­one be­hind for evan­ge­lists and pun­dits to prey on.

Per­haps we want free at­ten­tion dis­tributed to true-but-com­pli­cated things, and penalties ap­plied to false-but-sim­ple things. In eco­nomics, a nega­tive in­come tax (NIT) is a welfare sys­tem within an in­come tax where peo­ple earn­ing be­low a cer­tain amount re­ceive sup­ple­men­tal pay from the gov­ern­ment in­stead of pay­ing taxes to the gov­ern­ment.

For us, a nega­tive at­ten­tional tax is a welfare sys­tem, where ideas de­mand­ing above a cer­tain amount of com­pute re­ceive sup­ple­men­tal at­ten­tion, and ideas be­low that amount pay up.
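By analogy with the NIT formula (a flat rate applied to the gap between income and a threshold), one way to read this is as a flat rate applied to the gap between an idea's compute cost and a threshold. The sketch below is only an illustration under that assumption; the function name, the `rate` parameter, and the numbers are hypothetical and not part of any worked-out proposal.

```python
def attention_adjustment(cost: float, threshold: float, rate: float = 0.5) -> float:
    """Hypothetical negative attentional tax.

    cost:      compute the idea demands (e.g. its pedagogical cost)
    threshold: the break-even cost, where nothing is paid or received
    rate:      assumed flat rate applied to the gap

    A positive result is supplemental attention granted to the idea;
    a negative result is attention the idea has to pay up.
    """
    return rate * (cost - threshold)


# A linear-cost idea above the threshold receives a supplement...
print(attention_adjustment(cost=8.0, threshold=4.0))
# ...while a near-constant-cost idea below it is penalized.
print(attention_adjustment(cost=1.0, threshold=4.0))
```

With these made-up numbers the first idea gains 2.0 units of attention and the second loses 1.5. Where the threshold sits, and whether the rate should be flat at all, are exactly the parts a regulator would have to argue about.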

| density_2 \ density_1 | Simple | Complicated |
|---|---|---|
| False | I’m saying this is a failure mode, danger zone, etc. | Robustly uncompetitive (won’t bother us) |
| True | Robustly competitive (these will be fine) | I’m saying the solution is to give these sentences a boost. |

An example implementation: suppose I’m working at nosebook in year of our lord. When I notice certain posts get liked/shared blindingly fast, and others take more time, I suppose that the fast ones are simple and some form of epistemic foul play, and that the slow ones are complicated and more likely to align with epistemic norms we prefer. I make an algorithm to suppress posts that get liked/shared too quickly, and replace their spots in the feed with posts that seem to be digested before getting liked/shared. (Disclaimer: this is not a resilient proposal, I spent all of 10 seconds thinking about it, please defer to your nearest misinformation expert.)
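In the same ten-seconds-of-thought spirit, here is a toy sketch of that feed rule. Everything in it is hypothetical: the `Post` record, the use of the median delay between viewing and liking/sharing as a proxy for "digested", and the 5-second cutoff. It also demotes fast posts to the bottom of the feed rather than removing them, which is a softer variant of "suppress".

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Post:
    post_id: str
    # Hypothetical signal: seconds between a user seeing the post
    # and liking/sharing it, one entry per reaction.
    seconds_to_react: list[float]


def liked_too_fast(post: Post, min_median_seconds: float) -> bool:
    """True if the post has reactions and they typically arrive very quickly."""
    return bool(post.seconds_to_react) and (
        median(post.seconds_to_react) < min_median_seconds
    )


def rerank(feed: list[Post], min_median_seconds: float = 5.0) -> list[Post]:
    """Move too-quickly-liked posts below everything else.

    sorted() is stable and False sorts before True, so the original
    feed order is preserved within each group.
    """
    return sorted(feed, key=lambda p: liked_too_fast(p, min_median_seconds))


feed = [
    Post("hot_take", [1.0, 2.0, 1.0]),
    Post("long_read", [30.0, 45.0]),
    Post("no_reactions_yet", []),
]
print([p.post_id for p in rerank(feed)])
```

Posts with no reactions are left alone here, since there is no evidence either way about how they are being digested.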

Individuals apply NAT (negative attentional tax) credits to interesting-looking complicated ideas; complicated ideas aren’t directly supplied with these supplements in the way that simple ideas are automatically handicapped.

Though the above may be a valid in­ter­pre­ta­tion, es­pe­cially in the nose­book ex­am­ple, NAT is more prop­erly un­der­stood as cred­its al­lo­cated to in­di­vi­d­u­als for them to spend freely.

You can imag­ine the stump speech.

ex­tremely cam­paign­ing voice: I’m go­ing to make sure ev­ery mem­ber of ev­ery ap­ple­bees park­ing lot has a length­ened/​hand­i­capped men­tal speed when they’re faced with sim­ple ideas, and this will come back to them as tax cred­its they can spend on com­pli­cated ideas. Every ap­ple­bees pa­tron de­serves com­plex­ity, even if they can’t af­ford the full com­pute/​price for it.

[^1]: “Pedagogical cost” is loosely inspired by “algorithmic decomposition” in Between Saying and Doing. TL;DR: to reason about a student acquiring long division, we reason about their acquisition of subtraction and multiplication. For us, the pedagogical cost or length of some capacity is the sum of the lengths of its prerequisite capacities. We’ll consider our pedagogical units as some function on attentional units. Herbert Simon dismisses adopting Shannon’s bit as the attentional unit, because he wants something invariant under different encoding choices. He goes on to suggest time in the form of “how long it takes for the median human cognition to digest”. This can be our base unit of parsing things you already know how to parse, even though extending it to pedagogical cost wouldn’t be as stable because we don’t understand teaching or learning very well.