The Univariate Fallacy

(A standalone math post that I want to be able to link back to later/elsewhere)

There’s this statistical phenomenon where it’s possible for two multivariate distributions to overlap along any one variable, but be cleanly separable when you look at the entire configuration space at once. This is perhaps easiest to see with an illustrative diagram—

The denial of this possibility (in arguments of the form, “the distributions overlap along this variable, therefore you can’t say that they’re different”) is sometimes called the “univariate fallacy.” (Eliezer Yudkowsky proposes “covariance denial fallacy” or “cluster erasure fallacy” as potential alternative names.)
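
(If you want something executable to play with before the main example below, here’s a minimal two-dimensional sketch of the same phenomenon. The particular clusters and numbers are made up purely for illustration and aren’t the example used in the rest of this post.)

import random

def sample_cluster_a():
    # Cluster A: points scattered around the line y = x + 3
    x = random.gauss(0, 4)
    return (x, x + 3 + random.gauss(0, 0.5))

def sample_cluster_b():
    # Cluster B: points scattered around the line y = x - 3
    x = random.gauss(0, 4)
    return (x, x - 3 + random.gauss(0, 0.5))

a_points = [sample_cluster_a() for _ in range(1000)]
b_points = [sample_cluster_b() for _ in range(1000)]

# Either coordinate alone overlaps heavily: both x-marginals are centered at 0,
# and the y-marginals (centered at +3 and -3, but with a spread of about 4)
# blur into each other. The combination y - x, however, separates the clusters
# with room to spare:
print(min(y - x for x, y in a_points))  # typically around +1.5; never close to 0
print(max(y - x for x, y in b_points))  # typically around -1.5; never close to 0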

Let’s make this more concrete by making up an example with actual numbers instead of just a pretty diagram. Imagine we have some datapoints that live in the forty-dimensional space {1, 2, 3, 4}⁴⁰ that are sampled from one of two probability distributions, which we’ll call A and B.

For simplicity, let’s suppose that the individual variables x₁, x₂, … x₄₀—the coördinates of a point in our forty-dimensional space—are statistically independent and identically distributed. For every individual xᵢ, the marginal distribution under A is—

P(xᵢ = 1) = 1/4,  P(xᵢ = 2) = 7/16,  P(xᵢ = 3) = 1/4,  P(xᵢ = 4) = 1/16

And for B—

P(xᵢ = 1) = 1/16,  P(xᵢ = 2) = 1/4,  P(xᵢ = 3) = 7/16,  P(xᵢ = 4) = 1/4

If you look at any one x-coördinate for a point, you can’t be confident which distribution the point was sampled from. For example, seeing that x₁ takes the value 2 gives you a 7/4 (= 1.75) likelihood ratio in favor of the point having been sampled from A rather than B, which is log₂(7/4) ≈ 0.807 bits of evidence.

That’s … not a whole lot of evidence. If you guessed that the datapoint came from A based on that much evidence, you’d be wrong about 4 times out of 11. (Given equal (1:1) prior odds, an odds ratio of 7:4 amounts to a probability of (7/4)/(1 + 7/4) ≈ 0.636.)
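
To spell out that arithmetic in a few lines of Python (the variable names here are just for this snippet, not notation used elsewhere in the post):

import math

p_a_of_2 = 7/16  # probability of seeing a 2 under distribution A
p_b_of_2 = 1/4   # probability of seeing a 2 under distribution B

likelihood_ratio = p_a_of_2 / p_b_of_2
print(likelihood_ratio)                           # 1.75
print(math.log2(likelihood_ratio))                # ≈ 0.807 bits of evidence
print(likelihood_ratio / (1 + likelihood_ratio))  # ≈ 0.636, given 1:1 prior odds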

And yet if we look at many variables, we can achieve supreme, godlike confidence about which distribution a point was sampled from. Proving this is left as an exercise for the particularly intrepid reader, but a concrete demonstration is probably simpler and should be pretty convincing! Let’s write some Python code to sample a point x ∈ {1, 2, 3, 4}⁴⁰ from A—

import random

def a():
    # Sample one coördinate according to A's marginal distribution: each value
    # appears in the 16-element list in proportion to its probability.
    return random.sample(
        [1]*4 +  # 1/4
        [2]*7 +  # 7/16
        [3]*4 +  # 1/4
        [4],     # 1/16
        1
    )[0]

x = [a() for _ in range(40)]
print(x)

Go ahead and run the code yourself. (With an online REPL if you don’t have Python installed locally.) You’ll probably get a value of x that “looks something like”

[2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 4, 4, 2, 2, 3, 3, 1, 2, 2, 2, 4, 2, 2, 1, 2, 1, 4, 3, 3, 2, 1, 1, 3, 3, 2, 2, 3, 3, 4]

If someone off the street just handed you this without telling you whether she got it from A or B, how would you compute the probability that it came from A?

Well, because the coördinates/variables are statistically independent, you can just tally up (multiply) the individual likelihood ratios from each variable. That’s only a little bit more code—

import logging

logging.basicConfig(level=logging.INFO)

def odds_to_probability(o):
    return o/(1+o)

def tally_likelihoods(x, p_a, p_b):
    total_odds = 1
    for i, x_i in enumerate(x, start=1):
        lr = p_a[x_i-1]/p_b[x_i-1]  # (-1s because of zero-based array indexing)
        logging.info("x_%s = %s, likelihood ratio is %s", i, x_i, lr)
        total_odds *= lr
    return total_odds

print(
    odds_to_probability(
        tally_likelihoods(
            x,
            [1/4, 7/16, 1/4, 1/16],
            [1/16, 1/4, 7/16, 1/4]
        )
    )
)

If you run that code, you’ll probably see “something like” this—

INFO:root:x_1 = 2, likelihood ratio is 1.75
INFO:root:x_2 = 1, likelihood ratio is 4.0
INFO:root:x_3 = 2, likelihood ratio is 1.75
INFO:root:x_4 = 2, likelihood ratio is 1.75
INFO:root:x_5 = 1, likelihood ratio is 4.0
[blah blah, redacting some lines to save vertical space in the blog post, blah blah]
INFO:root:x_37 = 2, likelihood ratio is 1.75
INFO:root:x_38 = 3, likelihood ratio is 0.5714285714285714
INFO:root:x_39 = 3, likelihood ratio is 0.5714285714285714
INFO:root:x_40 = 4, likelihood ratio is 0.25
0.9999936561215961

Our computed probability that x came from A has several nines in it. Wow! That’s pretty confident!
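
(And in case you suspect that the particular x we happened to draw was unusually favorable, here’s a quick Monte Carlo sketch you could run in the same session as the two snippets above, reusing a(), tally_likelihoods, and odds_to_probability. The 10,000-trial count and the 0.99 threshold are arbitrary choices for this sanity check, not anything principled.)

import logging

logging.disable(logging.INFO)  # silence the per-coördinate log lines during the loop

trials = 10000
confident = 0  # how often the computed probability clears 0.99
wrong = 0      # how often a point sampled from A looks more like B
for _ in range(trials):
    point = [a() for _ in range(40)]
    probability = odds_to_probability(
        tally_likelihoods(
            point,
            [1/4, 7/16, 1/4, 1/16],
            [1/16, 1/4, 7/16, 1/4]
        )
    )
    if probability > 0.99:
        confident += 1
    if probability < 0.5:
        wrong += 1

print(confident / trials)  # the overwhelming majority of draws
print(wrong / trials)      # a tiny fraction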

Thanks for reading!