Crypto quant trading: Naive Bayes

Previous post: Crypto quant trading: Intro

I didn’t get requests for any specific subject after the last post, so I’m going in the direction that I find interesting and hope the community will find interesting as well. Let’s do Naive Bayes! You can download the code and follow along.

As a reminder, here’s Bayes’ theorem: P(H|f) = P(H) * P(f|H) / P(f). (I’m using f for “feature”.)
And here’s conditional probability: P(A|B) = P(A,B) / P(B)
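
To make the two formulas concrete, here is a small sketch that checks them against each other on a made-up joint distribution over a binary hypothesis H and a binary feature f. The numbers are purely illustrative; exact fractions are used so the direct route and the Bayes route to P(H|f) can be compared without rounding noise.

```python
from fractions import Fraction

# Made-up joint distribution P(H, f) over a binary hypothesis and a binary feature.
joint = {
    (True, True): Fraction(3, 10),    # H true,  f observed
    (True, False): Fraction(1, 10),   # H true,  f not observed
    (False, True): Fraction(2, 10),   # H false, f observed
    (False, False): Fraction(4, 10),  # H false, f not observed
}

p_h = sum(p for (h, _), p in joint.items() if h)  # P(H) = 2/5
p_f = sum(p for (_, f), p in joint.items() if f)  # P(f) = 1/2
p_h_and_f = joint[(True, True)]                   # P(H, f) = 3/10

# Conditional probability: P(A|B) = P(A,B) / P(B)
p_h_given_f = p_h_and_f / p_f
p_f_given_h = p_h_and_f / p_h

# Bayes' theorem: P(H|f) = P(H) * P(f|H) / P(f)
p_h_given_f_bayes = p_h * p_f_given_h / p_f

print(p_h_given_f, p_h_given_f_bayes)  # 3/5 3/5
```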

Disclaimer: I was learning Naive Bayes as I wrote this post, so please double-check the math. I’m not using third-party libraries, so that I can fully understand how it all works. In fact, I’ll start by describing something that tripped me up for a while.

What not to do

My original understanding was: Naive Bayes basically lets us update on various features without worrying about how they all interact with each other; we just assume they are independent. So we can apply Bayes’ theorem iteratively, like so:

P(H) = prior
P(H) = P(H) * P(f1|H) / P(f1)
P(H) = P(H) * P(f2|H) / P(f2)
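
Here is a quick numeric sketch of that iterative update. The numbers are made up: a prior P(H) = 0.5 and two features that each have P(f|H) = 0.9 and P(f|not H) = 0.1. Each step reuses the previous posterior as the new prior and divides by the marginal P(f) again.

```python
# Made-up numbers: prior P(H) = 0.5, and each feature is observed with
# probability 0.9 when H is true and 0.1 when H is false.
p_h = 0.5
p_f_given_h = 0.9
p_f_given_not_h = 0.1

# Marginal probability of one feature: P(f) = P(H)P(f|H) + P(not H)P(f|not H)
p_f = p_h * p_f_given_h + (1 - p_h) * p_f_given_not_h

# The flawed loop: treat each posterior as the next prior and divide by P(f) again.
posterior = p_h
history = []
for _ in range(2):  # observe f1, then f2
    posterior = posterior * p_f_given_h / p_f
    history.append(round(posterior, 4))

print(history)  # [0.9, 1.62]
```

After just two features the "probability" is already above 1.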

You can see how that fails: if every feature pushes P(H) upwards, repeated updates can eventually push it above 1. I did the math the hard way to figure out where I went wrong. If we have two features:

P(H|f1,f2) = P(H,f1,f2) / P(f1,f2)
= P(f1|H,f2) * P(H,f2) / P(f1,f2)
= P(f1|H,f2) * P(f2|H) * P(H) / P(f1,f2)
= P(H) * P(f1|H,f2) * P(f2|H) / (P(f1|f2) * P(f2))
Then, because we assume that all features are independent:
= P(H) * P(f1|H) * P(f2|H) / (P(f1) * P(f2))

This looks like what I wrote above. So where’s the mistake? Naive Bayes actually assumes that all features are independent conditional on H. So P(f1|H,f2) = P(f1|H), because we’re conditioning on H, but P(f1|f2) != P(f1), because there’s no H in the condition. The denominator therefore has to stay P(f1|f2) * P(f2) = P(f1,f2).
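
The distinction can be checked numerically. The sketch below builds a made-up joint distribution in which f1 and f2 are independent given H (P(H) = 1/2, P(f|H) = 9/10, P(f|not H) = 1/10), then computes the relevant conditionals exactly. The helper names `p_feature` and `prob` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Made-up model: P(H) = 1/2, and two binary features f1, f2 that are independent
# *given* H, each with P(f|H) = 9/10 and P(f|not H) = 1/10.
P_H = {True: Fraction(1, 2), False: Fraction(1, 2)}
P_F_GIVEN_H = {True: Fraction(9, 10), False: Fraction(1, 10)}

def p_feature(value, h):
    # P(f = value | H = h) for one binary feature
    p = P_F_GIVEN_H[h]
    return p if value else 1 - p

# Full joint P(H, f1, f2), built so f1 and f2 are conditionally independent given H
joint = {
    (h, f1, f2): P_H[h] * p_feature(f1, h) * p_feature(f2, h)
    for h, f1, f2 in product([True, False], repeat=3)
}

def prob(event):
    # Sum the joint over every outcome (h, f1, f2) for which event(...) is true
    return sum(p for outcome, p in joint.items() if event(*outcome))

p_f1 = prob(lambda h, f1, f2: f1)
p_f2 = prob(lambda h, f1, f2: f2)
p_f1_f2 = prob(lambda h, f1, f2: f1 and f2)

p_f1_given_h = prob(lambda h, f1, f2: h and f1) / prob(lambda h, f1, f2: h)
p_f1_given_h_f2 = prob(lambda h, f1, f2: h and f1 and f2) / prob(lambda h, f1, f2: h and f2)
p_f1_given_f2 = p_f1_f2 / p_f2

# Exact posterior: the denominator is P(f1, f2), not P(f1) * P(f2)
posterior = prob(lambda h, f1, f2: h and f1 and f2) / p_f1_f2

print(p_f1_given_h, p_f1_given_h_f2)  # 9/10 9/10   (equal once we condition on H)
print(p_f1, p_f1_given_f2)            # 1/2 41/50   (not equal without H)
print(p_f1_f2, p_f1 * p_f2)           # 41/100 1/4
print(posterior)                      # 81/82
```

Using the correct joint denominator keeps the posterior at 81/82, comfortably below 1.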

One intuitive example of this is a spam filter. Let’s say all spam emails (H = email is spam) consist of random words. Then P(word1|word2,H) = P(word1|H): if we know an email is spam, the presence of any given word tells us nothing about the probability of seeing another word. However, P(word1|word2) != P(word1), since there are lots of non-spam emails in which word appearances are strongly correlated. (H/t to Satvik for this clarification.)

This is actually good news! Assuming P(f1|f2) = P(f1) for all features would be a pretty big assumption. But P(f1|H,f2) = P(f1|H), while often not exactly true, is less of a stretch and works pretty well in practice. (This is called conditional independence.)

Also, in practice, you don’t actually have to compute the denominator. What you want is the relative weight to assign to each hypothesis under consideration, and the denominator P(f1,f2) is the same for every H. As long as the hypotheses are mutually exclusive and collectively exhaustive, you can just normalize the probabilities at the end. So we end up with:

for each H in hypotheses:
    P(H) = prior(H)
    P(H) = P(H) * P(f1|H)
    P(H) = P(H) * P(f2|H)
    … (one factor per feature)
normalize all P(H)’s so they sum to 1

This is close to what we had originally, but less wrong. Okay, now that we know what not to do, let’s get on with the good stuff.
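
For reference, here is a minimal runnable sketch of that normalized loop. The function name `naive_bayes` and its dictionary-based inputs are hypothetical choices for illustration, not the notebook’s code; features are assumed to be binary (present or absent).

```python
def naive_bayes(priors, likelihoods, observed):
    """Naive Bayes over mutually exclusive, collectively exhaustive hypotheses.

    priors:      {hypothesis: P(H)}
    likelihoods: {hypothesis: {feature: P(feature is present | H)}}
    observed:    {feature: True/False}, what we actually saw
    Returns {hypothesis: P(H | observed features)}.
    """
    scores = {}
    for h, prior in priors.items():
        score = prior
        for feature, present in observed.items():
            p = likelihoods[h][feature]
            # Multiply in P(f|H); the shared denominator P(f1, f2, ...) is skipped
            score *= p if present else 1 - p
        scores[h] = score
    # Normalizing replaces the denominator, which is the same for every H
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


posterior = naive_bayes(
    priors={"H": 0.5, "not H": 0.5},
    likelihoods={
        "H": {"f1": 0.9, "f2": 0.9},
        "not H": {"f1": 0.1, "f2": 0.1},
    },
    observed={"f1": True, "f2": True},
)
print(posterior)  # H gets about 0.988, not H about 0.012
```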

One feature

For now, let’s consider one very straightforward hypothesis: the next day’s closing price will be higher than today’s (as shorthand, we’ll call tomorrow’s bar an “up bar” if that’s the case). And let’s consider one very simple feature: was the current day’s bar up or down?
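
A minimal sketch of this one-feature setup, assuming a plain list of daily closes instead of the notebook’s DataFrame (`next_bar_up_probability` is a hypothetical helper). Each training example pairs “this bar was up” with “the following bar was up”. Note that it divides by zero if the history contains no examples of one hypothesis, which is the problem additive smoothing deals with.

```python
def next_bar_up_probability(closes):
    """One-feature Naive Bayes: P(tomorrow is an up bar | today's bar direction).

    closes: list of daily closing prices, oldest first (hypothetical input format).
    """
    # ups[i] is True if bar i closed higher than the bar before it
    ups = [today > yesterday for yesterday, today in zip(closes, closes[1:])]
    # Training examples: (feature = bar was up, label = following bar was up)
    examples = list(zip(ups, ups[1:]))
    today_up = ups[-1]

    scores = {}
    for label in (True, False):  # H = "next bar up" and its complement
        features = [f for f, y in examples if y == label]
        prior = len(features) / len(examples)                               # P(H)
        likelihood = sum(f == today_up for f in features) / len(features)  # P(f|H)
        scores[label] = prior * likelihood
    # Normalize over the two hypotheses instead of dividing by P(f)
    return scores[True] / (scores[True] + scores[False])


closes = [10, 11, 12, 11, 12, 13, 14, 13, 14]
print(next_bar_up_probability(closes))  # about 0.6
```

With a single feature this reduces to the share of up bars that followed an up bar (3 of 5 in this toy series), which makes a handy sanity check.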


Note that even though we’re graphing only 2017 onwards, we’re also updating on all the data before that. Since 2016 and 2017 were so bullish, we’ve basically learned to expect up bars under either condition. I guess the HODLers were right after all.

Using more recent data

So this approach is a bit suboptimal if we want to catch shorter-term moves (like the whole of 2018). Instead, let’s look only at the most recent data. (Question: does anyone know of a Bayes-like method that weighs recent data more heavily?)

We slightly modify our algorithm to look at, and update on, only the past N days of data.
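
A sketch of the windowed version, again on a plain list of closes. The helpers `window_posterior` and `rolling_posteriors` are hypothetical; the window for each day holds the n most recent bar directions ending with today’s bar. When the window has no examples of one hypothesis (or today’s feature value never appears in it), the sketch returns None, mirroring “don’t trade” in that case.

```python
def window_posterior(ups):
    """One-feature Naive Bayes P(next bar up) using only the directions in `ups`.

    Returns None when the window cannot support an update.
    """
    examples = list(zip(ups, ups[1:]))  # (bar up?, next bar up?)
    today_up = ups[-1]
    scores = {}
    for label in (True, False):
        features = [f for f, y in examples if y == label]
        if not features:
            return None  # P(H) = 0 for this label, so there is nothing to update on
        prior = len(features) / len(examples)
        likelihood = sum(f == today_up for f in features) / len(features)
        scores[label] = prior * likelihood
    total = scores[True] + scores[False]
    return scores[True] / total if total else None


def rolling_posteriors(closes, n):
    """P(next bar up) for each day, updating only on the last n bar directions."""
    ups = [today > yesterday for yesterday, today in zip(closes, closes[1:])]
    # The window for day t is the n most recent directions, ending with today's bar
    return [window_posterior(ups[t - n + 1 : t + 1]) for t in range(n - 1, len(ups))]


closes = [10, 11, 12, 11, 12, 13, 14, 13, 14]
print(rolling_posteriors(closes, n=4))  # [0.5, 0.5, None, None, 0.5]
```

Even in this tiny example, short windows regularly hit the degenerate case.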

It’s interesting to see that it still takes a while for the algorithm to catch on to the fact that the bull market is over, though it does so just in time to avoid getting totally crushed by the November 2018 drop.
In the notebook I also look at shorter windows (smaller N). There are some interesting results there, but I won’t post all the pictures here, since that would take too long.

Additive smoothing

As we look at shorter and shorter timeframes, we become increasingly likely to hit a window in which our history contains only up bars (or only down bars). Then P(up) = 1, which doesn’t allow us to update. (Some conditional probabilities get messed up too.) That’s why we had to disable the posterior assert in the last code cell. Currently we just don’t trade during those times, but we could instead assume that we’ve always seen at least one up bar and one down bar (and, likewise, at least one of each value for every feature).

The results don’t change for longer timeframes (as we’d expect, since one extra pseudo-observation barely matters when N is large) and are mostly the same for shorter ones. We can re-enable our posterior assert, too.
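
A minimal sketch of the “pretend we’ve seen everything once” idea, which is additive (Laplace) smoothing with a pseudo-count `alpha`. The function name and the `alpha` parameter are illustrative assumptions. Because every count gets at least `alpha`, no probability is ever exactly 0 or 1, so the normalizing denominator can’t be zero.

```python
def smoothed_window_posterior(ups, alpha=1):
    """One-feature Naive Bayes P(next bar up) with additive (Laplace) smoothing.

    ups:   bar directions (True = up bar), oldest first, ending with today's bar
    alpha: pseudo-count; alpha=1 means "pretend we've seen every outcome once"
    """
    examples = list(zip(ups, ups[1:]))  # (bar up?, next bar up?)
    today_up = ups[-1]
    scores = {}
    for label in (True, False):
        features = [f for f, y in examples if y == label]
        # Smoothed P(H): add alpha to each of the 2 possible labels
        prior = (len(features) + alpha) / (len(examples) + 2 * alpha)
        # Smoothed P(f|H): add alpha to each of the 2 possible feature values
        matches = sum(f == today_up for f in features)
        likelihood = (matches + alpha) / (len(features) + 2 * alpha)
        scores[label] = prior * likelihood
    return scores[True] / (scores[True] + scores[False])


print(smoothed_window_posterior([True, True, True, True]))  # 32/37, about 0.865
```

An all-up window now gives a high but non-certain P(up) instead of an undefined update.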

Bet sizing

Currently we’re betting our entire portfolio on each bar. But in theory, our bet should probably be proportional to how confident we are. In principle you could use the Kelly criterion, but you’d need an estimate of the size of the next bar. So for now we’ll just try linear scaling:

    df["strat_signal"] = 2 * (df["P(H_up_bar)"] - 0.5)

We get lower returns, but a slightly higher Sharpe ratio (SR).
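
A small sketch of what the linear scaling does, outside of pandas. `strat_signal` mirrors the formula above; `strategy_returns` is a hypothetical helper that assumes the signal is held over the next bar and that negative values mean a short position, which is what the formula’s [-1, 1] range suggests.

```python
def strat_signal(p_up):
    """Linear bet sizing: 2 * (P(up) - 0.5), a position size in [-1, 1].

    P(up) = 0.5 -> 0 (no bet), 1.0 -> +1 (fully long), 0.0 -> -1 (fully short).
    """
    return 2 * (p_up - 0.5)


def strategy_returns(p_ups, next_bar_returns):
    """Per-bar strategy returns: position size times the next bar's return."""
    return [strat_signal(p) * r for p, r in zip(p_ups, next_bar_returns)]


# Hypothetical probabilities and next-bar returns, just to show the scaling
print(strategy_returns([0.75, 0.5, 0.25], [0.02, 0.02, -0.04]))  # [0.01, 0.0, 0.02]
```

Low-confidence bars get small positions, so they contribute less to both returns and volatility.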

Ignorant priors

Currently we compute the prior P(next bar is up) by assuming that the next bar is essentially drawn from the same distribution as the last N bars. We could instead say that we just don’t know! The market is really clever, so on priors we shouldn’t assume we know anything: P(next bar is up) = 50%.

    # Compute ignorant priors: every hypothesis gets equal weight
    for h in hypotheses:
        df[f"P(H_{h})"] = 1 / len(hypotheses)

Wow, that does significantly worse. I guess our priors are pretty good.
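
To see what changes between the two priors, here is a sketch with made-up numbers. `empirical_prior` assumes the current prior is simply the share of up bars in the window, as described above; the function names and likelihood values are hypothetical. With the uniform prior, the posterior reflects only the likelihood ratio P(f|up) / P(f|down), so any real drift in the recent data is thrown away.

```python
def empirical_prior(ups):
    """Prior P(next bar up) = share of up bars among the last N bar directions."""
    return sum(ups) / len(ups)


def ignorant_prior(hypotheses):
    """Uniform prior: every hypothesis gets 1 / len(hypotheses)."""
    return {h: 1 / len(hypotheses) for h in hypotheses}


def posterior_up(prior_up, p_f_given_up, p_f_given_down):
    """Combine a prior with one feature's likelihoods and normalize over up/down."""
    up = prior_up * p_f_given_up
    down = (1 - prior_up) * p_f_given_down
    return up / (up + down)


window = [True, True, True, False]  # hypothetical last N = 4 bar directions
lik_up, lik_down = 0.6, 0.5         # hypothetical P(f|up), P(f|down)

print(posterior_up(empirical_prior(window), lik_up, lik_down))               # about 0.783
print(posterior_up(ignorant_prior(["up", "down"])["up"], lik_up, lik_down))  # about 0.545
```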

Putting it all together with multiple features
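
The text doesn’t spell out the combined strategy, so the following is only a sketch of how the earlier pieces might fit together: normalized Naive Bayes over several binary features, a rolling window of the last n examples, additive smoothing, and linear bet sizing. The feature pair used here (was today’s bar up, was yesterday’s bar up) and all function names are hypothetical, not the notebook’s actual features or code.

```python
def multi_feature_posterior(features, labels, today, alpha=1):
    """Naive Bayes P(next bar up) from several binary features, Laplace-smoothed.

    features: past feature tuples (one bool per feature), oldest first
    labels:   labels[i] is True if the bar after features[i] was an up bar
    today:    feature tuple for the current bar
    """
    scores = {}
    for label in (True, False):
        rows = [x for x, y in zip(features, labels) if y == label]
        score = (len(rows) + alpha) / (len(labels) + 2 * alpha)  # smoothed P(H)
        for j, value in enumerate(today):
            # Smoothed P(f_j = value | H), assuming conditional independence given H
            matches = sum(row[j] == value for row in rows)
            score *= (matches + alpha) / (len(rows) + 2 * alpha)
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])


def example_features(ups, t):
    # Hypothetical feature set: (was bar t up?, was bar t-1 up?)
    return (ups[t], ups[t - 1])


def rolling_signals(closes, n, alpha=1):
    """Strategy signals in [-1, 1], each computed from only the last n examples."""
    ups = [today > yesterday for yesterday, today in zip(closes, closes[1:])]
    signals = []
    for t in range(n + 1, len(ups)):
        past = range(t - n, t)  # n most recent bars whose following bar is known
        features = [example_features(ups, s) for s in past]
        labels = [ups[s + 1] for s in past]
        p_up = multi_feature_posterior(features, labels, example_features(ups, t), alpha)
        signals.append(2 * (p_up - 0.5))  # linear bet sizing
    return signals


closes = [10, 11, 12, 11, 12, 13, 14, 13, 14, 15, 14]
print(rolling_signals(closes, n=4))
```

Only information available at bar t goes into the signal for bar t; the label of the most recent training example is today’s direction, which is already known.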

Homework

  • Examine the current features: are they helpful? Do they work?

  • We’re predicting up bars, but what we ultimately want is returns. What assumptions are we making? What should we consider instead?

  • Figure out other features to try.

  • Figure out other creative ways to use Naive Bayes.