# Kalman Filter for Bayesians

Summary: the Kalman Filter is Bayesian updating applied to systems that are changing over time, assuming all our distributions are Gaussians and all our transformations are linear.

Preamble—the general Bayesian approach to estimation: the Kalman filter is an approach to estimating moving quantities. When I think about a Bayesian approach to estimation, I think about passing around probability distributions: we have some distribution as our prior, we gather some evidence, and we have a new distribution as our posterior. In general, the mean of our distribution measures our best guess of the underlying value, and the variance represents our uncertainty.

In the Kalman filter, the only distribution we use is the normal/Gaussian distribution. One important property of this distribution is that it can be parameterized completely by the mean and variance (or covariance in the multivariate case). If you know those two values, you know everything about the distribution.

As a result, people often talk about the Kalman filter as though it's estimating means and variances at different points, but I find it easier to think of it as outputting a distribution representing our current knowledge at any point.

The simplest case: taking multiple measurements of a fixed quantity with an accurate but imprecise sensor. For example, say we're trying to measure the temperature with a thermometer that we believe is accurate but has a variance of 5 degrees².

We're very bad at estimating temperatures by hand, so let's say our prior distribution is that the temperature is somewhere around 70 degrees with a variance of 20, or $N(70, 20)$. We take one readout from the thermometer, which (by assumption) yields a normal distribution centered around the true temperature $t$ with variance 5: $N(t, 5)$. The thermometer reads 72. What's our new estimate?

Well, it turns out there's a simple rule for combining Normal distributions with known variance: if our prior is $N(\mu_1, \sigma_1^2)$ and our observation is $N(\mu_2, \sigma_2^2)$, then the posterior has mean

(1) $\mu' = \mu_1 + K(\mu_2 - \mu_1)$

and variance

(2) $\sigma'^2 = \sigma_1^2(1 - K)$, where

(3) $K = \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$ is called the Kalman gain.

So if our first reading is 72, then $K$ is $20/(20 + 5) = 0.8$, $\mu'$ is $70 + 0.8 \cdot (72 - 70) = 71.6$, and $\sigma'^2$ is $20 \cdot (1 - 0.8) = 4$. If we take another reading, we'd apply the same set of calculations, except our prior would be $N(71.6, 4)$.
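This scalar update is short enough to sketch in code. A minimal Python version (the function name is mine, not standard terminology):

```python
def kalman_update(prior_mean, prior_var, obs_mean, obs_var):
    # Kalman gain: how much to trust the observation relative to the prior.
    K = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + K * (obs_mean - prior_mean)  # equation (1)
    post_var = prior_var * (1 - K)                        # equation (2)
    return post_mean, post_var

# The thermometer example: prior N(70, 20), reading of 72 with variance 5.
mean, var = kalman_update(70, 20, 72, 5)  # mean ≈ 71.6, var ≈ 4
```

Swapping the roles of prior and observation gives the same answer, matching the symmetry noted below.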

Some intuition: let's look at the Kalman gain. First, note that its value is always between 0 and 1. Second, note that the gain is close to 0 if $\sigma_2^2$ is large compared to $\sigma_1^2$, and close to 1 in the opposite case. Intuitively, we can think of the Kalman gain as a ratio of how much we trust our new observation relative to our prior, where the variances are a measure of uncertainty.

What happens to the mean? It moves along the line from our prior mean to the observation. If we trust the observation a lot, $K$ is nearly 1, and we move almost all the way. If we trust the prior much more than the observation, we adjust our estimate very little. And if we trust them equally, we take the average of the two.

Also note that the variance always goes down. Once again, if we trust the new information a lot, the variance goes down a bunch. If we trust the new information and our prior equally, then the variance is halved.

Finally, as a last tidbit, it doesn't matter which distribution is the prior and which is the observation in this case—we'll get exactly the same posterior if we switch them around.

Adding a sensor: none of the math above assumes we're always using the same sensor. As long as we assume all our sensors draw from distributions centered around the true mean and with a known (or estimated) variance, we can update on observations from any number of sensors, using the same update rule.

Measuring multiple quantities: what if we want to measure two or more quantities, such as temperature and humidity? Then we now have multivariate normal distributions. While a single-variable Gaussian is parameterized by its mean and variance, an $n$-variable Gaussian is parameterized by a vector of $n$ means and an $n \times n$ covariance matrix: $N(\mu, \Sigma)$.

Our update equations are the multivariate versions of the equations above: given a prior distribution $N(\mu_1, \Sigma_1)$ and a measurement $\mu_2$ from a sensor with covariance matrix $\Sigma_2$, our posterior distribution is $N(\mu', \Sigma')$ with:

(4) $\mu' = \mu_1 + K(\mu_2 - \mu_1)$

(5) $\Sigma' = (I - K)\Sigma_1$, where

(6) $K = \Sigma_1 (\Sigma_1 + \Sigma_2)^{-1}$

These are basically just the matrix versions of equations (1), (2), and (3).
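As a sanity check, the matrix update fits in a few lines of NumPy. In this sketch the example numbers are made up, with independent temperature and humidity so that each coordinate reduces to the scalar case:

```python
import numpy as np

def kalman_update_mv(mu1, Sigma1, mu2, Sigma2):
    # Multivariate Bayesian update, equations (4)-(6).
    K = Sigma1 @ np.linalg.inv(Sigma1 + Sigma2)       # Kalman gain matrix
    mu_post = mu1 + K @ (mu2 - mu1)
    Sigma_post = (np.eye(len(mu1)) - K) @ Sigma1
    return mu_post, Sigma_post

# Made-up example: prior over (temperature, humidity), one sensor reading.
mu1, Sigma1 = np.array([70.0, 40.0]), np.diag([20.0, 30.0])
mu2, Sigma2 = np.array([72.0, 45.0]), np.diag([5.0, 10.0])
mu_post, Sigma_post = kalman_update_mv(mu1, Sigma1, mu2, Sigma2)
# With diagonal covariances this is just two independent scalar updates.
```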

Adding predictable change over time: so far, we've covered Bayesian updates when you're making multiple measurements of some static set of quantities. But what about when things are changing? A classic example is a moving car. For this case, let's assume we're measuring two quantities – position and velocity.

For a bit more detail, say at time $t$ our vector is $x_t = (p, v)$, where $p$ is the position and $v$ is the velocity. Then at time $t+1$, we might expect the position to be $p + v$, and the velocity to be the same on average. We can represent this with a matrix: $x_{t+1} = A x_t$, where $A$ is the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.

More generally, say our belief at time $t$ is $N(\mu_t, \Sigma_t)$. Then our belief at time $t+1$, before we make any new observations, should be the distribution of $A x$ for $x \sim N(\mu_t, \Sigma_t)$. Fortunately there's a simple formula for this: $N(A \mu_t, A \Sigma_t A^T)$.
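The prediction step can be checked numerically. Here is a small NumPy sketch with illustrative numbers, using the position/velocity transition matrix from the car example:

```python
import numpy as np

# Transition matrix for the car: new position = p + v, velocity unchanged.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

mu_t = np.array([0.0, 2.0])      # belief: position 0, velocity 2
Sigma_t = np.diag([1.0, 0.25])   # illustrative uncertainties

# Belief at time t+1, before any observation: N(A mu_t, A Sigma_t A^T)
mu_pred = A @ mu_t
Sigma_pred = A @ Sigma_t @ A.T
```

Note that even though the prior covariance was diagonal, the predicted covariance picks up off-diagonal terms: uncertainty about velocity becomes correlated uncertainty about position.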

Putting it all together, say our belief at time $t$ is $N(\mu_t, \Sigma_t)$, and at time $t+1$ we measure a value $\mu_2$ from a sensor with covariance matrix $\Sigma_2$. Then we perform the Bayesian update with $N(A \mu_t, A \Sigma_t A^T)$ as the prior and $N(\mu_{t+1}, \Sigma_{t+1})$ as the posterior:

(7) $\mu_{t+1} = A \mu_t + K(\mu_2 - A \mu_t)$

(8) $\Sigma_{t+1} = (I - K) A \Sigma_t A^T$, where

(9) $K = A \Sigma_t A^T (A \Sigma_t A^T + \Sigma_2)^{-1}$

And that's the main idea! We just adjust our prior by applying a transition function/matrix to it first. In practice, the Kalman filter tends to converge quickly to the true values, and is widely used in applications such as GPS tracking.
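One full predict-then-update cycle for the moving-car example might be sketched as follows. Assumptions here are mine: the same 2×2 transition matrix as above, a sensor that reads both position and velocity, idealized noiseless readings, and made-up noise levels:

```python
import numpy as np

def kalman_step(mu, Sigma, z, R, A):
    # Predict: push the current belief through the dynamics.
    mu_prior = A @ mu
    Sigma_prior = A @ Sigma @ A.T
    # Update: Bayesian combination with the sensor reading z
    # (covariance R), as in the update equations above.
    K = Sigma_prior @ np.linalg.inv(Sigma_prior + R)
    mu_new = mu_prior + K @ (z - mu_prior)
    Sigma_new = (np.eye(len(mu)) - K) @ Sigma_prior
    return mu_new, Sigma_new

# Track a car moving at constant velocity 1, starting from a vague prior.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
R = np.diag([2.0, 2.0])                    # made-up sensor covariance
mu, Sigma = np.array([0.0, 0.0]), np.diag([10.0, 10.0])
for t in range(1, 20):
    z = np.array([float(t), 1.0])          # idealized noiseless readings
    mu, Sigma = kalman_step(mu, Sigma, z, R, A)
# After a few steps the estimate is very close to the true state (19, 1).
```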

• variance of 5 degrees

Nitpick: Units of variance would be 5 degrees^2.

Also, personally I find standard deviation easier to think about, and initially thought that you were accidentally calling the standard deviation the variance, though for Kalman filters variance does seem more useful.

• Thanks! Edited. Yeah, I specifically focused on variance because of how Bayesian updates combine Normal distributions.

• Good post!

Is it common to use Kalman filters for things that have nonlinear transformations, by approximating the posterior with a Gaussian (e.g. calculating the closest Gaussian distribution to the true posterior by JS-divergence or the like)? How well would that work?

Grammar comment—you seem to have accidentally a few words at

Measuring multiple quantities: what if we want to measure two or more quantities, such as temperature and humidity? Furthermore, we might know that these are [missing words?] Then we now have multivariate normal distributions.

• Thanks! Edited.

• There are a number of Kalman-like things you can do when your updates are nonlinear.

The “extended Kalman filter” uses a local linear approximation to the update. There are higher-order versions. The EKF unsurprisingly tends to do badly when the update is substantially nonlinear. The “unscented Kalman filter” uses (kinda) a finite-difference approximation instead of the derivative, deliberately taking points that aren’t super-close together to get an approximation that’s meaningful on the scale of your actual uncertainty. Going further in that direction you get “particle filters” which represent your uncertainty not as a Gaussian but by a big pile of samples from its distribution. (There’s a ton of lore on all this stuff. I am in no way an expert on it.)

• Very neat tool, thanks for the conciseness of the explanation. Though I hope I won’t have to measure 70° temperatures by hand any time soon. (I know, I know, it’s in Fahrenheit, but it still sounds… dissonant? to my European ears)