SatvikBeri

Karma: 981
• For the orthogonal decomposition, don’t you need two scalars? E.g. . For example, in , let Then , and there’s no way to write as

• My favorite book, by far, is Functional Programming in Scala. This book has you derive most of the concepts from scratch, to the point where even complex abstractions feel like obvious consequences of things you’ve already built.

If you want something more Haskell-focused, a good choice is Programming in Haskell.

• I didn’t downvote, but I agree that this is a suboptimal meme – though the prevailing mindset of “almost nobody can learn Calculus” is much worse.

As a datapoint, it took me about two weeks of obsessive, 15 hour/day study to learn Calculus to a point where I tested out of the first two courses when I was 16. And I think it’s fair to say I was unusually talented and unusually motivated. I would not expect the vast majority of people to be able to grok Calculus within a week, though obviously people on this site are not a representative sample.

• Yes, roughly speaking, if you multiply the VC dimension by n, then you need n times as much training data to achieve the same performance. (More precise statement here: https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension#Uses) There are also a few other bounds you can get based on VC dimension. In practice these bounds are way too large to be useful, but an algorithm with much higher VC dimension will generally overfit more.
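To make the underlying definition concrete, here is a minimal sketch of shattering, the notion VC dimension is built on. It uses the class of 1-D threshold classifiers (which has VC dimension 1); the function names are illustrative, not from any library:

```python
# Sketch: VC dimension via shattering, for the hypothesis class of
# threshold classifiers h_t(x) = 1 if x > t. This class shatters any
# single point but no pair of points, so its VC dimension is 1.

def achievable_labelings(points):
    """All labelings of `points` realizable by some threshold t."""
    pts = sorted(points)
    # One candidate threshold below all points, one between each
    # consecutive pair, one above all points: these cover every
    # distinct behavior a threshold can have on `points`.
    candidates = [pts[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    candidates += [pts[-1] + 1.0]
    return {tuple(int(x > t) for x in points) for t in candidates}

def is_shattered(points):
    """True if thresholds realize all 2^n labelings of `points`."""
    return len(achievable_labelings(points)) == 2 ** len(points)

print(is_shattered([0.0]))       # → True: a single point is shattered
print(is_shattered([0.0, 1.0]))  # → False: the labeling (1, 0) is unreachable
```

A richer hypothesis class can realize more labelings of a sample, which is exactly why it needs more data to pin down.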

• A different view is to look at the search process for the models, rather than the model itself. If model A is found from a process that evaluates 10 models, and model B is found from a process that evaluates 10,000, and they otherwise have similar results, then A is much more likely to generalize to new data points than B.

The formalization of this concept is called VC dimension and is a big part of Machine Learning Theory (although arguably it hasn’t been very helpful in practice): https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension
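A toy simulation of the search-process point (all numbers illustrative): we score purely random “models” on noise, so every model’s true expected performance is zero, then compare the best of the first 10 against the best of all 10,000.

```python
import random

# Each "model" is just a seeded noise source; its in-sample score is
# the mean of 100 draws from N(0, 1), so the true expected score of
# every model is exactly 0.

def in_sample_score(model_seed, n=100):
    rng = random.Random(model_seed)
    return sum(rng.gauss(0, 1) for _ in range(n)) / n

scores = [in_sample_score(seed) for seed in range(10_000)]
best_of_10 = max(scores[:10])
best_of_10000 = max(scores)

print(f"best of 10:     {best_of_10:.3f}")
print(f"best of 10,000: {best_of_10000:.3f}")
```

The larger search is guaranteed to report an in-sample score at least as high, and the extra performance comes purely from selection, which is exactly the overfitting risk described above.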

• It’s a combination. The point is to throw out algorithms/parameters that do well on backtests when the assumptions are violated, because those are much more likely to be overfit.

• As an example, consider a strategy like “on Wednesdays, the market is more likely to have a large move, and signal XYZ predicts big moves accurately.” You can encode that as an algorithm: trade signal XYZ on Wednesdays. But the algorithm might make money on backtests even if the assumptions are wrong! By examining the individual components rather than just whether the algorithm made money, we get a better idea of whether the strategy works.
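A sketch of what component-wise validation might look like, on hypothetical synthetic data. Both assumptions are baked into the data generator here, so both checks pass by construction; on real data, either check could fail even while the combined strategy happened to make money in a backtest.

```python
import random

# Hypothetical daily data: (is_wednesday, signal, realized move).
rng = random.Random(42)
days = []
for i in range(2_000):
    wednesday = (i % 5 == 2)
    scale = 2.0 if wednesday else 1.0     # assumption 1 baked in
    move = rng.gauss(0, scale)
    signal = move + rng.gauss(0, 0.5)     # assumption 2 baked in
    days.append((wednesday, signal, move))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def wednesdays_move_more(days):
    """Assumption 1: average absolute move is larger on Wednesdays."""
    wed = mean(abs(m) for w, s, m in days if w)
    rest = mean(abs(m) for w, s, m in days if not w)
    return wed > rest

def signal_predicts_direction(days):
    """Assumption 2: the signal's sign usually matches the move's sign."""
    hit_rate = mean(1.0 if (s > 0) == (m > 0) else 0.0
                    for w, s, m in days)
    return hit_rate > 0.6

print(wednesdays_move_more(days), signal_predicts_direction(days))
```

Each check tests one claim in isolation, so a failing assumption shows up directly instead of being hidden inside aggregate P&L.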

• Yes, avoiding overfitting is the key problem, and you should expect almost anything to be overfit by default. We spend a lot of time on this (I work w/ Alexei). I’m thinking of writing a longer post on preventing overfitting, but these are some key parts:

• Theory. Something that makes economic sense, or has worked in other markets, is more likely to work here.

• Components. A strategy made of 4 components, each of which can be independently validated, is a lot more likely to keep working than one black box.

• Measuring strategy complexity. If you explore 1,000 possible parameter combinations, that’s less likely to work than if you explore 10.

• Algorithmic decision making. Any manual part of the process introduces a lot of possibilities for overfitting.

• Abstraction & reuse. The more you reuse things, the fewer degrees of freedom you have with each idea, and therefore the lower your chance of overfitting.

• I think the prior that bubbles usually pop is incorrect. We tend to call something a bubble in retrospect, after it’s popped.

But if you try to define bubbles with purely forward-looking measures, like a period of unusually high growth, they’re more frequently followed by periods of unusually slow growth than by rapid decline. For example, Amazon’s stock would pass just about any test of a bubble at most points in its history.

I expect something similar with education: spending will likely remain high, but grow more slowly than it did in the last 20 years. That’s especially true because of the structure of student loans: people can’t really just default.

But to answer the more direct question: assuming that there is a rapid drop in education spending, how could we profit from it? Vocational schools seem like the most obvious bet, e.g. to become a programmer, dental assistant, massage therapist, electrician, and so on.

Certification services that manage to develop a reputation will become strong as well, e.g. SalesForce certificates are pretty valuable.

You could directly short lenders such as Sallie Mae.

Recruitment agencies that specialize in placing recent college graduates will likely suffer.

Management consulting firms rely heavily on college graduates, and so do hedge funds to a lesser extent.

• #6:

Assume WLOG that x ≤ f(x). Then by monotonicity, we have x ≤ f(x) ≤ f²(x) ≤ … If this chain were all strictly increasing, then we would have more distinct elements than L contains. Thus there must be some n such that f(fⁿ(x)) = fⁿ(x). By induction, fᵐ(x) = fⁿ(x) for all m ≥ n.

#7:

Assume x ≤ f(x) and construct a chain similarly to (6), indexed by the ordinals, taking least upper bounds at limit ordinals. If all the inequalities were strict, we would have an injection from the ordinals to L, which is impossible since L is a set.

#8:

Let F be the set of fixed points. Any subset S of F must have a least upper bound x in L. If x is a fixed point, done. Otherwise, consider x*, the fixed point obtained by iterating f on x as in (7); note that x ≤ f(x), since for every q in S we have q = f(q) ≤ f(x), making f(x) an upper bound of S. For any q in S, we have q ≤ x ≤ x*. Thus x* is an upper bound of S in F. To see that it is the least upper bound, assume we have some other upper bound b of S in F. Then x ≤ b, and since f(b) = b, every iterate of f applied to x stays ≤ b, so x* ≤ b.

To get the lower bound, note that we can flip the inequalities in L and still have a complete lattice.
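As a quick brute-force sanity check of the claim in #8, here is a sketch on a toy complete lattice, P({0,1,2}) ordered by inclusion, with the illustrative monotone map f(S) = S ∪ {0} (any monotone map would do):

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def f(s):
    return s | {0}

L = powerset([0, 1, 2])

# f is monotone: S ⊆ T implies f(S) ⊆ f(T).
assert all(f(s) <= f(t) for s in L for t in L if s <= t)

fixed = [s for s in L if f(s) == s]  # exactly the sets containing 0

# Every nonempty subset of the fixed points has a least upper bound
# that is itself a fixed point (here: the union, which contains 0).
for r in range(1, len(fixed) + 1):
    for subset in combinations(fixed, r):
        assert frozenset().union(*subset) in fixed

print(sorted(map(sorted, fixed)))  # → [[0], [0, 1], [0, 1, 2], [0, 2]]
```

The fixed points here form the sublattice of sets containing 0, which is itself a complete lattice, as #8 predicts.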

#9:

P(A) clearly forms a complete lattice where the least upper bound of any set of subsets is their union, and the greatest lower bound is the intersection.

To see that injections are (strictly) monotonic, assume S ⊊ T and that f is an injection. For any function, S ⊆ T implies f(S) ⊆ f(T). If x is in T − S and f(x) is in f(S), that implies f(x) = f(y) for some y in S with y ≠ x, which is impossible since f is injective. Thus S ↦ f(S) is (strictly) monotonic.

Now suppose g: B → A is also an injection. Let C be the set of all points of A not in the image of g, and let A′ = C ∪ (g∘f)(C) ∪ (g∘f)²(C) ∪ … Note that no element of C is in the image of g, so any element of A′ that does lie in the image of g must be in (g∘f)ⁿ(C) for some n ≥ 1. Then A − A′ = g(B − f(A′)). On one hand, every element of A not contained in A′ is in the image of g by construction, say g(b); and b is not in f(A′), since otherwise g(b) would be in (g∘f)(A′) ⊆ A′. So A − A′ ⊆ g(B − f(A′)). On the other hand, if g(b) is in A′, then g(b) = (g∘f)(a) for some a in A′, so b = f(a) by injectivity; hence g(B − f(A′)) contains no element of A′, i.e. g(B − f(A′)) ⊆ A − A′. QED.

#10:

We form two bijections using the sets from (9), one between A′ and B′, the other between A − A′ and B − B′, where B′ = f(A′).

Any injection is a bijection between its domain and image. Since B′ = f(A′) and f is an injection, f is a bijection between A′ and B′, where we can assign each element b of B′ to the unique a in A′ such that f(a) = b. Similarly, g is a bijection between B − B′ and A − A′, since (9) gives A − A′ = g(B − B′). Combining them, we get a bijection on the full sets.
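The construction in (9) and (10) can be run concretely. Here is a sketch for the illustrative injections f(n) = 2n and g(n) = 2n + 1 on A = B = ℕ (neither is surjective, so A′ is non-trivial); membership in A′ is decidable by repeatedly peeling off g∘f, which here is n ↦ 4n + 1:

```python
# f(n) = 2n injects ℕ into ℕ (image: evens); g(n) = 2n + 1 injects
# ℕ into ℕ (image: odds). C = ℕ − image(g) = the even numbers, and
# A′ = C ∪ (g∘f)(C) ∪ (g∘f)²(C) ∪ …

def in_A_prime(n):
    # Peel off (g∘f)(n) = 4n + 1 while possible; n is in A′ exactly
    # when the peeled-down value lands in C (i.e. is even).
    while n % 4 == 1:
        n = (n - 1) // 4
    return n % 2 == 0

def h(n):
    # The combined bijection from (10): f on A′, g⁻¹ on A − A′.
    return 2 * n if in_A_prime(n) else (n - 1) // 2

# Spot-check injectivity and coverage on an initial segment: every
# m < 499 has its preimage (m // 2 or 2m + 1) inside range(1000).
values = [h(n) for n in range(1000)]
assert len(set(values)) == 1000
assert set(values) >= set(range(499))
```

For example h(0) = 0 and h(1) = 2 (both via f, since 0 and 1 are in A′), while h(3) = 1 (via g⁻¹, since 3 is not in A′).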

Kalman Filter for Bayesians

22 Oct 2018 17:06 UTC
57 points

Systemizing and Hacking

23 Mar 2018 18:01 UTC
104 points

Inference & Empiricism

20 Mar 2018 15:47 UTC
87 points
• I’m specifically giving up games that encourage many short check-ins, e.g. most phone games and idle games. Binges aren’t a big issue for me; they tend to give me joy and renewal. But frequent check-in games make me less happy and less productive.

• “Prefer a few large, systematic decisions to many small ones.”

1. Pick what percentage of your portfolio you want in various assets, and rebalance quarterly, rather than making regular buying/selling decisions.

2. Prioritize once a week, and by default do whatever’s next on the list when you complete a task.

3. Set up recurring hangouts with friends at whatever frequency you enjoy (e.g. weekly). Cancel or reschedule on an ad-hoc basis, rather than scheduling ad-hoc.

4. Rigorously decide how you will judge the results of experiments, then run a lot of them cheaply. Machine Learning example: pick one evaluation metric (might be a composite of several sub-metrics and rules), then automatically run lots of different models and do a deeper dive into the 5 that perform particularly well.

5. Make a packing checklist for trips, and use it repeatedly.

6. Figure out what criteria would make you leave your current job, and only take interviews that plausibly meet those criteria.

7. Pick a routine for your commute, e.g. listening to podcasts. Test new ideas at the routine level (e.g. podcasts vs books).

8. Find a specific method for deciding what to eat – for me, this is querying system 1 to ask how I would feel after eating certain foods, and picking the one that returns the best answer.

9. Accepting every time a coworker asks for a game of ping-pong, as a way to get exercise, unless I am about to enter a meeting.

10. Always suggesting the same small set of places for coffee or lunch meetings.
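Item 1 above can be sketched as a small function (all tickers, prices, and weights are hypothetical): given current holdings and target weights, compute the one batch of trades that restores the targets at a quarterly rebalance.

```python
def rebalance(holdings, prices, target_weights):
    """Return {asset: units to buy (+) or sell (-)} to hit targets."""
    values = {a: holdings[a] * prices[a] for a in holdings}
    total = sum(values.values())
    trades = {}
    for asset, weight in target_weights.items():
        target_value = total * weight
        trades[asset] = (target_value - values[asset]) / prices[asset]
    return trades

holdings = {"stocks": 10.0, "bonds": 40.0}   # $1,000 each at these prices
prices = {"stocks": 100.0, "bonds": 25.0}
targets = {"stocks": 0.6, "bonds": 0.4}

print(rebalance(holdings, prices, targets))
# → {'stocks': 2.0, 'bonds': -8.0}: buy $200 of stocks, sell $200 of bonds
```

The trades net to zero dollars by construction, so the whole quarterly decision reduces to running one function and executing its output.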