Holden is a smart guy, but he’s also operating under a severe set of political constraints, since his organization depends so strongly on its ability to raise funds. So we shouldn’t make too much of the fact that he thinks academia is pretty good—obviously he’s going to say that.
DanB
I would add two ideas:
Try to find a good role model—someone who is similar to you in relevant respects, is a couple of years ahead of you, who has done something you think is awesome, and who you can talk to and observe to some extent. Bill Gates is probably not a good role model.
Try to form a realistic assessment of how important college actually is; people often err in imagining it to be more or less important than it is in reality (these errors seem to be correlated with social class). I would estimate that the 4 years of college are only modestly more important than other years of your life. What you do right after college is important. What you do when you’re in your late 20s is important.
The Rediscovery of Interiority in Machine Learning
I started this essay last year, and procrastinated on completing it for a long time, until recently the GPT-3 announcement gave me the motivation to finish it up.
If you are familiar with my book, you will notice some of the same ideas, expressed with different emphasis. I congratulate myself a bit on predicting some of the key aspects of the GPT-3 breakthrough (data annotation doesn’t scale; instead learn highly complex interior models from raw data).
I would appreciate constructive feedback and signal-boosting.
Not a stupid question, this issue is actually addressed in the essay, in the section about interior modeling vs unsupervised learning. The latter is very vague and general, while the former is much more specific and also intrinsically difficult. The difficulty and preciseness of the objective make it much better as a goal for a research community.
Fight Akrasia and Decision Fatigue with DIY Productivity Software
Cool concepts! What tech stack did you use? Was it painful to get the Facebook API working?
In my PhD thesis I explored an extension of the compression/modeling equivalence that’s motivated by Algorithmic Information Theory. AIT says that if you have a “perfect” model of a data set, then the bitstream created by encoding the data using the model will be completely random: every statistical test for randomness applied to the bitstream will return the expected value. For example, the proportion of 1s should be 0.5, the proportion of 1s following the prefix 010 should be 0.5, and so on. Conversely, if you find a “randomness deficiency”, you have found a shortcoming of your model. And it turns out you can use this information to create an improved model.
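To make the idea of a randomness test concrete, here is a minimal sketch (my own toy illustration, not code from the thesis): given a bitstream, we check the two example statistics mentioned above—the overall proportion of 1s, and the proportion of 1s following the prefix 010. For a stream with no randomness deficiency, both should come out near 0.5.

```python
import random

def ones_after_prefix(bits, prefix):
    """Proportion of 1s that immediately follow a given prefix in the bitstream."""
    k = len(prefix)
    follows = [bits[i + k] for i in range(len(bits) - k)
               if bits[i:i + k] == prefix]
    return sum(follows) / len(follows) if follows else None

random.seed(0)
bits = [random.randint(0, 1) for _ in range(100_000)]

# For a genuinely random stream, both statistics should be close to 0.5.
# A systematic deviation in either one would signal a randomness deficiency,
# i.e. structure the model failed to capture.
p_overall = sum(bits) / len(bits)
p_after_010 = ones_after_prefix(bits, [0, 1, 0])
```

A real encoder's output stream would replace the simulated `bits` here; the test logic is the same.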
That gives us an alternative conceptual approach to modeling/optimization. Instead of maximizing a log-likelihood, take an initial model, encode the dataset, and then search the resulting bitstream for randomness deficiencies. This is very powerful because there is an infinite number of randomness tests that you can apply. Once you find a randomness deficiency, you can use it to create an improved model, and repeat the process until the bitstream appears completely random.
The key trick that made the idea practical is that you can use “pits” instead of bits. Bits are tricky, because as your model gets better, the number of bits goes down—that’s the whole point—so the relationship between bits and the original data samples gets murky. A “pit” is a [0,1) value calculated by applying the Probability Integral Transform to the data samples using the model. The same randomness requirements hold for the pitstream as for the bitstream, and there are always exactly as many pits as data samples. So now you can define randomness tests based on intuitive context functions, like “how many pits fall in the [0.2,0.4] interval when the previous word in the original text was a noun?”
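A toy sketch of the pit idea (my own illustration with a made-up Gaussian model, not code from the thesis): applying the model’s CDF to each sample gives the pitstream, and if the model is right the pits are uniform on [0,1). An interval-count test then exposes a wrong model as a randomness deficiency.

```python
import math
import random

def gaussian_cdf(x, mu, sigma):
    """Model CDF; applying it to each sample is the Probability Integral Transform."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(50_000)]

# Pits under the *correct* model: uniform on [0, 1).
pits_good = [gaussian_cdf(x, 0.0, 1.0) for x in data]
# Pits under a *wrong* model (sigma too large): uniformity breaks.
pits_bad = [gaussian_cdf(x, 0.0, 2.0) for x in data]

def frac_in_interval(pits, lo, hi):
    """One simple randomness test: mass of the pitstream inside [lo, hi)."""
    return sum(lo <= p < hi for p in pits) / len(pits)

# A uniform pitstream should put ~0.2 of its mass in [0.2, 0.4);
# the misspecified model piles up noticeably more there.
good = frac_in_interval(pits_good, 0.2, 0.4)
bad = frac_in_interval(pits_bad, 0.2, 0.4)
```

The context-function version would simply restrict the count to pits whose preceding word is a noun (or any other condition) before comparing against the expected fraction.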
I’m not sure exactly what you mean, but I’ll guess you mean “how do you deal with the problem that there are an infinite number of tests for randomness that you could apply?”
I don’t have a principled answer. My practical answer is just to use good intuition and/or taste to define a nice suite of tests, and then let the algorithm find the ones that show the biggest randomness deficiencies. There’s probably a better way to do this with differentiable programming—I finished my PhD in 2010, before the deep learning revolution.
One very important observation related to this issue is the fact that we often observe specific cognitive deficits (e.g. people who can’t use nouns), but those specific deficits are almost always related to a brain trauma (stroke, etc.). If there were significant cognitive logic coded into the genome, we should see specific cognitive deficits in otherwise healthy young people caused by mutations.
The Japanese Quiz: a Thought Experiment of Statistical Epistemology
Why isn’t this an argument for banning all politically powerful people from Twitter?
Compositionality: SQL and Subways
Thanks for the tip about Kusto—it actually does look quite nice.
A Small Vacation
Thanks for the positive feedback and interesting scenario. I’d never heard of Birobidzhan.
A budget where initial creation is essentially free (fun!) while maintenance is extremely expensive (drudgery!) is a dramatic exaggeration for most software development.
My feeling is that most software development has exactly the same cost parameters; the difference is just that BigTech companies have so much money they are capable of paying thousands of engineers handsome salaries, to do the endless drudgery required to keep the tech stacks working.
The SQLite devs pledge to support the product until 2050.
Copied from a previous comment on Hacker News
I wish you well and I hope you win (ed.: by “win” I mean I hope the proposal is approved)
I am pessimistic though. I don’t think people really understand how much current homeowners do not want additional housing to be built. It makes sense if you consider that the net worth of a typical homeowner is very substantially made up of a highly leveraged long position in real estate. If that position goes south—because of an increase in housing supply, or because of undesirable new people moving into the neighborhood—the homeowner’s net worth could be decimated.
Now, most people will not come out and say directly that they are opposed to new housing for the obvious economic reason, because they don’t want to seem selfish and greedy and maybe racist. So they have to find a socially acceptable cover story to oppose new housing—environmentalism, concerns about safety, etc etc.
Interesting analysis. I hadn’t heard of Goodman before so I appreciate the reference.
In my view the problem of induction has been almost entirely solved by the ideas from the literature on statistical learning, such as VC theory, MDL, Solomonoff induction, and PAC learning. You might disagree, but you should probably talk about why those ideas prove insufficient in your view if you want to convince people (especially if your audience is up-to-date on ML).
One particularly glaring limitation of Goodman’s argument is that it depends on natural language predicates (“green”, “grue”, etc). Natural language is terribly ambiguous and imprecise, which makes it hard to evaluate philosophical statements about natural language predicates. You’d be better off casting the discussion in terms of computer programs that take a given set of input observations and produce an output prediction.
Of course you could write “green” and “grue” as computer functions, but it would be immediately obvious how much more contrived the program using “grue” is than the program using “green”.
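To illustrate (a sketch of my own; the cutoff date is an arbitrary stand-in for Goodman’s time T): written as programs, “grue” needs an extra input and an extra branch that “green” doesn’t, which is exactly the contrivance the natural-language framing obscures.

```python
import datetime

# Hypothetical cutoff time T from Goodman's setup (any future date works).
T = datetime.datetime(2030, 1, 1)

def green(observed_color):
    """An object is green iff it looks green whenever it is observed."""
    return observed_color == "green"

def grue(observed_color, observation_time):
    """An object is grue iff it looks green when observed before T,
    and looks blue when observed at or after T."""
    if observation_time < T:
        return observed_color == "green"
    return observed_color == "blue"
```

Note that `grue` cannot even be defined without smuggling in a clock and a branch on the observation time, while `green` is a one-liner over the observation alone.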