List of things I think are worth reading (last updated 3/1/2021)
For personal reference, I’m going to keep track of things I read which I think are worth reading. By “worth reading” I specifically mean it scores high on this metric:
Each entry will have a title, link, date it was added, summary with salient examples / quotes, and outcome (what I personally got out of reading it). The entry is probably only worth your time if you think you would be better off achieving that outcome.
This list was last updated on 3/1/2021.
The New Business of AI (and How It’s Different From Traditional Software) (added 1/1/2021) Summary: AI businesses can be modeled as a hybrid of software and service businesses. State-of-the-art AI is expensive (>= $100k) to train: you have to pay for the compute and (cloud) storage. Compute needed for current models is increasing faster than hardware (e.g. NVIDIA GPUs) is advancing. Edge cases give AI accuracy an S-curve shape (“as much as 40-50% of intended functionality for AI products we’ve looked at can reside in the long tail of user intent”). Deploying to each customer can have high marginal cost if you are fine-tuning the model. (“Even customers that appear similar – two auto manufacturers doing defect detection, for example – may require substantially different training data, due to something as simple as the placement of video cameras on their assembly lines.”) AI model architectures are usually public and data usually belongs to the customer, which makes profitability trickier. Data has “diseconomies of scale” because each subsequent edge case is more difficult to address. To make AI more practical: reduce model complexity (so less tuning is necessary), focus on low-complexity (but high-scale) tasks, enter the services market rather than pure tech (“running a taxi service rather than selling self-driving cars”). Outcome: I know a bit more about the concerns that come up when doing large-scale, practice-oriented AI.
The Bitter Lesson (added 3/1/2021) Summary: Progress in AI is driven by general-purpose techniques which scale with increased computation—the two main examples being search and learning. Outcome: Abstract understanding of what the frontier of modern ML research includes and doesn’t include.
What Failure Looks Like: Distilling the Discussion (summary of original post) (added 3/1/2021) Summary: Two ways AI could cause x-risk. One, failure by loss of control: AI will optimize for metrics (e.g. GDP), and by Goodhart’s law “underlying reality will increasingly diverge from what we think these metrics are measuring” (e.g. economy will be in shambles even if GDP is high). Two, failure by enemy action: Influence-seeking ML systems will be built “by default” even if it’s not a goal we specifically request. Influence-seeking AI may mostly do the job we specified for it, but sometimes blatantly fail (“an automated corporation may just take the money and run”). As automation grows, we may not be able to recover from a correlated automation failure. Outcome: More concrete and serious-sounding understanding of where x-risk from AI could come from. (I find the summary post more digestible than the original.)
Doing Good Better (added 1/1/2021) Summary: An introduction to effective altruism. Outcome: Made EA ‘click’ for me. I had encountered EA multiple times in the past but this book made me want to become an EA.
Tails of Great Soccer Players (added 1/1/2021) Summary: Why are there more tall people from Norway than India, even though India’s population is 630 million and Norway’s is 2.5 million? Answer: tails of normal distributions fall very quickly. To get really tall people, it’s better to slightly shift the mean to be larger, rather than to scale the curve upward (population increase). Quote: “the relative height of the curve actually drops faster the further out you go. [...] The height of the curve at 1 SD is 4.5 times higher than that at 2 SD. The curve at 5 SD is 250 times higher than that at 6 SD and it keeps getting steeper and steeper.” Outcome: I understand how the tail of the normal distribution behaves a bit better.
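The quoted ratios can be checked directly from the normal density. The sketch below is my own illustration, not code from the article: the function names, the 4 SD cutoff, the 0.5 SD shift, and the 5x population factor are all hypothetical choices made only to show the effect.

```python
import math

def normal_pdf(z: float) -> float:
    """Height of the standard normal curve at z standard deviations."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Ratios of curve heights mentioned in the quote.
ratio_1_vs_2 = normal_pdf(1) / normal_pdf(2)  # equals exp(1.5), roughly 4.5
ratio_5_vs_6 = normal_pdf(5) / normal_pdf(6)  # equals exp(5.5), roughly 245

# Hypothetical comparison: who has more people above a 4 SD cutoff?
cutoff = 4.0  # assumption: "really tall" means 4 SD above the unshifted mean
small_shifted = 1.0 * upper_tail(cutoff - 0.5)  # 1 unit of population, mean shifted up 0.5 SD
large_unshifted = 5.0 * upper_tail(cutoff)      # 5 units of population, mean unshifted

print(f"height ratio 1 SD vs 2 SD: {ratio_1_vs_2:.2f}")
print(f"height ratio 5 SD vs 6 SD: {ratio_5_vs_6:.2f}")
print(f"small shifted population beats 5x larger one: {small_shifted > large_unshifted}")
```

Under these made-up numbers, a half-SD shift in the mean outweighs a fivefold increase in population, and the gap widens as the cutoff moves further out.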
What is the difference between likelihood and probability? (answer by Lenar Hoyt) (added 1/1/2021) Summary: $P(x \mid \theta)$ is the probability we observe $x$ given the model $\theta$. If the model $\theta$ is fixed (e.g. fair coin), we can compute the probabilities. If the outcome $x$ is fixed (e.g. heads $n$ times in a row), we can compute the likelihood that if $\theta$ holds, then we get output $x$. Maximum likelihood is when you look at all possible $\theta$ and infer whichever maximizes the probability of observing $x$. Outcome: Aside from learning the technical term “likelihood”, understanding a bit better the dichotomy of “model ⇒ sample” (probability) versus “sample ⇒ model” (likelihood).
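A small coin-flip sketch of the same distinction. This is my own illustration rather than anything from the linked answer: the function name, the 10-flip / 7-heads data, and the grid of candidate biases are all assumptions, and the grid search stands in for a proper optimizer.

```python
from math import comb

def binomial_probability(heads: int, flips: int, theta: float) -> float:
    """Probability of seeing `heads` heads in `flips` flips of a coin with bias theta."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)

flips = 10  # hypothetical experiment size

# Probability: the model is fixed (fair coin), the outcome varies.
probabilities = [binomial_probability(k, flips, 0.5) for k in range(flips + 1)]
total_probability = sum(probabilities)  # a distribution over outcomes, sums to 1

# Likelihood: the outcome is fixed (7 heads observed), the model varies.
observed_heads = 7  # hypothetical observation
candidate_thetas = [i / 100 for i in range(101)]
likelihoods = [binomial_probability(observed_heads, flips, t) for t in candidate_thetas]
total_likelihood = sum(likelihoods)  # not a distribution over theta, need not sum to 1

# Maximum likelihood: pick the candidate model that makes the observation most probable.
mle_theta = max(
    candidate_thetas,
    key=lambda t: binomial_probability(observed_heads, flips, t),
)

print(f"sum over outcomes (fair coin): {total_probability:.6f}")
print(f"sum over candidate models:     {total_likelihood:.6f}")
print(f"maximum-likelihood bias:       {mle_theta:.2f}")
```

The same function is evaluated in both cases; what changes is which argument is held fixed and which is allowed to vary.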
Is the dual notion of a presheaf useful? (comment by Kevin Ventullo) (added 1/1/2021) Summary: “[M]orphisms out of an object give you global information whereas morphisms into an object tend to give more local information. One (trivial) way of seeing this is that morphisms out of the category necessarily involve all objects, but morphisms in might only involve some small subcollection.” Outcome: high-level understanding of how morphisms into / out of an object give local / global information.
Large networks and graph limits, Part 1 (first 40 pages of book by László Lovász) (added 1/1/2021) Summary: too many good things to list. Outcome: high-level intuition for interesting questions and tools in graph theory.
Bayes’ rule in terms of odds (video by 3blue1brown) (added 1/1/2021) Summary: Bayes’ rule is easier to work with if we use odds instead of probabilities (e.g. writing odds $p : (1 - p)$ instead of probability $p$). Let $O(H \mid E)$ be the odds of observing $H$ given $E$. Then Bayes’ rule says $O(H \mid E) = O(H) \cdot \frac{P(E \mid H)}{P(E \mid \neg H)}$. We are updating our prior odds $O(H)$ to reflect the new evidence $E$ using the Bayes factor $\frac{P(E \mid H)}{P(E \mid \neg H)}$, which tells us how much observing $E$ should influence our belief that we have $H$ versus $\neg H$. Outcome: better understanding of how “updating priors” relates to Bayes’ theorem.
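The update "posterior odds = prior odds × Bayes factor" can be sketched in a few lines. This is my own illustration, not code from the video: the helper names and the numbers (a 1% prior, evidence that is 10 times more likely under the hypothesis than under its negation) are hypothetical. The last step cross-checks the result against the usual probability form of Bayes’ theorem.

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability p into odds p : (1 - p), expressed as a single ratio."""
    return p / (1.0 - p)

def odds_to_probability(odds: float) -> float:
    """Convert odds (as a single ratio) back into a probability."""
    return odds / (1.0 + odds)

def update_odds(prior_odds: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Multiply the prior odds by the Bayes factor P(E | H) / P(E | not H)."""
    bayes_factor = p_e_given_h / p_e_given_not_h
    return prior_odds * bayes_factor

# Hypothetical numbers, chosen only for illustration.
prior_probability = 0.01  # P(H)
p_e_given_h = 0.9         # P(E | H)
p_e_given_not_h = 0.09    # P(E | not H)

posterior_odds = update_odds(
    probability_to_odds(prior_probability), p_e_given_h, p_e_given_not_h
)
posterior_probability = odds_to_probability(posterior_odds)

# Cross-check with the standard form: P(H | E) = P(E | H) P(H) / P(E).
evidence = p_e_given_h * prior_probability + p_e_given_not_h * (1.0 - prior_probability)
direct_posterior = p_e_given_h * prior_probability / evidence

print(f"posterior via odds:   {posterior_probability:.4f}")
print(f"posterior via direct: {direct_posterior:.4f}")
```

In odds form the update is a single multiplication, and the normalizing term P(E) never has to be computed.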
Rotations in 4D? (answer by John Hughes) (added 1/1/2021) Summary: You can’t have a rotation in 4D where 3 dimensions are rotated about a 1-dimensional axis, because the complex eigenvalues of a real matrix come in conjugate pairs. So you can have 2 dimensions rotating and the other 2 fixed, or two separate 2-dimensional rotations happening at once. Outcome: neat trivia about 4D rotations + understand linear algebra better by seeing it be used fluently.
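To make the eigenvalue argument concrete, here is the standard block form of a 4D rotation as I understand it (the angle names $\alpha$ and $\beta$ are my own notation, not taken from the answer). In a suitable orthonormal basis, every rotation of $\mathbb{R}^4$ looks like two independent plane rotations:

```latex
\[
R(\alpha, \beta) =
\begin{pmatrix}
\cos\alpha & -\sin\alpha & 0 & 0 \\
\sin\alpha & \cos\alpha  & 0 & 0 \\
0 & 0 & \cos\beta & -\sin\beta \\
0 & 0 & \sin\beta & \cos\beta
\end{pmatrix},
\qquad
\text{eigenvalues: } e^{i\alpha},\ e^{-i\alpha},\ e^{i\beta},\ e^{-i\beta}.
\]
```

Setting $\beta = 0$ gives a rotation of one plane with the other plane fixed, and taking both angles nonzero gives two simultaneous plane rotations. A rotation fixing exactly one axis would need eigenvalue $1$ once plus three more eigenvalues. Non-real eigenvalues pair up, so at least one of those three is real, hence $\pm 1$: if it is $1$ the fixed set is already 2-dimensional, and if it is $-1$ the determinant is $-1$, which is not a rotation.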
Rationality / life advice
The concept of career capital (80,000 Hours key ideas) (added 1/1/2021) Summary: Career capital is “the skills, connections, credentials, and financial resources that can help you have a bigger impact in the future”. There is a tradeoff between gaining career capital (generalization, transferability) and short-term impact (specialization). Building career capital is closely related to making career choices which give better backup options. Example: pros/cons of doing a PhD. Outcome: understanding the concept of “career capital” has helped me evaluate career choices.
Reference class forecasting (added 1/1/2021) Summary: If you want to estimate <thing>, look at estimations of past <things> rather than just analyzing the <thing> you have. Outcome: gave me a strategy to use when forecasting.
What does flow look like for mental tasks (added 2/5/2021) Summary: Flow is when self-awareness falls away. Outcome: I think this applies to physical tasks (= “doing something automatically”) and dumb tasks (= “browsing social media mindlessly”) just as much as to mental tasks, so I find the idea very useful.
Wikipedia’s summary of “How to Read a Book” (added 3/1/2021) Summary: When reading something, try to understand it in three stages: 1) structural stage (its purpose, the main questions it asks/answers), 2) interpretive stage (the logic behind its main points), 3) critical stage (form opinions about its points). Outcome: keeping these steps in mind helps me remember more / more deeply understand what I read. Often I neglect (1) and (3), even though these are arguably the most important steps.