What do median and mode case peaks mean here? Are they computed over your ordering of days? For example, if Jan 15-19 has counts of 10, 20, 30, 14, 16, is the mode the 17th, the median the 19th, and the mean whatever you’d get by averaging?
How would you go about getting a high-risk person close to the front of the line for Paxlovid treatment? I was really heartened to see the news about imminent approval because a very high-risk family member was recently exposed, but I think without hustle he would definitely not get it in time. I plan to call hospitals in his area tomorrow. Anything else?
Fantasy by Eternity Forever: posthardcore supergroup with the grooviest instrumentals
I really loved reading this series. Came for the puns, stayed for the story. Thank you for writing!
In continuous control problems what you’re describing is called “bang-bang control”, or switching between different full-strength actions. In continuous-time systems this is often optimal behavior (because you get the same effect doing a double-strength action for half as long over a short timescale). That holds until you factor in non-linear energy costs, at which point a smoother controller becomes preferable.
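A minimal sketch of that trade-off, assuming a toy system where position is just the integral of the control signal, and assuming a quadratic energy cost (both are illustrative choices, not something from the original discussion). The double-strength action for half the time reaches the same position as the smooth one, costs the same under a linear cost, and costs twice as much under the quadratic one.

```python
import math

def simulate(controls, dt):
    """Integrate a toy system x' = u and tally two hypothetical cost models."""
    displacement = sum(u * dt for u in controls)
    linear_cost = sum(abs(u) * dt for u in controls)   # cost proportional to |u|
    quadratic_cost = sum(u * u * dt for u in controls)  # non-linear "energy" cost
    return displacement, linear_cost, quadratic_cost

dt = 0.01
smooth = [1.0] * 100                  # unit-strength action for the whole interval
bang_bang = [2.0] * 50 + [0.0] * 50   # double strength for half as long, then off

smooth_result = simulate(smooth, dt)
bang_result = simulate(bang_bang, dt)

print("smooth    (displacement, linear, quadratic):", smooth_result)
print("bang-bang (displacement, linear, quadratic):", bang_result)
```

Under the linear cost the two controllers tie, which is why bang-bang is often fine; the quadratic cost is what tips the balance toward the smoother one.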
Kiln People is a fantastic science fiction story which explores the same question, with the twist that the embodied copies are temporary (~24 hours). It explores questions of employment, privacy, life-purpose, and legality in a world where this cloning procedure is common. I highly recommend it to those interested.
“Now, suppose that in addition to g, you learn that Bob did well on paragraph comprehension. How does this change your estimate of Bob’s coding speed? Amazingly, it doesn’t. The single number g contains all the shared information between the tests.”
I don’t think this is right, if some fraction of the test for g is paragraph comprehension. If g is the weighted average between paragraph comprehension and addition skill, knowing g and paragraph comprehension gives you addition skill.
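Here is the arithmetic behind that objection as a small sketch. It takes the hypothetical literally: g is a weighted average of exactly two scores, with a made-up weight and made-up scores. Under that assumption, paragraph comprehension is far from redundant once you know g, because the two together pin down addition skill exactly.

```python
import math

def g_score(paragraph, addition, weight):
    """Hypothetical g: a weighted average of two test scores."""
    return weight * paragraph + (1 - weight) * addition

def recover_addition(g, paragraph, weight):
    """Solve g = w * paragraph + (1 - w) * addition for addition."""
    return (g - weight * paragraph) / (1 - weight)

# Illustrative numbers only.
weight = 0.5
paragraph, addition = 120.0, 80.0

g = g_score(paragraph, addition, weight)
recovered = recover_addition(g, paragraph, weight)
print("g =", g, "recovered addition score =", recovered)
```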
Yep, they’re different. It’s just an architecture. Among other things, Chess and Go have different input/action spaces, so the same architecture can’t be used on both without some way to handle this.
This paper uses an egocentric input, which allows many different types of tasks to use the same architecture. That would be the equivalent of learning Chess/Go based on pictures of the board.
Can you extrapolate the infectiousness ratio between the newest most virulent strain and the original? I assume the original has all but died out, but maybe by chaining together estimates of intermediate strains?
I’m reading BDA3 right now, and I’m on chapter 6. You described it well. It takes a lot of thinking to get through, but is very comprehensive. I like how it’s explicitly not just a theory textbook. They demonstrate each major point by describing a real-world problem (measuring cancer rates across populations, comparing test-prep effectiveness), and attacking it with multiple models (usually a frequentist one to show its limitations, and then their Bayesian model more thoroughly). It has a focus on learning the tools well enough to apply them to real-world problems.
I plan to start skimming soon. It seems the first two sections are pedagogical, and the remainder covers techniques which I would like to know about but don’t need in detail.
Edit: One example I really enjoyed, and which felt very relevant to today, was on estimating lung-cancer hotspots in America. It broke the country down by county, and first displayed a map of the USA with counties in the top 10% of lung-cancer rates. Much of the highlighted region was in the rural southwest and Rocky Mountain region. It asked, what do you think makes these regions have such high rates? It then showed another map, this one of counties in the bottom 10% of lung-cancer rates, and the map focused on the same regions!
Turns out, this was mostly the result of these regions containing many low-population counties, which meant rare-event sampling could skew high very easily, just by chance. If the base rate is 5 per 10,000, and you have 2 cases in a county with 1,000 people, you look like a superfund site. But sample the next year and you might find 0 cases: a county full of young health-freaks.
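To put numbers on that, here is a quick sketch using the figures above (base rate of 5 per 10,000, county of 1,000 people), and assuming cases are independent so that the yearly count is binomial. That independence assumption is mine, for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k cases among n people, each with risk p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

base_rate = 5 / 10_000
population = 1_000

p_zero = binomial_pmf(0, population, base_rate)
p_two_or_more = 1 - p_zero - binomial_pmf(1, population, base_rate)

print(f"P(0 cases)   = {p_zero:.3f}")
print(f"P(2+ cases)  = {p_two_or_more:.3f}")
```

So a perfectly average small county shows zero cases in roughly 61% of years and looks like a hotspot (four times the base rate or worse) in roughly 9% of years, which is enough to land it on either map.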
If you model lung-cancer rates hierarchically, with a distribution over county cancer rates, each county’s rate sampled from that distribution, and cancer events then sampled from its specific rate, you get a Bayes-adjusted incidence rate for each county, which regresses small counties toward the mean.
This made me read Covid charts which showed hot-spot counties much differently. I noticed that the counties they list are frequently small. Right now, all the counties on the NYTimes list, for example, have fewer than 20,000 people in them, which is, I believe, roughly in the bottom 25% of counties by size.
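One simple way to sketch this kind of adjustment is a Gamma prior on the county rate with Poisson-distributed counts, which gives a closed-form posterior mean. The prior parameters below are hypothetical, picked only so that the prior mean matches a base rate of 5 per 10,000; in a real analysis they would be estimated from all the counties.

```python
def adjusted_rate(cases, population, prior_alpha, prior_beta):
    """Posterior mean rate under a Gamma(alpha, beta) prior and Poisson counts."""
    return (prior_alpha + cases) / (prior_beta + population)

# Hypothetical prior: mean = alpha / beta = 5 per 10,000.
prior_alpha, prior_beta = 5.0, 10_000.0

# Two counties with the same raw rate (2 per 1,000) but very different sizes.
small = adjusted_rate(cases=2, population=1_000,
                      prior_alpha=prior_alpha, prior_beta=prior_beta)
large = adjusted_rate(cases=200, population=100_000,
                      prior_alpha=prior_alpha, prior_beta=prior_beta)

print(f"small county adjusted rate: {small:.6f}")
print(f"large county adjusted rate: {large:.6f}")
```

The small county gets pulled most of the way back to the base rate, while the large county, which has enough data to speak for itself, barely moves.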
I think it comes from a feeling that the proportions of blame need to add up to one, and by apologizing first you’re putting more of the blame on your actions. You often can’t say “I apologize for the 25% of this mess I’m responsible for.”
I think the general mindset of apportioning blame (as well as looking for a single blame-target) is a dangerous one. There’s a whole world of things that contribute to every conflict outside of the two people having it.
I also think that the example isn’t perfect (although I haven’t formalized why yet). But, you’re describing tactical voting, which is considered one of the “downsides” of RCV.
I think things become simpler when you look at the sum of all stocks, versus particular ones. Then, you only need to consider the market cap of the entire stock market, and what makes it change over time.
The economy is much bigger than the stock market. Money flows from small companies to larger ones as the economy consolidates. Since the latter are more likely to be publicly traded than the former, the market becomes bigger.
It’s easier to invest in the stock market now than in the past. Since it’s accessible to more people, more people’s money can be put in it. So, the market cap goes up.
Finally, as inequality increases, a larger fraction of wealth is disposable, and therefore can be invested. That makes the market grow as well.
I’m sure there are many other reasons along these lines.
When reading HPMOR, I thought Harry’s decision to not kill Voldemort was hubris, even considering his morals. He should have known that all plans have a non-zero chance of failing, and that Voldemort coming back could be an existential risk. So, finite harm now to stop a small chance at nearly infinite harm later. Nice to see this coming to bite him. I’m looking forward to the conclusion!
In what world is giving the second dose to the same person, raising them from 87% to 96% protected, a higher priority than vaccinating a second person?
One benefit of 96 vs 87 is that the former could allow you to live an un-quarantined life, while the latter wouldn’t result in much behavioral change. Clearly, the one-dose strategy is better for net deaths etc, but the QALY calculation looks a little different.
I still agree with you. But it’s worth considering the above reasoning for completeness.
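For reference, the raw-protection side of the comparison works out as below. This is a rough sketch that treats the 87% and 96% figures from the question as per-person protection probabilities and assumes both people are equally exposed; it deliberately leaves out the behavioral (QALY) effect described above.

```python
import math

ONE_DOSE_PROTECTION = 0.87  # figure quoted in the question
TWO_DOSE_PROTECTION = 0.96  # figure quoted in the question

def expected_protected(people, protection):
    """Expected number of protected people at a given protection level."""
    return people * protection

# Two doses available: spread them over two people, or give both to one.
spread = expected_protected(2, ONE_DOSE_PROTECTION)
concentrated = expected_protected(1, TWO_DOSE_PROTECTION)  # second person gets nothing

print("one dose each:      ", spread)
print("two doses to one:   ", concentrated)
```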
So interesting that I thought you were going to go the opposite direction at the end. I have felt slight amounts of imposter syndrome before, and it came from feeling like a well-liked and well-respected person whose skills did not fully back up my reputation. So, I was high on the social hierarchy but I perceived it coming from dominance and not prestige.
A parallel problem with prediction markets at non-financial-industry scales: they’re used as signals of confidence. How often do you see, after someone makes a bold claim, a response saying to “put your money where your mouth is”? But just the act of signalling confidence can be intrinsically valuable to the person making a claim. In bet-capped places like PredictIt, this can produce equilibria that differ from the optimal ones, because there are non-monetary incentives at work.
This is a good and valid question—I agree, it isn’t fair to say generalization comes entirely from human beliefs.
An illustrative example: suppose we’re talking about deep learning, so our predicting model is a neural network. We haven’t specified the architecture of the model yet. We choose two architectures, and train both of them on our subsampled human-labeled D* items. Almost surely, these two models won’t give exactly the same outputs on every input, even in expectation. So where did this variability come from? Some sort of bias from the model architecture!
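A toy version of the same point, swapping the two neural architectures for two much simpler model families (a least-squares line and a 1-nearest-neighbor lookup) so it runs without any libraries. The data here is made up. Both models fit the labeled points perfectly, yet they disagree on an unseen input, and that disagreement can only come from the models themselves.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least-squares line through the labeled points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest_neighbor(xs, ys):
    """Predict the label of the closest labeled point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

# Hypothetical labeled items standing in for the human-labeled subsample.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]

linear_model = fit_linear(xs, ys)
neighbor_model = fit_nearest_neighbor(xs, ys)

unseen = 10.0
print("linear prediction:          ", linear_model(unseen))
print("nearest-neighbor prediction:", neighbor_model(unseen))
```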