tgb

Karma: 1,873

tgb 27 May 2026 13:47 UTC
6 points
1
on: Standard deviations from just two values
Your estimate isn’t actually very good at estimating the population standard deviation (it substantially overestimates it), but it is good for estimating confidence intervals. That’s because confidence intervals also require you to account for the variance of the estimate of the mean, which is large when you only have two samples.

Simulating it out:

```
import numpy as np
x = np.random.normal(size=(100_000, 2))
diff = np.max(x, axis=1) - np.min(x, axis=1)
std_est = diff * 1.3
print(np.mean(std_est))
print(np.median(std_est))
```

This shows it overestimates (mean 1.46, median 1.23, while the true standard deviation is 1). Using max - min gives a better estimator by both mean (1.13) and median (0.96). You are right that both the corrected sample standard deviation ((max - min) / sqrt(2)) and the uncorrected sample standard deviation ((max - min) / 2) are quite biased downwards from the true standard deviation.
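To put the four estimators side by side, here is a minimal sketch of the same simulation. The seed and the dictionary labels are just choices made for this sketch, not anything from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.normal(size=(100_000, 2))  # true standard deviation is 1
spread = np.abs(x[:, 0] - x[:, 1])  # identical to max - min when there are two values

# Scale factors applied to (max - min); the labels are for this sketch only
scales = {
    "1.3 * (max - min)": 1.3,
    "max - min": 1.0,
    "corrected sample SD, (max - min) / sqrt(2)": 1 / np.sqrt(2),
    "uncorrected sample SD, (max - min) / 2": 0.5,
}
summary = {
    name: (np.mean(s * spread), np.median(s * spread)) for name, s in scales.items()
}
for name, (mean, median) in summary.items():
    print(f"{name}: mean {mean:.2f}, median {median:.2f}")
```

Since every estimator here is a constant multiple of max - min, they all share the same relative noise; only the bias changes.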

That being said, all of these are incredibly noisy estimators: the max-min estimator is only good for being within about 5x of the true value.
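
One rough way to check how noisy the max - min estimator is: count how often it lands within a factor of k of the true value. This is a sketch under the same assumptions as the simulation above (standard normal data, so the true standard deviation is 1); the choice of k = 2 and k = 5 is mine.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
x = rng.normal(size=(100_000, 2))  # true SD is 1, so the estimate is also its ratio to truth
estimate = np.max(x, axis=1) - np.min(x, axis=1)

# Fraction of simulated pairs whose estimate is within a factor k of the true SD
within = {k: np.mean((estimate > 1 / k) & (estimate < k)) for k in (2, 5)}
lo, hi = np.quantile(estimate, [0.025, 0.975])  # central 95% range of the estimate

print(within)
print(f"central 95% of estimates: {lo:.2f} to {hi:.2f}")
```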

tgb 5 Feb 2026 17:20 UTC
2 points
0
in reply to: Viliam’s comment on: Is the Gell-Mann effect overrated?
I used the oldest version available in the Wayback Machine so presumably it was how it was published, but it does include an “update” note as if it’s undergone at least one revision. It’s not impossible that the Wayback Machine is missing the earliest version. I still think that “copy and paste into a janky content management system interface” is probably the cause of whatever bad formatting it had rather than outright malice, but it may have been worse then than we see now (they state that formatting was changed though it’s not clear when).

tgb 4 Feb 2026 19:57 UTC
2 points
0
in reply to: gaw’s comment on: Is the Gell-Mann effect overrated?
This is definitely a leading hypothesis but I think it’s also the case that going to the experts directly will lead you more astray in psychology than in some other fields because the quality of the work there has been lower. It makes sense that journalism is low quality if the experts are also low quality, though of course we would hope that journalists would be able to improve upon what they’re given (by e.g. consulting multiple experts). I guess one of my points is: if you don’t believe the traditional press media, who do you believe? I’m not convinced there’s an answer that improves upon the media (Wikipedia?). In fact, a fair number of the articles you might be thinking of could be authored by psychologists: at least my local paper often includes articles written by local researchers, physicians, etc. on the topics in their field, under the Opinion heading.

Not sure what articles would count as pop philosophy, though.
For what it’s worth, circadian biology is quite open to popification. Eliezer Yudkowsky has written about finding the right timing to take melatonin for his sleep timing disorder. And practically all of us struggle with jet lag, daylight savings time changes (this part even being quite politicized!), or work schedules.

tgb 2 Feb 2026 20:20 UTC
2 points
0
in reply to: Viliam’s comment on: Is the Gell-Mann effect overrated?
The clear reason to pay for news is that you can buy higher quality news than what your social media shows you. But I did definitely carve out politically sensitive areas in my discussion for a reason.

> They even changed font to random sizes to have it appear unhinged
This caught my eye, but appears to be false: https://web.archive.org/web/20170805210606/https://gizmodo.com/exclusive-heres-the-full-10-page-anti-diversity-screed-1797564320 The archived version has some weird formatting, presumably from copying it in from a Google doc, which is presumably also why it lost the figures and URLs. The formatting doesn’t look unhinged at all, just a bit awkward, though their summarizing the changes as removing “several” hyperlinks is terrible (it looks more like a couple dozen links in the original to me). Though I would never have thought of Gizmodo as being high-tier journalism in the first place.

tgb 30 Jan 2026 20:03 UTC
2 points
0
in reply to: Jiro’s comment on: Is the Gell-Mann effect overrated?
Good suggestion, though I don’t know how to systematically assess that. I can’t even think of what topics would be most likely to have this come up in.

tgb 30 Jan 2026 18:40 UTC
2 points
0
in reply to: simulus’s comment on: Is the Gell-Mann effect overrated?
Thanks for this example. I definitely see ridiculous headlines like that from less reputable places. Do you also have examples from the type of news media I’m talking about like WSJ? For example, searching “Washington Post AI robotics” I get headlines:
- “Humanoid robots were sci-fi. Suddenly they’re everywhere” about companies investing in and demoing humanoid robots, which seems to be true.
- “Not ready for robots in homes? The maker of a friendly new humanoid thinks it might change your mind” about the product “Sprout” by Fauna Robotics, which seems okay unless Fauna is completely faking it
- “Russia’s much-hyped humanoid robot face-plants onstage during debut” OK
- “Opinion | The Chinese robots are coming” hard to assess but not particularly hyperbolic
- “Robot smaller than grain of salt can ‘sense, think and act’” subtitled “With solar cells and its own propulsion system, the device is a step toward sending robots into the human body”. This seems closest to what you’re talking about. Here’s a press release for it: https://www.seas.upenn.edu/stories/penn-and-umich-create-worlds-smallest-programmable-autonomous-robots/ I’m sure this is an optimistic take on a research project but it seems fairly reasonable
(I realize now that “robotics” wasn’t really in your original statement, I guess I extrapolated that from your drone example.)

tgb 4 Dec 2025 2:13 UTC
4 points
0
in reply to: TsviBT’s comment on: Buck’s Shortform
Drug approvals have gone up in recent years: https://pmc.ncbi.nlm.nih.gov/articles/PMC10856271/ (figure 1). Of course most of those are not ones that you’ll encounter in day-to-day life. Meanwhile, some of the most commonly used over-the-counter drugs from previous decades have been pulled from the market or made harder to get (cold medicine particularly: phenylpropanolamine due to rare side effects in 2000, oral phenylephrine due to lack of effect last year, and pseudoephedrine restricted to behind the counter due to use in meth a decade ago or so).

tgb 20 Jul 2025 19:13 UTC
5 points
1
on: Shallow Water is Dangerous Too
I was going to say that you should still have the kid checked due to “secondary drowning”, but apparently that’s largely a myth: https://www.redcross.org/take-a-class/resources/articles/dry-or-delayed-secondary-drowning According to the Red Cross, there’s no record of anyone nearly drowning, completely returning to normal, and then dying afterwards. If the person had shown symptoms like confusion or coughing, they’d be at risk for later dying despite rescue, but not if they completely and quickly recovered after the incident.

tgb 26 Mar 2025 18:26 UTC
2 points
0
in reply to: TsviBT’s comment on: Tabula Bio: towards a future free of disease (& looking for collaborators)
I’m not as concerned about your points because there are a number of projects already doing something similar and (if you believe them) succeeding at it. Here’s a paper comparing some of them: https://www.biorxiv.org/content/10.1101/2025.02.11.637758v2.full

tgb 26 Mar 2025 14:53 UTC
2 points
0
in reply to: Yejun Y.’s comment on: Tabula Bio: towards a future free of disease (& looking for collaborators)
ML models can take more data as input. In particular, the genomic sequence is not a predictor used in LASSO regression models: each variant is just coded as its alternative allele count (0, 1, or 2). The LASSO models have limited ability to pool information across variants or across data modes. ML models like this one can (in theory) predict effects of variants just based off their sequence on data like RNA-sequencing (which shows which genes are actively being transcribed). That information is effectively pooled across variants and ties genomic sequence to another data type (RNA-seq). If you include that information in a disease-effect prediction model, you might improve upon the LASSO regression model. There are a lot of papers claiming to do that now, for example the BRCA1 supervised experiment in the EVO-2 paper. Of course, a supervised disease-effect prediction layer could be LASSO itself and just include some additional features derived from the ML model.
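
To make the allele-count coding concrete, here is a toy sketch. Everything in it is hypothetical: the data are simulated, the sizes and penalty are arbitrary, and the small coordinate-descent LASSO is hand-rolled for illustration rather than taken from any real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_variants = 500, 20  # toy sizes, chosen only for illustration

# Each variant is coded as its alternative allele count (0, 1, or 2).
# Nothing about the surrounding genomic sequence enters the model.
genotypes = rng.binomial(2, 0.3, size=(n_people, n_variants)).astype(float)
phenotype = 1.0 * genotypes[:, 0] + rng.normal(size=n_people)  # only variant 0 matters


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent."""
    X = X - X.mean(axis=0)  # centering stands in for an intercept
    y = y - y.mean()
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with variant j's current contribution added back in
            partial_resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_resid / n
            # Soft-thresholding update for coefficient j
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return b


coef = lasso_coordinate_descent(genotypes, phenotype, alpha=0.1)
print(np.round(coef, 2))
```

In this framing, features derived from a sequence-based ML model would simply be extra columns appended to `genotypes` before fitting.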

tgb 22 Nov 2024 15:50 UTC
2 points
0
in reply to: Terence Coelho’s comment on: A very strange probability paradox
This is a lovely little problem, so thank you for sharing it. I thought at first it would be [a different problem](https://www.wolfram.com/mathematica/new-in-9/markov-chains-and-queues/coin-flip-sequences.html) that’s similarly paradoxical.

tgb 21 Aug 2024 13:53 UTC
2 points
0
in reply to: Richard_Kennaway’s comment on: Critique of ‘Many People Fear A.I. They Shouldn’t’ by David Brooks.
Again, why wouldn’t you want to read things addressed to other sorts of audiences if you thought altering public opinion on that topic was important? Maybe you don’t care about altering public opinion but a large number of people here say they do care.

tgb 20 Aug 2024 1:50 UTC
2 points
0
in reply to: Richard_Kennaway’s comment on: Critique of ‘Many People Fear A.I. They Shouldn’t’ by David Brooks.
He’s influential and it’s worth knowing what his opinion is because it will become the opinion of many of his readers. He’s also representative of what a lot of other people are (independently) thinking.

What’s Scott Alexander qualified to comment on? Should we not care about the opinion of Joe Biden because he has no particular knowledge about AI? Sure, I doubt we learn anything from rebutting his arguments, but once upon a time LW cared about changing the public opinion on this matter and so should absolutely care about reading that public opinion.

Honestly, I’m embarrassed for us that this needs to be said.

tgb 26 Jun 2024 12:27 UTC
4 points
2
on: Childhood and Education Roundup #6: College Edition
But you don’t need grades to separate yourself academically. You take harder classes to do that. And incentivizing GPA again will only punish people for taking actual classes instead of sticking to easier ones they can get an A in.
Concretely, everyone in my math department that was there to actually get an econ job took the basic undergrad sequences and everyone looking to actually do math started with the honors (“throw you in the deep end until you can actually write a proof”) course and rapidly started taking graduate-level courses. The difference on their transcript was obvious but not necessarily on their GPA.
What system would turn that into a highly legible number akin to GPA? I’m not sure, some sort of Elo system?

tgb 23 Feb 2024 14:32 UTC
1 point
0
on: Do sparse autoencoders find “true features”?
I was confused until I realized that the “sparsity” that this post is referring to is activation sparsity not the more common weight sparsity that you get from L1 penalization of weights.
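
A minimal sketch of the distinction, using a made-up one-layer autoencoder in NumPy. The shapes, the ReLU encoder, and the 0.1 penalty weight are assumptions for illustration, not details from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # batch of 8 inputs with 4 dimensions (toy sizes)
W_enc = rng.normal(size=(4, 16))   # encoder weights
W_dec = rng.normal(size=(16, 4))   # decoder weights

h = np.maximum(x @ W_enc, 0.0)     # hidden activations after a ReLU
x_hat = h @ W_dec
recon = np.mean((x - x_hat) ** 2)  # reconstruction error

# Activation sparsity: L1 penalty on the hidden activations,
# which pushes each input to use only a few hidden units.
activation_l1 = np.mean(np.sum(np.abs(h), axis=1))

# Weight sparsity: L1 penalty on the parameters themselves (as in LASSO),
# which pushes individual weights to zero.
weight_l1 = np.sum(np.abs(W_enc)) + np.sum(np.abs(W_dec))

activation_sparse_loss = recon + 0.1 * activation_l1
weight_sparse_loss = recon + 0.1 * weight_l1
print(activation_sparse_loss, weight_sparse_loss)
```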

tgb 17 Feb 2024 13:04 UTC
2 points
0
in reply to: GoteNoSente’s comment on: I played the AI box game as the Gatekeeper — and lost
Wait why do you think inmates escaping is extremely rare? Are you just referring to escapes where guards assisted the escape? I work in a hospital system and have received two security alerts in my memory where a prisoner receiving medical treatment ditched their escort and escaped. At least one of those was on the loose for several days. I can also think of multiple escapes from prisons themselves, for example: https://abcnews.go.com/amp/US/danelo-cavalcante-murderer-escaped-pennsylvania-prison-weeks-facing/story?id=104856784 notable since the prisoner was an accused murderer and likely to be dangerous and armed. But there was also another escape from that same jail earlier that year: https://www.dailylocal.com/2024/01/08/case-of-chester-county-inmate-whose-escape-showed-cavalcante-the-way-out-continued/amp/

tgb 12 Feb 2024 15:24 UTC
13 points
0
on: How do you actually obtain and report a likelihood function for scientific research?
I have some reservations about the practicality of reporting likelihood functions and have never done this before, but here are some (sloppy) examples in Python, primarily answering numbers 1 and 3.
```
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib
import pylab

np.random.seed(100)

## Generate some data for a simple case vs control example
# 10 vs 10 replicates with a 1 SD effect size
controls = np.random.normal(size=10)
cases = np.random.normal(size=10) + 1
data = pd.DataFrame(
    {
        "group": ["control"] * 10 + ["case"] * 10,
        "value": np.concatenate((controls, cases)),
    }
)

## Perform a standard t-test as comparison
# Using OLS (ordinary least squares) to model the data
results = smf.ols("value ~ group", data=data).fit()
print(f"The p-value is {results.pvalues['group[T.control]']}")

## Report the (log)-likelihood function
# log-likelihood at the fitted parameters (which is the maximum likelihood)
log_likelihood = results.llf
# or equivalently
log_likelihood = results.model.loglike(results.params)

## Results at a range of parameter values:
# we evaluate at 100 points between -2 and 2
control_case_differences = np.linspace(-2, 2, 100)
likelihoods = []
for cc_diff in control_case_differences:
    params = results.params.copy()
    params["group[T.control]"] = cc_diff
    likelihoods.append(results.model.loglike(params))

## Plot the likelihood function
fig, ax = pylab.subplots()
ax.plot(
    control_case_differences,
    likelihoods,
)
ax.set_xlabel("control - case")
ax.set_ylabel("log likelihood")


## Our model actually has two parameters, the intercept and the control-case difference
# We only varied the difference parameter without changing the intercept, which (with the
# default treatment coding) is the mean of the reference group, here "case"
# Now let's vary both parameters, trying all combinations from -2 to 2 in both values
mean_values = np.linspace(-2, 2, 100)
mv, ccd = np.meshgrid(mean_values, control_case_differences)
likelihoods = []
for m, c in zip(mv.flatten(), ccd.flatten()):
    likelihoods.append(
        results.model.loglike(
            pd.Series(
                {
                    "Intercept": m,
                    "group[T.control]": c,  # name must match the fitted parameter
                }
            )
        )
    )
likelihoods = np.array(likelihoods).reshape(mv.shape)

# Plot it as a 2d grid
fig, ax = pylab.subplots()
h = ax.pcolormesh(
    mean_values,
    control_case_differences,
    likelihoods,
)
ax.set_ylabel("control - case")
ax.set_xlabel("intercept (case mean)")
fig.colorbar(h, label="log likelihood")
```
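The two figures (images not preserved here) are the 1D log-likelihood curve over the control - case difference and the 2D log-likelihood grid over the intercept and the difference, as produced by the code above.
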
I think this code will extend to any other likelihood-based model in statsmodels, not just OLS, but I haven’t tested.

It’s also worth familiarizing yourself with how the likelihoods are actually defined. For OLS we assume that residuals are normally distributed. For data points y_i at X_i the likelihood for a linear model with independent, normal residuals is:
$L = \prod_{i = 1}^{n} \exp\left(- (y_{i} - X_{i} β)^{2} / (2 σ^{2})\right) / \sqrt{2 π σ^{2}}$
where $β$ are the parameters of the model, $σ^{2}$ is the variance of the residuals, and $n$ is the number of datapoints. So the likelihood function here is this value as a function of $β$ (and maybe also $σ^{2}$ , see below).
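
As a sanity check on the definition, here is a minimal NumPy sketch of the log of that likelihood, with squared residuals in the exponent. The function name and the toy data are made up for this sketch.

```python
import numpy as np


def gaussian_log_likelihood(y, X, beta, sigma2):
    """Log-likelihood of a linear model with independent normal residuals of variance sigma2."""
    resid = y - X @ beta
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum(resid ** 2) / (2 * sigma2)


# Toy data (made up): an intercept plus one 0/1 group indicator
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), np.repeat([0.0, 1.0], 10)])
y = X @ np.array([1.0, -1.0]) + rng.normal(size=20)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS, which is also the MLE for beta
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)    # MLE of sigma^2 (divides by n)

ll_at_mle = gaussian_log_likelihood(y, X, beta_hat, sigma2_hat)
ll_elsewhere = gaussian_log_likelihood(y, X, beta_hat + 0.5, sigma2_hat)
print(ll_at_mle, ll_elsewhere)
```

Moving $β$ away from the OLS fit can only lower the value, which is why OLS and maximum likelihood agree on $β$ here.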

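So if we want to tell someone else our full likelihood function and not just evaluate it at a grid of points, it’s enough to tell them $y$ and $X$ . But that’s the entire dataset! To get a smaller set of summary statistics that capture the entire information, you look for ‘sufficient statistics’. Generally for OLS those are just $X^{T} y$ and $X^{T} X$ . For a fixed $σ^{2}$ that’s enough to recreate the likelihood function up to a constant factor, because the sum of squared residuals expands to $y^{T} y - 2 β^{T} X^{T} y + β^{T} X^{T} X β$ and only the $y^{T} y$ term is missing. If $σ^{2}$ is also allowed to vary, you additionally need $y^{T} y$ and $n$ .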
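
Here is a small sketch of rebuilding the log-likelihood from summary statistics alone. It assumes you also share $y^{T} y$ and $n$ alongside $X^{T} y$ and $X^{T} X$, so that the value matches exactly rather than up to a constant; the data and function names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), np.repeat([0.0, 1.0], 10)])  # toy design matrix
y = X @ np.array([1.0, -1.0]) + rng.normal(size=20)

# Summary statistics to share instead of the raw data
XtX, Xty, yty, n = X.T @ X, X.T @ y, y @ y, len(y)


def loglike_from_data(beta, sigma2):
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)


def loglike_from_summaries(beta, sigma2):
    # ||y - X beta||^2 = y'y - 2 beta'X'y + beta'X'X beta
    ssr = yty - 2 * beta @ Xty + beta @ XtX @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - ssr / (2 * sigma2)


beta, sigma2 = np.array([0.3, -0.7]), 1.5  # arbitrary evaluation point
print(loglike_from_data(beta, sigma2), loglike_from_summaries(beta, sigma2))
```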
Note that $σ^{2}$ matters for reporting the likelihood but doesn’t matter for finding the maximum-likelihood (OLS) estimate of $β$ , since it drops out of the maximization. This is inconvenient for reporting likelihood functions. As far as I can tell, the code I provided doesn’t hold $σ^{2}$ fixed: `loglike` called without a `scale` argument returns statsmodels’ profile log-likelihood, where $σ^{2}$ is re-estimated at each $β$ . At the end of the day, someone using your likelihood function would mostly be using it to extract likelihood ratios. The normalizing constant cancels in those, but for a fixed $σ^{2}$ the log ratio between two values of $β$ is the difference in their sums of squared residuals divided by $2 σ^{2}$ , so it’s worth stating which $σ^{2}$ (or profiling) you used.

tgb 17 Jan 2024 16:48 UTC
5 points
0
on: Medical Roundup #1
> But yes, working out is mostly unpleasant and boring as hell as we conceive of it and we need to stop pretending otherwise. Once we agree that most exercise mostly bores most people who try it out of their minds, we can work on not doing that.
I’m of the nearly opposite opinion: we pretend that exercise ought to be unpleasant. We equate exercise with elite or professional athletes and the vision of needing to push yourself to the limit, etc. In reality, exercise does include that but for most people should look more like “going for a walk” than “doing hill sprints until my legs collapse”.

On boredom specifically, I think strenuousness affects that more than monotony. When I started exercising, I would watch a TV show on the treadmill and kept feeling bored, but the moment I toned down to a walking speed to cool off, suddenly the show was engaging and I’d find myself overstaying just to watch it. Why wasn’t it engaging while I was running? The show didn’t change. Monotony wasn’t the deciding factor, but rather the exertion.

Later, I switched to running outside and now I don’t get bored despite using no TV or podcast or music. And it requires no willpower! If you’re two miles from home, you can’t quit. Quitting just means running two miles back which isn’t really quitting so you might as well keep going. But on a treadmill, you can hop off at any moment, so there’s a constant drain on willpower. So again, I think the ‘boredom’ here isn’t actually about the task being monotonous and finding ways to make it less monotonous won’t fix the perceived boredom.

I do agree with the comment of playing tag for heart health. But that already exists and is socially acceptable in the form of pickup basketball/soccer/flag-football/ultimate. Lastly, many people do literally find weightlifting fun, and it can be quite social.

tgb 17 Jan 2024 16:24 UTC
6 points
0
in reply to: mike_hawke’s comment on: Medical Roundup #1
> The American Heart Association (AHA) Get with the Guidelines–Heart Failure Risk Score predicts the risk of death in patients admitted to the hospital.⁹ It assigns three additional points to any patient identified as “nonblack,” thereby categorizing all black patients as being at lower risk. The AHA does not provide a rationale for this adjustment. Clinicians are advised to use this risk score to guide decisions about referral to cardiology and allocation of health care resources. Since “black” is equated with lower risk, following the guidelines could direct care away from black patients.
From the NEJM article. This is the exact opposite of Zvi’s conclusions (“Not factoring this in means [blacks] will get less care”).

I confirmed the NEJM’s account by using an online calculator for that score: https://www.mdcalc.com/calc/3829/gwtg-heart-failure-risk-score Setting a patient with black=No gives higher risk than black=Yes. The same holds for a risk score from the AHA: https://static.heart.org/riskcalc/app/index.html#!/baseline-risk

Is Zvi/NYT referring to a different risk calculator? There are a lot of them out there. The NEJM also discusses a surgical risk score that has the opposite directionality, so maybe that one? Though there the conclusion is also about less care for blacks: “When used preoperatively to assess risk, these calculations could steer minority patients, deemed to be at higher risk, away from surgery.” Of course, less care could be a good thing here!

I agree that this looks complicated.

tgb 17 Jan 2024 15:45 UTC
2 points
2
on: Medical Roundup #1
> Wegovy (a GLP-1 antagonist)
Wegovy/Ozempic/Semaglutide are GLP-1 receptor agonists, not GLP-1 antagonists. This means they activate the GLP-1 receptor, which GLP-1 also does. So it’s more accurate to say that they are GLP-1 analogs, which makes calling them “GLP-1s” reasonable even though that’s not really accurate either.