Jan Christian Refsgaard

Karma: 650

Data Scientist

Jan Christian Refsgaard Jun 6, 2025, 10:07 PM
3 points
0
on: The Best Reference Works for Every Subject
Domain: (Applied) Bayesian Statistics

Link: Statistical Rethinking (free pdf), My Less Wrong Review, The 2017-2023 Lectures*

Author: Richard McElreath

Type: Book, YouTube lectures, and a LessWrong post about Bayesian statistics books in general

Why: Modern Bayes relies on HMC sampling, and this book goes all in on that approach. This allows you to focus on how to build the model and to skip almost all the math (except for the link function). By sacrificing a little bit of mathematical rigor, this book covers more than all other popular books on the subject, to the point where you can stop ²⁄₃ of the way through and consider the last ¹⁄₃ “advanced optional topics”.
*The 2017 and 2019 lectures were great; I have not watched the newer versions of the course. The older versions of the book use ulam, a pedagogical Stan wrapper written by the author that is powerful enough for most of the exercises in the book. I would advise serious students to do all the models in Stan, PyMC, or a wrapper that uses those.

Jan Christian Refsgaard Apr 2, 2025, 6:21 PM
2 points
0
in reply to: Beyond Singularity’s comment on: LessWrong has been acquired by EA
Yes, and EA only takes a 70% cut, with a 10% discount per user tier. It's a bit ambiguously written, so I can't tell if it goes from 70% to 60% or to 63%.

Jan Christian Refsgaard Apr 1, 2025, 9:16 PM
1 point
0
in reply to: datawitch’s comment on: LessWrong has been acquired by EA
Why the downvotes? This guy showed epistemic humility and said when he got the joke. I can understand not upvoting, as it is not the most information-dense or engaging post, but why downvote? Downvoting confuses me, and I fear it may discourage other people from writing on LW.

Edit: this post had −12, so probably 1-2 people super downvoted it or something, and then it stopped.

Jan Christian Refsgaard Apr 1, 2025, 9:07 PM
2 points
1
in reply to: Beyond Singularity’s comment on: LessWrong has been acquired by EA
- Bronze User, 10€/month, gain Super upvote ability
- Silver User, 20€/month, posts cannot be downvoted
- Gold User, 30€/month, posts can be promoted to the front page
- Platinum User, 50€/month, all posts are automatically promoted to the front page and curated
- Diamond User, 100€/month, user now only sees ads on long posts
Loot Box: 10% chance for +100 upvotes, 5% chance for curated status of a random post

Each user tier gives 1 loot box per month.

Jan Christian Refsgaard Apr 1, 2025, 8:55 PM
1 point
0
on: LessWrong has been acquired by EA
Unable to comply, building in progress.

Jan Christian Refsgaard Apr 1, 2025, 5:40 PM
5 points
0
in reply to: GeneSmith’s comment on: Statistical Challenges with Making Super IQ babies
I am glad that you guys fixed bugs and got stronger estimates.

I suspect you fitted a model using best practices, so the methodology is not my main critique, though I suspect there is insufficient shrinkage in your estimates (and in most other published estimates for polygenic traits and diseases).

It’s the extrapolations from the models I am skeptical of. There is a big difference between being able to predict within sample, where by definition 95% of the data is between 70 and 130, and assuming the model also predicts correctly when you edit outside this range. Take, for example, your upper bound of +85 IQ points with 500 edits: if we did this to a baseline human with IQ 100, then his child would get an IQ of 185, which is so high that only about 60 of the 8 billion people on planet earth would be that smart if IQ were actually drawn from a normal distribution with mean 100 and sigma 15. And if we got to 195 IQ by starting with an IQ 110 human, then he would have a 90% chance of being the smartest person alive, which I think is unlikely. I find it unlikely because there could be interaction effects or a misspecified likelihood, which makes a huge difference for the roughly 5% of the data that is not between 70 and 130 but almost no difference for the other 95%. So you cannot test what the correct likelihood is by conventional likelihood ratio testing, because you care about a region of the data that is unobserved.
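The tail numbers in this comment can be checked with a few lines of standard-library Python. This is only a sketch: it assumes, as the comment does, that IQ is exactly normal with mean 100 and SD 15 and that there are 8 billion people, and the function names are made up for illustration.

```python
import math

WORLD_POPULATION = 8e9        # assumption taken from the comment
MEAN_IQ, SD_IQ = 100.0, 15.0  # assumed normal model for IQ


def upper_tail(iq: float) -> float:
    """P(IQ > iq) under the assumed normal model."""
    z = (iq - MEAN_IQ) / SD_IQ
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_people_above(iq: float) -> float:
    """Expected number of people above `iq` in the assumed population."""
    return WORLD_POPULATION * upper_tail(iq)


share_inside = 1.0 - 2.0 * upper_tail(130.0)
print(f"share between 70 and 130: {share_inside:.3f}")
print(f"expected people above 185: {expected_people_above(185.0):.0f}")
print(f"expected people above 195: {expected_people_above(195.0):.2f}")
```

Under these assumptions the expected count above 195 is around one person, which is exactly the region where the model has never been checked against data.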

The second point is the distinction between causal for the association observed in the data and causal when intervening on the genome. I suspect more than half of the genes are only causal for the association. I also imagine there are a lot of genes that are indirectly causal for IQ, such as by making you an attentive parent and thus lowering the probability that your kid sleeps in a room with a lot of mold, which would not make the super baby smarter, but would make the subsequent generation smarter.

Jan Christian Refsgaard Mar 7, 2025, 7:23 PM
6 points
0
in reply to: GeneSmith’s comment on: Statistical Challenges with Making Super IQ babies
Thanks, I am looking forward to that. There is one thing I would like to have changed about my post, because it was written a bit “in haste,” but since a lot of people have read it as it stands now, it also seems “unfair” to change the article, so I will make an amendment here, so you can take that into account in your rebuttal.
For General Audience: I stand by everything I say in the article, but at the time I did not appreciate the difference between shrinking within cutting frames (LD regions) and between them. I now understand that the spike and slab is only applied within each LD region, such that each region has a different level of shrinkage. I think there exists software that tries to shrink between them, but FINEMAP does not do that as far as I understand. I have not tried to understand the difference between all the different algorithms, but it seems like the ones that do shrink between cutting frames do it “very lightly”.
Had I known that at the time of writing I would have changed Optional: Regression towards the null part 2. I think spike and slab is almost as good as using a fat-tailed distribution within each cutting frame (LD region), because I suspect the effect inflation primarily arises from correlations between mutations due to inheritance patterns and to a much smaller degree from fluctuations due to “measurement error/luck” with regards to the IQ outcome variable (except when two correlated variables have very close estimates). So if I were to rewrite that section, I would instead focus on the total lack of shrinking between cutting frames, rather than the slightly insufficient shrinkage within cutting frames.
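To make the within-region shrinkage concrete, here is a minimal sketch of a textbook spike and slab for a single effect: a point mass at zero (the spike) mixed with a normal slab. It is not FINEMAP's algorithm, which handles correlated SNPs jointly; the function names and the numbers (standard error 0.2, slab SD 1.0, 1% prior inclusion) are assumptions picked purely for illustration.

```python
import math


def normal_pdf(x: float, sd: float) -> float:
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def spike_slab_posterior_mean(estimate, std_error, slab_sd, inclusion_prior):
    """Posterior mean of one effect under a point-mass spike and a normal slab.

    Returns (posterior inclusion probability, shrunken estimate).
    """
    # Marginal likelihood of the estimate if the effect comes from the slab.
    slab_like = normal_pdf(estimate, math.sqrt(slab_sd**2 + std_error**2))
    # Marginal likelihood if the true effect is exactly zero (the spike).
    spike_like = normal_pdf(estimate, std_error)
    pip = inclusion_prior * slab_like / (
        inclusion_prior * slab_like + (1.0 - inclusion_prior) * spike_like
    )
    # Within the slab, the usual normal-normal shrinkage factor applies.
    slab_shrink = slab_sd**2 / (slab_sd**2 + std_error**2)
    return pip, pip * slab_shrink * estimate


for estimate in (0.3, 2.0):
    pip, shrunk = spike_slab_posterior_mean(estimate, 0.2, 1.0, 0.01)
    print(f"estimate {estimate:.1f}: inclusion prob {pip:.3f}, shrunk to {shrunk:.3f}")
```

In this toy setup a small estimate is pulled almost entirely to zero while a large one is barely touched, and nothing in it shares information between regions.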
For an intuitive reason for why I care:
- Frequentist: the spike and slab estimator is unbiased for all of my effects across my 1000+ LD regions.
- Bayesian: bet you $5 that the most positive effect is too big and the most negative effect is too small. The Bayesian might even be willing to bet that the true value is not even in the 95% posterior interval, because it’s the most extreme from 1000+ regions[1].
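The Bayesian's bet can be illustrated with a small simulation. This is a sketch under made-up assumptions: 1000 regions, true effects drawn from a normal distribution, and unbiased estimates with normal noise of the same size. It uses a naive unshrunk 95% interval as a stand-in for the interval in the bet, and the function name is hypothetical.

```python
import random
import statistics


def simulate_most_extreme(n_regions=1000, n_reps=200, effect_sd=1.0,
                          noise_sd=1.0, seed=0):
    """Return (mean overshoot, interval coverage) for the largest estimate."""
    rng = random.Random(seed)
    overshoots, covered = [], 0
    for _ in range(n_reps):
        true_effects = [rng.gauss(0.0, effect_sd) for _ in range(n_regions)]
        # Each estimate is unbiased for its own true effect.
        estimates = [t + rng.gauss(0.0, noise_sd) for t in true_effects]
        top = max(range(n_regions), key=estimates.__getitem__)
        overshoots.append(estimates[top] - true_effects[top])
        # Naive, unshrunk 95% interval around the selected estimate.
        if abs(estimates[top] - true_effects[top]) <= 1.96 * noise_sd:
            covered += 1
    return statistics.mean(overshoots), covered / n_reps


mean_overshoot, coverage = simulate_most_extreme()
print(f"mean overshoot of the top estimate: {mean_overshoot:.2f}")
print(f"coverage of its naive 95% interval: {coverage:.2f}")
```

Every individual estimate is unbiased, yet the one selected for being the largest overshoots on average and its interval covers the truth far less often than 95% of the time.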
Not For General Audience, read at your own peril
Pointing at a Technical approach: It is even harder to write “how to shrink now” since we are now doing one more level of hierarchical models. The easiest way would be to have an adaptive spike and slab prior that you imagine all the 1000-2000 LD spike and slab priors are drawn from, and use that as an extra level of shrinkage. That would probably work somewhat. But I still feel that would be insufficient for the reasons outlined in part 2, namely that it will shrink the biggest effects slightly too much and everything else too little, and thus underestimate the effects of the few edits and overestimate the effects of many edits. Such a prior will still shrink everything compared to what you have now, so even if it does insufficient/uneven shrinkage, it’s still a better estimate than no shrinkage between LD regions.
Implementation details of 3-level spike and slab models: It is however even harder to shrink those properly. A hint of a solution would be to ignore the fact that each of the spike and slab “top level adaptive priors” influence both the slab and spike of the 1000+ LD shrinkage priors, and thus only use the spike to regularize the spike and the slab to regularize the slab. It might be possible to estimate this “post hoc”, if your software outputs a sufficient amount of summary statistics, but I am actually unsure.
Implementation details of 3-level Gelman model: If you for some magical reason wanted to implement the method proposed by Andrew Gelman, as a two-level hierarchical model, then I can say from experience that when you have no effects, the method sometimes fails[2], so you should set number of mixtures to 1 for all LD regions that “suck” (suck=any mixture with one or more sigma < 1). I actually suspect/know the math for doing this may be “easy”, but I also suspect that most genetics software does fancy rule-of-thumb stuff based on the type of SNP, such as assuming that a stop codon is probably worse than a mutation in a non-coding region, and all that knowledge probably helps more with inferences than “not modeling tails correct” hurts.
- [1] I am not sure this bet is sound, because if the tails are fat, then we should shrink very little, so the 1:1000 vs 1:20 argument would utterly fail for monogenic diseases, and the spike and slab stuff within cutting frames does some shrinkage.
- [2] If statisticians knew how to convolve a t-distribution it would not fail, because a t-distribution with nu=large number converges to a normal distribution, but because he approximates a t-like distribution as a mixture of normals, it sometimes fails when the effects are truly drawn from a normal, which will probably be the case for a few LD regions.

Jan Christian Refsgaard Mar 3, 2025, 7:20 AM
9 points
1
in reply to: kman’s comment on: Statistical Challenges with Making Super IQ babies
One of us is wrong or confused, and since you are the geneticist it is probably me, in which case I should not have guessed how it works from statistical intuition but read more. I did not because I wanted to write my post before people forgot yours.

I assumed the spike and slab were across all SNPs, but it sounds like it is per LD region, which is why you have multiple spikes? I also assumed the slab part would shrink the original effect size, which was what I was mainly interested in. You are welcome to PM me to get my Discord name or phone number if a quick call could give me the information to not misrepresent what you are doing.

My main critique is that I think there is insufficient shrinkage, so it’s the shrinkage properties I am mostly interested in getting right :)

Jan Christian Refsgaard Mar 3, 2025, 6:58 AM
25 points
5
in reply to: habryka’s comment on: Statistical Challenges with Making Super IQ babies
If I had to guess, then I would guess that ²⁄₃ of the effects are non-causal, and the other ¹⁄₃ are more or less fully causal, but that all of the effect sizes between 0.5 and 1 are exaggerated by 20-50% and the effects estimated below +0.5 IQ are exaggerated by much more.
But I think all of humanity is very confused about what IQ even is, especially outside the range of 70-130, so it’s hard to say whether it is the outcome variable (IQ) or the additive assumption that breaks down first. I imagine we could get superhuman IQ, and that after 1 generation of editing, we could close a lot of the causal gap. I also imagine there are big changes with large effects, such as making brain cells smaller, like in birds, but that would require a lot of edits to get to work.

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard Mar 2, 2025, 8:26 PM

154 points

26 comments, 9 min read

Jan Christian Refsgaard Jan 14, 2024, 4:12 PM
1 point
0
in reply to: nd’s comment on: E.T. Jaynes Probability Theory: The logic of Science I
This might help you https://github.com/MaksimIM/JaynesProbabilityTheory

But to be honest I did very few of the exercises. From chapter 4 and onward, most of the stuff Jaynes says is “overcomplicated” in the sense that he derives some fancy function that is actually just the Poisson likelihood or whatever. So as long as you can follow the math sufficiently to get a feel for what the text says, you can enjoy that all of statistics is derivable from his axioms, but you don’t have to be able to derive it yourself. And if you ever want to do actual Bayesian statistics, then HMC is how you get a ‘real’ posterior, and all the math you need is simply an intuition for the geometry of the MCMC sampler so you can prevent it from diverging. That has nothing to do with Jaynes and everything to do with the leapfrogging part of the Hamiltonian and how that screws up the proposal part of the Metropolis algorithm.
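For a feel of what “the leapfrogging screws up the proposal” means, here is a toy sketch, not how Stan or PyMC implement it: a leapfrog integrator for a one-dimensional standard normal target, with the Metropolis acceptance probability computed from the energy error. The step sizes, starting point, and function names are all assumptions chosen for illustration.

```python
import math


def leapfrog(q, p, step_size, n_steps):
    """Leapfrog integration for the potential U(q) = q^2 / 2 (standard normal)."""
    p -= 0.5 * step_size * q          # half step for momentum (grad U = q)
    for step in range(n_steps):
        q += step_size * p            # full step for position
        if step < n_steps - 1:
            p -= step_size * q        # full step for momentum
    p -= 0.5 * step_size * q          # final half step for momentum
    return q, p


def hamiltonian(q, p):
    """Total energy: potential plus kinetic."""
    return 0.5 * q * q + 0.5 * p * p


def acceptance_probability(step_size, n_steps=20, q0=1.0, p0=0.5):
    """Metropolis acceptance probability for one leapfrog trajectory."""
    q1, p1 = leapfrog(q0, p0, step_size, n_steps)
    energy_error = hamiltonian(q1, p1) - hamiltonian(q0, p0)
    # Accept with probability min(1, exp(-energy_error)).
    return math.exp(min(0.0, -energy_error))


for step_size in (0.1, 2.5):
    print(f"step size {step_size}: accept prob {acceptance_probability(step_size):.3f}")
```

With a small step the energy is almost conserved and nearly every proposal is accepted; with a step that is too large for the curvature, the trajectory blows up and the proposal is essentially always rejected, which is the kind of behaviour that shows up as a divergence.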

Jan Christian Refsgaard Dec 29, 2023, 4:44 PM
0 points
−6
in reply to: Algon’s comment on: E.T. Jaynes Probability Theory: The logic of Science I
I am not aware of Savage much, apart from both Bayesians and Frequentists not liking him. And I did not follow Jaynes's math fully, and there are some papers going back and forth on some of his assumptions, so the mathematical underpinnings may not be as strong as we would like.

I don’t know. Intuitively, you should be able to ground the agent stuff in information theory, because the rules they put forward are the same. Jaynes also has a chapter on decision theory where he makes the wonderful point that the utility function is way more arbitrary than a prior, so you might as well be Bayesian if you are into inventing ad hoc functions anyway.

Jan Christian Refsgaard Dec 28, 2023, 2:36 PM
2 points
0
in reply to: Iknownothing’s comment on: E.T. Jaynes Probability Theory: The logic of Science I
Ahh, I know that is a first-year course for most math students, but only math students take that class :). I have never read an analysis book :). I took the applied path and read 3 other Bayesian books before this one, so I thought the math in this book was simultaneously very tedious and basic :)

Jan Christian Refsgaard Dec 28, 2023, 11:26 AM
1 point
0
on: E.T. Jaynes Probability Theory: The logic of Science I
If anyone relies on tags to find posts, and you feel this post is missing a tag, then “Tag suggestions” will be much appreciated

Jan Christian Refsgaard Dec 28, 2023, 11:12 AM
2 points
0
in reply to: Iknownothing’s comment on: E.T. Jaynes Probability Theory: The logic of Science I
That is surprising to me. I think you can read the book two ways: 1) you skim the math, enjoy the philosophy, and take his word that the math says what he says it says, or 2) you try to understand the math. If you take 2), then you need to at least know the chain rule of integration and what a Dirac delta function is, which seem like high-level math concepts to me. Full disclosure: I am a biochemist by training, so I have also read it without the prerequisite formal training. I think you are right that if you ignore chapter 2 and a few sections about partition functions and such, then the math level for the other 80% is undergraduate-level math.

Jan Christian Refsgaard Dec 28, 2023, 10:55 AM
2 points
0
in reply to: bideup’s comment on: E.T. Jaynes Probability Theory: The logic of Science I
Crap, you are right, this was one of the last things we changed before publishing because our previous example was too combative :(.
I will fix it later today.

E.T. Jaynes Probability Theory: The logic of Science I

Jan Christian Refsgaard and dentalperson

Dec 27, 2023, 11:47 PM

63 points

20 comments, 21 min read

Jan Christian Refsgaard May 11, 2023, 7:26 PM
7 points
1
on: How much do you believe your results?
I think this is a pedagogical version of Andrew Gelman's shrinkage trilogy.

The most important paper also has a blog post. The very short version is that if you z-score the published effects, then you can derive a prior for the 20,000+ effects from the Cochrane database. A Cauchy distribution fits very well. The Cauchy distribution has very fat tails, so you should regress small effects heavily towards the null and regress very large effects very little.

Here is a fun figure of the effects. Medline is published stuff, so there are no effects between −2 and 2, as they would be ‘insignificant’. In the Cochrane collaboration they also hunted down unpublished results.

Here you see the Cochrane prior in red. You can imagine drawing a lot of random points from the red and then “adding 1 sigma of random noise”, which “smears out” the effects, creating the blue inflated effects we observe.
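The “smearing” can be imitated in a few lines. This sketch assumes a standard Cauchy (scale 1) for the true z-scores, which is a placeholder rather than the fitted Cochrane prior, and compares the typical size of true and observed effects by their median absolute value, since a Cauchy has no finite variance.

```python
import math
import random
import statistics

rng = random.Random(0)
n_draws = 20_000

# Assumed prior: standard Cauchy, sampled with the inverse-CDF method.
true_z = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n_draws)]
# Observed z-scores: true value plus one standard error of normal noise.
observed_z = [t + rng.gauss(0.0, 1.0) for t in true_z]

median_true = statistics.median(abs(t) for t in true_z)
median_observed = statistics.median(abs(z) for z in observed_z)
print(f"median |true z|:     {median_true:.2f}")
print(f"median |observed z|: {median_observed:.2f}")
```

The observed effects are typically larger in magnitude than the true ones, which is the inflation that shrinkage is meant to undo.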

Notice this only works if you have standardized effects. If you observe that breastfeeding makes you 4 times richer with sigma=2, then you have z=2, which is a tiny effect, as you need 1.96 to reach significance at the 5% level in frequentist statistics, and you should thus regress it heavily towards the null. Whereas if you observe that breastfeeding makes you 1% richer with sigma=0.01%, then you have z=100, which is a huge effect, and it should be regressed towards the null very little.
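The two examples can be run through a toy version of the shrinkage. This is a sketch under assumptions: the observed z is normal around the true signal-to-noise ratio with SD 1, the prior is a Cauchy with scale 1 (a placeholder, not the scale fitted to the Cochrane data), and the posterior mean is computed by brute-force grid integration. The function name is made up.

```python
import math


def posterior_mean(z, prior_scale=1.0, half_width=10.0, n_grid=20001):
    """Posterior mean of the true signal-to-noise ratio given an observed z.

    Assumes z ~ Normal(theta, 1) and theta ~ Cauchy(0, prior_scale); the
    integral is approximated on an evenly spaced grid centred on z.
    """
    step = 2.0 * half_width / (n_grid - 1)
    weight_sum = weighted_theta_sum = 0.0
    for i in range(n_grid):
        theta = z - half_width + i * step
        likelihood = math.exp(-0.5 * (z - theta) ** 2)
        prior = 1.0 / (1.0 + (theta / prior_scale) ** 2)
        weight = likelihood * prior
        weight_sum += weight
        weighted_theta_sum += weight * theta
    return weighted_theta_sum / weight_sum


for z in (2.0, 100.0):
    shrunk = posterior_mean(z)
    print(f"z = {z:6.1f} -> posterior mean {shrunk:7.2f} "
          f"(keeps {shrunk / z:.0%} of the estimate)")
```

To get back to the original scale, multiply the shrunken z by the standard error of the effect. Under this placeholder prior the z=2 effect loses a large share of its size, while the z=100 effect is left almost untouched.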

Jan Christian Refsgaard Apr 20, 2023, 7:50 PM
1 point
−1
in reply to: randalljellis’s comment on: Book Review of 5 Applied Bayesian Statistics Books
SR if you can only read one. If you do not expect to do fancy things, then ROS may be better, as it is very good and explains the basics better. The Logic of Science should be your 5th book and is a good goal to set. The Logic of Science is probably the rationalist bible: much like the real bible, everybody swears by it but nobody has read or understood it :)

Jan Christian Refsgaard Feb 2, 2022, 7:16 PM
1 point
in reply to: KatWoods’s comment on: Listen to top LessWrong posts with The Nonlinear Library
Thanks for the reply. 3 seems very automatable: record all text before the image, and if that’s 4 minutes, then put the image in after 4 min. But I totally get that stuff is more complicated than it initially seems, keep up the good work!
