Data Scientist
Jan Christian Refsgaard
Thanks, and also thanks for pointing out that I had used the wrong symbol in a few places; since everything is the Bernoulli distribution, I have changed everything to be consistent.
You are absolutely right: any framework that punishes you for being right would be bad. My point is that improving your calibration helps a surprising amount and is much more achievable than "just git good", which is what improving prediction requires.
I will try to put your point into the draft when I am off work, thanks.
You mean the N'th root of 2, right? That is what I called the null predictor and divided Scott's predictions by in the code:
random_predictor = 0.5 ** len(y)
which is equivalent to $0.5^N$, where $N$ is the total number of predictions.
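As a concrete sketch of how the comparison against this null predictor could work (the probabilities and outcomes below are invented for illustration; `probs` and `y` are not Scott's actual data):

```python
# Made-up example: four predictions with these stated probabilities,
# where every predicted event happened.
probs = [0.7, 0.8, 0.6, 0.9]   # probability assigned to each outcome
y = [1, 1, 1, 1]               # 1 = the predicted event happened

likelihood = 1.0
for p, outcome in zip(probs, y):
    likelihood *= p if outcome else (1 - p)

random_predictor = 0.5 ** len(y)       # coin-flip baseline: 0.5**N
ratio = likelihood / random_predictor  # > 1 means better than coin flips
print(ratio)                           # 0.3024 / 0.0625 = 4.8384
```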
I have tried to add a paragraph about this, because I think it's a good point, and you are unlikely to be the only one who got confused about it. Next weekend I will finish part 2, where I make a model that can track calibration independently of prediction; in that model, the 60% predictor with 61⁄100 hits will have a better posterior for the calibration parameter than the 60% predictor with 100⁄100 hits, though the likelihood of the 100⁄100 data will of course still be highest.
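A simple stand-in illustrates the intuition (this is not the unpublished part-2 model; the uniform Beta(1, 1) prior is my assumption):

```python
# Posterior mean of the hit rate under a uniform Beta(1, 1) prior,
# for a forecaster who says "60%" and then hits 61/100 vs 100/100.
def posterior_mean(hits, n, a=1, b=1):
    """Mean of the Beta(a + hits, b + n - hits) posterior."""
    return (a + hits) / (a + b + n)

m_61 = posterior_mean(61, 100)    # ~0.61, close to the stated 60%
m_100 = posterior_mean(100, 100)  # ~0.99, far from the stated 60%

# The 61/100 forecaster looks well calibrated; the 100/100 one does not,
# even though 100/100 has the higher likelihood under the stated 60%
# probabilities (0.6**100 > 0.6**61 * 0.4**39).
assert abs(m_61 - 0.60) < abs(m_100 - 0.60)
```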
You may be disappointed: unless you make 40+ predictions per week, it will be hard to compare weekly drift. The Bernoulli distribution has much higher variance than the normal distribution, so the uncertainty estimate of the calibration is correspondingly wide (high uncertainty in the data → high uncertainty in the regression parameters). My post 3 will be a hierarchical model, which may suit your needs better, but it will probably be a month before I get around to making that model.
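A rough back-of-the-envelope sketch of how wide that weekly uncertainty is (the n = 40 predictions, the 60% hit rate, and the uniform Beta(1, 1) prior are all assumed numbers):

```python
import math

# Posterior standard deviation of the hit rate after one week of
# n = 40 predictions with a ~60% hit rate, under a Beta(1, 1) prior.
def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

n, hits = 40, 24                      # 60% of 40 predictions correct
sd = beta_sd(1 + hits, 1 + n - hits)  # ~0.075
# A weekly estimate of a ~60% calibration wobbles by several
# percentage points from sampling noise alone, drowning out drift.
```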
If there are many people like you, then we may try to make a hackish model that down-weights older predictions, since they are less predictive of your current calibration than newer predictions, but I will have to think long and hard to turn that into a full Bayesian model, so I am making no promises.
The order does not matter. You can see that by focusing on the counts of hits and misses, which are always the same regardless of order; you can also see it from the conjugation rule, where you end up with $\text{Beta}(\alpha + \sum_i y_i,\; \beta + N - \sum_i y_i)$ no matter the order.
If you wanted the order to matter, you could down-weight earlier shots or widen the uncertainty between the updates, so the previous posterior becomes a slightly wider prior to capture the extra uncertainty from the passage of time.
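The order-invariance claim can be sketched in a few lines (the Beta(1, 1) prior and the shot sequence are made up):

```python
# A conjugate Beta-Bernoulli update only adds counts, and addition is
# commutative, so the final posterior is the same in any order.
def update(a, b, hit):
    """One conjugate update: a hit increments a, a miss increments b."""
    return (a + 1, b) if hit else (a, b + 1)

shots = [1, 0, 1, 1, 0]

a_fwd, b_fwd = 1, 1
for s in shots:
    a_fwd, b_fwd = update(a_fwd, b_fwd, s)

a_rev, b_rev = 1, 1
for s in reversed(shots):
    a_rev, b_rev = update(a_rev, b_rev, s)

# Same posterior either way: Beta(1 + hits, 1 + misses)
assert (a_fwd, b_fwd) == (a_rev, b_rev) == (4, 3)
```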
Mine was the same; I became a Bayesian statistician 4 years ago. I gave a talk about Bayesian statistics, and this figure was what made it click for most students (including myself), so I wanted to share it.
fixed
I am well aware that nobody asked for this, but here is the proof that the posterior is $\text{Beta}(\alpha + \sum_i y_i,\; \beta + N - \sum_i y_i)$ for the beta-bernoulli model.

We start with Bayes' theorem:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$$

Then we plug in the definition for the Bernoulli likelihood and Beta prior:

$$p(\theta \mid y) = \frac{\prod_{i=1}^{N} \theta^{y_i} (1-\theta)^{1-y_i} \cdot \dfrac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}}{p(y)}$$

Let's collect the powers of $\theta$ in the numerator, and the things that do not depend on $\theta$ in the denominator:

$$p(\theta \mid y) = \frac{\theta^{\alpha + \sum_i y_i - 1}\, (1-\theta)^{\beta + N - \sum_i y_i - 1}}{B(\alpha, \beta)\, p(y)}$$

Here comes the conjugation shenanigans. If you squint, the top of the distribution looks like the top of a Beta distribution:

$$\text{Beta}(\theta \mid a, b) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a, b)}, \quad \text{with } a = \alpha + \textstyle\sum_i y_i,\; b = \beta + N - \textstyle\sum_i y_i$$

Let's continue the shenanigans: since the numerator looks like the numerator of a Beta distribution, we know that it would be a proper Beta distribution if we changed the denominator like this:

$$p(\theta \mid y) = \frac{\theta^{\alpha + \sum_i y_i - 1}\, (1-\theta)^{\beta + N - \sum_i y_i - 1}}{B\!\left(\alpha + \sum_i y_i,\; \beta + N - \sum_i y_i\right)} = \text{Beta}\!\left(\alpha + \textstyle\sum_i y_i,\; \beta + N - \textstyle\sum_i y_i\right)$$

(And since both expressions integrate to 1 over $\theta$, the denominators must in fact be equal: $B(\alpha, \beta)\, p(y) = B(\alpha + \sum_i y_i,\; \beta + N - \sum_i y_i)$.)
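As a numerical sanity check of the conjugate result (the data, N = 10 with k = 7 successes, and the Beta(2, 2) prior are made up for illustration):

```python
# Compare the closed-form posterior mean against a brute-force grid.
alpha, beta_ = 2.0, 2.0
N, k = 10, 7

# Closed form: the posterior is Beta(alpha + k, beta_ + N - k)
post_a, post_b = alpha + k, beta_ + N - k
conjugate_mean = post_a / (post_a + post_b)   # 9 / 14

# Brute force: evaluate the unnormalised posterior density
# theta^(alpha+k-1) * (1-theta)^(beta_+N-k-1) on a grid and normalise.
grid = [i / 10000 for i in range(1, 10000)]
unnorm = [t ** (alpha + k - 1) * (1 - t) ** (beta_ + N - k - 1) for t in grid]
Z = sum(unnorm)
grid_mean = sum(t * w for t, w in zip(grid, unnorm)) / Z

assert abs(grid_mean - conjugate_mean) < 1e-4
```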
What's wrong with it? I am linking to the source material; should I only link if it's a 100% copy?
Most statisticians would agree with you, unless Reebok's expected market share were between 2⁄3 and 3⁄5, of course. Though I expect that most laymen would have Phil's intuition; in any case, the general point, that the statement leaks information, remains :)
I will write a post shilling for myself, thanks. I was waiting for the post to be 'liked'; if it had gotten −10 karma, there would have been no use in shilling for it :)
I have written a review of the 5 most popular Applied Bayesian Statistics books
Where I recommended:
Statistical Rethinking
Up to speed fast, no integrals, very intuitive approach.
Doing Bayesian Data Analysis
This is the easiest book. If your goal is only to create simple models and you aren’t interested in understanding the details, then this is the book for you.
A Student’s Guide to Bayesian Statistics
This book has the opposite focus of the Dog book. Here the author slowly goes through the philosophy of Bayes with an intuitive mathematical approach.
Regression and Other Stories
Good if you want a slower and more thorough approach where you also learn the Frequentist perspective.
Bayesian Data Analysis
The most advanced text, very math heavy, best as a second book after reading one or two of the others, unless you are already a statistician.
If Reebok wanted to report a valid statistic, they would report something like "11% of the top 100 wear our shoes". I think a much smaller number than the top 100 was picked exactly because that was where the effect was most exaggerated. ChristianKl, I share your intuition that 3⁄5 sounds more impressive than 2⁄3. I also agree that reporting the top 4 would seem even fishier, though it could be spun as "75% of quarter-finalists" in knockout-tournament sports.
I disagree: if Reebok produced 64% of all shoes in the world and only 3⁄5 of top athletes used them, and this furthermore was the best statistic the marketing department could produce, then it's strong evidence that they are overhyped.
But I think you understood me as saying something different, words are hard :)
Good point!
original: Applied Bayesian Statistics—Which book to read?
Applied Bayesian Statistics—Which book should you read?
Literature Review of 5 Applied Bayesian Statistics Books.
Book Review of 5 Applied Bayesian Statistics Books.
I picked 3; if other people have strong feelings, feel free to suggest other titles.
I think we agree and are talking past each other; my original statement was "Most statisticians would agree with you. Unless..."
So we agree that there is more power in 3⁄5 than in 2⁄3, and we happen to have divergent intuitions about what random Joe finds most persuasive. My intuition is rather weak, so I would gladly update it towards 3⁄5 sounding more impressive to random people, if you feel strongly about it.
Most likely what the marketing folks have done is gotten a list of the top 100 runners in different running disciplines and then reported "the most impressive top X in list Y".
We both agree that the reported statistic is inflated, which is the major thesis; we simply disagree about how much information can be recovered, because we have different "impressiveness" heuristics.
I loved that example as well. I have heard it described elsewhere as "the law of small numbers": small subsets have higher variance and therefore more frequent extreme outcomes. I think it's particularly good because the most important part of the Bayesian paradigm is the focus on uncertainty.
The appendix on HMC is also a very good supplement to gain a deeper understanding of the algorithm after having read the description in another book first.
Cholera is the devil!
The National Center for Biotechnology Information has a Taxonomy database.
Q: What do you think taxid=666 is?
A: Vibrio cholerae, coincidence? I think not!
proof:
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=info&id=666
I have not been consistent with my probability notation: I sometimes use upper-case P and sometimes lower-case p. In future posts I will try to use the same notation as Andrew Gelman, which is Pr for things that are probabilities (numbers), such as Pr(y=1)=0.7, and p for distributions, such as p∼N(0,2). However, since this is my first post, I am afraid that editing it will waste the moderators' time, as they will have to read it again to check for trolling. What is the proper course of action?