MathiasKB

Karma: 656

An open mind is like a fortress with its gates unbarred and unguarded.

MathiasKB 27 Apr 2026 16:00 UTC
7 points
2
in reply to: Vladimir_Nesov’s comment on: Vladimir_Nesov’s Shortform
How interchangeable are the gains from RLVR and pre-training? My understanding was that additional pre-training yields improvements on different benchmarks than additional RLVR. If that’s true, you should be able to get an additional point of evidence of this hypothesis (in addition to the serving cost of models), by also looking at which benchmarks new releases show the biggest improvements on.

MathiasKB 20 Apr 2026 10:18 UTC
2 points
−2
in reply to: cubefox’s comment on: Eric Neyman’s Shortform
exclamation mark?

MathiasKB 17 Apr 2026 9:45 UTC
4 points
1
on: Carpathia Day
I found this story quite moving, thank you for taking the time to tell it

MathiasKB 12 Apr 2026 10:19 UTC
3 points
0
on: How to make good tea
bro if you’re gonna get this into tea you gotta try sheng. It’s like the jazz of tea, once someone gets sufficiently into tea they inevitably end up drinking nothing but raw puer

MathiasKB 28 Feb 2025 10:45 UTC
8 points
−1
in reply to: Mikhail Samin’s comment on: Mikhail Samin’s Shortform
One thing to highlight, which I only learned recently, is that the norm when submitting letters to the governor on any bill in California is to include: “Support” or “Oppose” in the subject line to clearly state the company’s position.

Anthropic importantly did NOT include “support” in the subject line of the second letter. I don’t know how to read this as anything else than that Anthropic did not support SB1047.

MathiasKB 12 Nov 2024 12:55 UTC
64 points
17
on: The Online Sports Gambling Experiment Has Failed
I’ll crosspost the comment I left on substack:

In Denmark the government has a service (ROFUS), which anyone can voluntarily sign up for to exclude themselves from all gambling providers operating in Denmark. You can exclude yourself for a limited duration or permanently. The decision cannot be revoked.
Before discussing whether gambling should be legal or illegal, I would encourage Americans to see how far they can get with similar initiatives first.

MathiasKB 11 Nov 2024 12:31 UTC
2 points
0
on: MathiasKB’s Shortform
Is there any good write up on the gut/brain connection and the effect fecal transplants?
Watching the South Park episode where everyone tries to steal Tom Brady’s poo got me wondering why this isn’t actually a thing. I can imagine lots of possible explanations, ranging from “because it doesn’t have much of an effect if you’re healthy” to “because FDA”.

MathiasKB 26 Sep 2024 19:52 UTC
1 point
0
on: Ironing Out the Squiggles
On this view, adversarial examples arise from gradient descent being “too smart”, not “too dumb”: the program is fine; if the test suite didn’t imply the behavior we wanted, that’s our problem.
Shouldn’t we expect to see RL models trained purely on self play not to have these issues then?
My understanding is that even models trained primarily with self play, such as katago, are vulnurable to adversarial attacks. If RL models are vulnurable to the same type of adversarial attacks, isn’t that evidence against this theory?

MathiasKB 13 Sep 2024 18:23 UTC
1 point
0
in reply to: Vladimir_Nesov’s comment on: MathiasKB’s Shortform
The amount of inference compute isn’t baked-in at pretraining time, so there is no tradeoff.
This doesn’t make sense to me.
In a subscription based model, for example, companies would want to provide users the strongest completions for the least amount of compute.

If they estimate customers in total will use 1 quadrillion tokens before the release of their next model, they have to decide how much of the compute they are going to be dedicating to training versus inference. As one changes the parameters (subscription price, anticipated users, fixed costs for a training run, etc.) you’d expect to find the optimal ratio to change.
Test-time compute on one trace comes with a recommendation to cap reasoning tokens at 25K, so there might be 1-2 orders of magnitude more there with better context lengths. They are still not offering repeated sampling filtered by consensus or a reward model. If o1 proves sufficiently popular given its price, they might offer even more expensive options.
Thanks, this is a really good find!

MathiasKB 13 Sep 2024 9:39 UTC
1 point
0
in reply to: quetzal_rainbow’s comment on: MathiasKB’s Shortform
Thanks!! this is exactly what I was looking for

MathiasKB 13 Sep 2024 9:21 UTC
1 point
0
on: MathiasKB’s Shortform
With the release of openAI o1, I want to ask a question I’ve been wondering about for a few months.

Like the chinchilla paper, which estimated the optimal ratio of data to compute, are there any similar estimates for the optimal ratio of compute to spend on inference vs training?
In the release they show this chart:
The chart somewhat gets at what I want to know, but doesn’t answer it completely. How much additional inference compute would I need a 1e25 o1-like model to perform as well as a one shotted 1e26?
Additionally, for some x number of queries, what is the optimal ratio of compute to spend on training versus inference? How does that change for different values of x?
Are there any public attempts at estimating this stuff? If so, where can I read about it?

MathiasKB’s Shortform

MathiasKB13 Sep 2024 9:21 UTC

5 points

7 comments1 min readLW link

MathiasKB 18 Jul 2024 19:40 UTC
2 points
0
in reply to: FinalFormal2’s comment on: Poker is a bad game for teaching epistemics. Figgie is a better one.
If someone wants to set up a figgy group to play, I’d love to join

MathiasKB 10 Jun 2024 11:34 UTC
1 point
0
in reply to: Ruby’s comment on: Priors and Prejudice
I agree the conclusion isn’t great!
Not so surprisingly, many people read the last section as an endorsement of some version of “RCTism”, but it’s not actually a view I endorse myself.
What I really wanted to get at in this post was just how pervasive priors are, and how difficult it is to see past them.

MathiasKB 14 May 2024 20:59 UTC
3 points
0
on: D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
Just played through it tonight. This was my first D&D.Sci, found it quite difficult and learned a a few things while working on it.
Initially I tried to figure out the best counters and found a few patterns (flamethrowers were especially good against certain units). I then tried to look and adjust for any chronology, but after tinkering around for a while without getting anywhere I gave up on that. Eventually I just went with a pretty brainless ML approach.
I ended up sending squads for 5 and 6 which managed a 13.89% and 53.15% chance of surviving, I think it’s good I’m not in charge of any soldiers in real life!
Overall I had good fun, and I’m looking forward to looking at the next one.

Priors and Prejudice

MathiasKB22 Apr 2024 15:00 UTC

157 points

32 comments7 min readLW link 1 review

MathiasKB 12 Feb 2024 11:23 UTC
25 points
20
on: Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search
This wouldn’t be the first time Deepmind pulled these shenanigans.
My impression of Deepmind is they like playing up the impressiveness of their achievements to give an impression of having ‘solved’ some issue, never saying anything technically false, while suspiciously leaving out relevant information and failing to do obvious tests of their models which would reveal a less impressive achievement.
For Alphastar they claimed ‘grandmaster’ level, but didn’t show any easily available stats which would make it possible to verify. As someone who was in Grandmaster league at the time of it playing (might even have run into it on ladder, some of my teammates did), its play at best felt like low grandmaster to me.
At their event showing an earlier prototype off, they had one player (TLO) play their off-race with which he certainly was not at a grandmaster level. The pro player (Mana) playing their main race beat it at the event, when they had it play with the same limited camera access humans have. I don’t remember all the details anymore, but I remember being continuously annoyed by suspicious omission after suspicious omission.
What annoys me most is that this still was a wildly impressive achievement! Just state in the paper: “we managed to reach grandmaster with one out of three factions”—Nobody has ever managed to create AI that played remotely as well as this!
Similarly Deepminds no-search chess engine is surely the furthest anyone has gotten without search. Even if it didn’t quite make grandmaster, just say so!

MathiasKB 5 Feb 2023 15:35 UTC
1 point
0
in reply to: DirectedEvolution’s comment on: H5N1 - thread for information sharing, planning, and action
if it makes it easier, I can add the questions to manifold if you provide a list of questions and resolution criteria.

MathiasKB 5 Feb 2023 13:25 UTC
1 point
0
in reply to: jow’s comment on: H5N1 - thread for information sharing, planning, and action
thanks for pointing that out, I’ve added a note in the description

H5N1 - thread for information sharing, planning, and action

MathiasKB5 Feb 2023 12:44 UTC

31 points

8 comments1 min readLW link

MathiasKB

Mathi­asKB’s Shortform

Pri­ors and Prejudice

H5N1 - thread for in­for­ma­tion shar­ing, plan­ning, and action

MathiasKB’s Shortform

Priors and Prejudice

H5N1 - thread for information sharing, planning, and action