kqr

Karma: 545

Quant, systems thinker, anarchist.

I write at https://entropicthoughts.com

My inbox is lw[at]xkqr.org

kqr 25 Jul 2026 6:07 UTC
1 point
0
in reply to: Ninety-Three’s comment on: War – What is it Good For?
That is implied by BdM. Later iterations of his argument rest on the idea that leaders cannot do what’s best for the country, but are bound (to varying degrees) to do what their population expects of them.

kqr 21 Jul 2026 11:40 UTC
1 point
0
in reply to: RedMan’s comment on: War – What is it Good For?
That’s a great question (mirrored by virtually all the other comments too)! I know Bueno de Mesquita has made a few high-profile predictions which did turn out to be directionally true, but I’m unaware of any quantitative verification. It shouldn’t be so hard to perform one, though, so either one exists or I could be the one to do it. I’ll see what I can find the time for. Thanks for the idea!

War – What is it Good For?

kqr 20 Jul 2026 12:45 UTC

42 points

20 comments 8 min read LW link

kqr 2 Jul 2026 9:45 UTC
2 points
0
in reply to: dynomight’s comment on: Do-it-yourself meta-analysis
Er, yes. That was obvious in my head and I should probably make it obvious in the article text too. Thanks!

Do-it-yourself meta-analysis

kqr 1 Jul 2026 22:04 UTC

14 points

2 comments 5 min read LW link

(entropicthoughts.com)

kqr 26 Jun 2026 18:13 UTC
5 points
0
in reply to: Rachel Shu’s comment on: Trees are mostly made of air and a generalizable lesson for AI safety
Even if you don’t lose weight, much of what you eat you exhale.
In some sense, our bodies are processing facilities for tree food, and before the tree consumes it, it is temporarily stored in the warehouse for tree food: the atmosphere!

A full body MRI earns you a year of smoking

kqr 26 Jun 2026 14:31 UTC

11 points

8 comments 3 min read LW link

(entropicthoughts.com)

kqr 22 Jun 2026 18:07 UTC
1 point
0
in reply to: derelict5432’s comment on: GLM 5.2 playing text adventures
I have deliberately avoided Zork because many copies of its walkthroughs are in the training data.
I’m using several newer games I happen to have played through. Each turn, the LLM is fed the output of the previous command, its thinking block from the previous turn, and a battery of questions around goals/puzzles/mysteries to keep it on track and make it retain important information in its next thinking block.
In the background, the harness listens for specific phrases printed by the game and awards a point the first time each one is encountered. A list of such achievements has to be written for each game.
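A minimal sketch of the scoring and prompting loop described above, not the actual harness. The names `AchievementTracker`, `observe`, and `build_prompt`, the prompt layout, and the case-insensitive substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AchievementTracker:
    """Awards one point the first time each listed phrase shows up in game output."""

    # Hypothetical per-game list: achievement name -> phrase the game prints.
    phrases: dict[str, str]
    earned: set[str] = field(default_factory=set)

    def observe(self, game_output: str) -> list[str]:
        """Return the achievements newly unlocked by this chunk of game output."""
        text = game_output.lower()  # assumption: matching ignores case
        new = [
            name
            for name, phrase in self.phrases.items()
            if name not in self.earned and phrase.lower() in text
        ]
        self.earned.update(new)
        return new

    @property
    def score(self) -> int:
        """One point per distinct achievement seen so far."""
        return len(self.earned)


def build_prompt(game_output: str, previous_thinking: str, questions: list[str]) -> str:
    """Assemble one turn's input: last output, last thinking block, guiding questions."""
    question_block = "\n".join(f"- {q}" for q in questions)
    return (
        f"Game output:\n{game_output}\n\n"
        f"Your previous thinking:\n{previous_thinking}\n\n"
        f"Before choosing a command, answer:\n{question_block}"
    )
```

Seeing the same phrase a second time returns an empty list from `observe`, so the score only rises on first encounters.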

GLM 5.2 playing text adventures

kqr 18 Jun 2026 7:23 UTC

16 points

2 comments 1 min read LW link

(entropicthoughts.com)

LLMs and almost good code

kqr 9 Jun 2026 7:21 UTC

36 points

10 comments 3 min read LW link

(entropicthoughts.com)

Lean, not backpressure

kqr 1 Jun 2026 7:57 UTC

18 points

1 comment 1 min read LW link

(entropicthoughts.com)

Standard deviations from just two values

kqr 27 May 2026 5:01 UTC

41 points

2 comments 3 min read LW link

(entropicthoughts.com)

kqr 21 May 2026 22:53 UTC
1 point
0
in reply to: G Wood’s comment on: Women should be able to open things
In my region of the world “butter knife” means a wooden utensil with round edges so it never even struck me that it could be sharp!

kqr 21 May 2026 22:50 UTC
9 points
0
in reply to: qwertypoiyoity’s comment on: Women should be able to open things
It is ridiculous. Even when I had a fractured wrist in a cast I was able to produce more torque on jars than my wife, and neither of us is extreme in any direction.

kqr 20 May 2026 7:38 UTC
16 points
0
in reply to: leogao’s comment on: My hobby: running deranged surveys
Given the number of surveys (20–30; I can’t be bothered to count carefully) and the sample size (200–500 you said below), does that put the total expenditure at $1000–3000?
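As a rough sanity check on that range, here is the per-response price it would imply. Pairing the low figures together and the high figures together is an assumption, and `implied_cost_per_response` is a name made up for this sketch. No actual price per response was stated.

```python
def implied_cost_per_response(n_surveys: int, sample_size: int, total_usd: float) -> float:
    """Total spend divided by the total number of responses collected."""
    return total_usd / (n_surveys * sample_size)


# Figures taken from the question above; the low/low and high/high pairing is assumed.
low = implied_cost_per_response(n_surveys=20, sample_size=200, total_usd=1000)
high = implied_cost_per_response(n_surveys=30, sample_size=500, total_usd=3000)

print(f"low end:  ${low:.2f} per response")
print(f"high end: ${high:.2f} per response")
```

Under that pairing, both ends of the range work out to roughly the same price per response, which is what a flat per-respondent fee would look like.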

Pythagorean addition

kqr 20 May 2026 7:13 UTC

32 points

4 comments 3 min read LW link

(entropicthoughts.com)

Why I made Engineering Enigmas

kqr 3 May 2026 18:04 UTC

13 points

0 comments 3 min read LW link

kqr 29 Apr 2026 7:47 UTC
3 points
0
on: llm assistant personas seem increasingly incoherent (some subjective observations)
I share the impression that whereas older models would try to do a good job, fail, and then get stuck in a loop trying the same thing over and over, newer models are more likely to give up early but still try to give a convincing impression of having done a good job.
I have assumed this to be due to an increasing focus on post-training techniques that improve benchmark scores. My mental model splits LLM performance in evaluations into three components (that probably interact to some degree):
1. Base training;
2. Post-training techniques such as fine-tuning and RLHF; and
3. Inference-time techniques such as routing, best-of-N, chain-of-thought prompting, and “wait” token insertion.
From my understanding we haven’t actually been able to improve the first component very much, but we have learned a lot about the latter two. If these don’t increase raw “intelligence” so much as improve the appearance of intelligence, that would explain why newer models increasingly reward hack.
the sudden pivots and insight-flashes you’ll often see with recent models, the “wait”s and “a-ha”s and “actually, I want to try something completely different”s.
I was under the impression this was not produced by the model itself, but caused by external harnesses inserting “wait” tokens into the transcript before it goes back into the model to force it to reconsider.

Are LLMs not getting better?

kqr 29 Apr 2026 6:27 UTC

24 points

4 comments 2 min read LW link

kqr 28 Apr 2026 8:10 UTC
11 points
0
on: In defense of parents

However I do try and remember whenever my children request something to stop and think about it for a second instead of automatically saying no.

Small thing, but with children aged 3 and 5 I have started to say “Yes, if you can sort out the logistics of it.”

Most of the time when I deny my children something, it isn’t because I don’t want them to have it or do it, but because I cannot find a way to fit it into our resource constraints, be it time, equipment, money, health, etc.

When my children respond to that with a genuine interest in trying to make it work, I inform them of the constraints and they ask feasibility questions. Sometimes they do come up with a plan that actually works! Most often they realise it would be too much work to be worth the payoff, and they think of something else to do instead.

(Given the topic of person vs. property at hand, I should also say that half the time my challenge is met with screaming demands that I must make it happen. Then, in my mind, they have used up their chance to act as a person and chosen to be “merely a child”, and I have to bluntly deny the wish without further discussion. (I might still try to explain it, depending on how much my patience has been drained already.))