Quant, systems thinker, anarchist.
I write at https://entropicthoughts.com
My inbox is lw[at]xkqr.org
It is ridiculous. Even when I had a fractured wrist in a cast I was able to produce more torque on jars than my wife, and neither of us is extreme in any direction.
Given the number of surveys (20–30; I can’t be bothered to count carefully) and the sample size (200–500 you said below), does that put the total expenditure at $1000–3000?
I share the impression that whereas older models would try to do a good job, fail, and then get stuck in a loop trying the same thing over and over, newer models are more likely to give up early but still try to give a convincing impression of having done a good job.
I have assumed this to be due to an increasing focus on post-training techniques that improve benchmark scores. My mental model of LLM performance in evaluations is split into components (that probably interact to some degree):
Base training;
Post-training techniques such as fine-tuning and RLHF, etc; and
Inference-time techniques such as routing, best-of-N, chain-of-thought prompting, “wait” token insertion, etc.
From my understanding we haven’t actually been able to improve the first component very much, but we have learned a lot about the latter two. If these don’t actually increase raw “intelligence” so much as they improve the appearance of intelligence, that would explain why newer models are increasingly reward hacking.
the sudden pivots and insight-flashes you’ll often see with recent models, the “wait”s and “a-ha”s and “actually, I want to try something completely different”s.
I was under the impression this was not produced by the model itself, but caused by external harnesses inserting “wait” tokens into the transcript before it goes back into the model to force it to reconsider.
However I do try and remember whenever my children request something to stop and think about it for a second instead of automatically saying no.
Small thing, but with children age 3 and 5 I have started to say “Yes, if you can sort out the logistics of it.”
Most of the things I deny my children aren’t because I don’t want them to have it or do it, but because I cannot find a way to fit it into our resource constraints, be it time, equipment, money, health, etc.
When my children respond to that with a genuine interest in trying to make it work, I inform them of the constraints and they ask feasibility questions. Sometimes they do come up with a plan that actually works! Most often they realise it would be too much work to be worth the payoff, and they think of something else to do instead.
(Given the topic of person vs. property at hand, I should also say that half the time my challenge is met with screaming demands that I must make it happen. Then, in my mind, they have used up their chance to act as a person and chosen to be “merely a child”, and I have to bluntly deny the wish without further discussion. (I might still try to explain it, depending on how much my patience has been drained already.))
The main thing that helps is simply distraction
The potential long-term cost of this is that it doesn’t teach conflict resolution. I have a strong learned response to seek distraction any time I am uncomfortable, but I don’t want to pass that on to my children.
I don’t think “every 5 minutes” is to be interpreted literally. After all, that would imply the siblings sleep in shifts so that one is always able to hit the other. (Or that they are in a constant boxing match throughout their waking hours to compensate for the lack of hitting during sleep.)
Most days, my children (3 and 5) have periods of the day (usually toward the evening) in which they have exhausted their patience for trying to talk it out and they hit each other at least every five minutes, unless we keep them separated. They also have periods in which they reason, empathise, and negotiate better than many adults I’ve met. The latter periods are rare, but getting more frequent with age.
My wife has been worried about the amount of hitting, so we have talked to child psychologists about it, and they claim it is well within a couple standard deviations. That doesn’t have to mean anything, of course, but the data on this is sparse, as one could imagine.
Thanks for the honest feedback! It is probably too early in my hobby research to share this, yes. My main hope is that it would resonate with someone else who might be more clear on what it is, and maybe even inspire some sort of measurement.
Depends significantly on where you live! I don’t worry about hurricanes, floods, earthquakes, etc.
Among the things that remain are fire, and my government says the fire services get called to 6000 domestic fires every year. Divided by a population of, say, 5 million households that’s a risk of 0.12 % per year. Maybe not all fires get fire services involvement, so we’ll bump it up to 0.2 %.
You won’t find actuarial tables, but they can often be constructed from official sources and/or press releases with some ingenuity. We’d do this for other risks too, like burglary, water damage, etc.
Of course, we could also gut-feel our way there. Maybe we consider the past 20 years, assume that any one of a circle of 5 friends would have told us about a serious event in their household, and note that we have been told twice in that time. That’s twice in 100 person-years, i.e. a 1⁄50 all-cause risk per year.
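The two back-of-envelope estimates above can be written out as a few lines of arithmetic. This is only a sketch using the numbers from the comment; the variable names are made up for illustration, and the 5 million households figure is the comment's own rough guess.

```python
# Official-source estimate: fire service call-outs divided by households.
fires_per_year = 6000
households = 5_000_000          # rough guess from the comment, not an official figure
reported_rate = fires_per_year / households   # 0.0012, i.e. 0.12 % per year

# Bumped up by hand to allow for fires that never reach the fire services.
adjusted_rate = 0.002           # 0.2 % per year

# Gut-feel estimate: 5 friends observed over 20 years, 2 serious events.
person_years = 5 * 20
events = 2
all_cause_rate = events / person_years        # 0.02, i.e. 1 in 50 per year

print(f"reported fire rate: {reported_rate:.2%}")
print(f"adjusted fire rate: {adjusted_rate:.2%}")
print(f"all-cause rate:     {all_cause_rate:.2%}")
```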
I agree—sorry about the sloppy wording.
What I tried to say was that “if you act like someone who maximises compounding money you also act like someone with utility that is log-money.”
Your formula is only valid if utility = log($).
This is a synonym for “if money compounds and you want more of it at lower risk”. So in a sense, yes, but it seems confusing to phrase it in terms of utility as if the choice was arbitrary and not determined by other constraints.
The insurance company does not have logarithmic discounting on wealth, it will not be using Kelly to allocate bets. From the perspective of the company, it is purely dependent on the direct profitability of the bet—premium minus expected payout and overheads.
Not true. Risk management is a huge part of many types of insurance, and that is about finding the appropriate exposure to a risk—and this exposure is found through the Kelly criterion.
This matters less in some types of insurance (e.g. life, which has stable long-term rates and rare catastrophic events) but significantly in other types (liability, natural-disaster-linked).
This is only about maximising profit for a given level of risk, it has nothing to do with specific shapes of utility functions.
Fundamentally we are taking the probability-weighted expectation of log-wealth under all possible outcomes from a single set of actions, and comparing this to all other sets of actions.
The way to work in uncompensated claims is to add another term for that outcome, with the probability that the claim is unpaid and the log of wealth corresponding to both paying that cost out of pocket and fighting the insurance company about it.
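As a rough sketch of what that extra term looks like, the comparison below adds a third outcome to the insured case: the loss happens and the claim goes unpaid. Every number here is hypothetical, as are the names `p_unpaid` and `dispute_cost`, and the deductible is left out to keep it short.

```python
import math

def expected_log_wealth(outcomes):
    """Probability-weighted mean of log-wealth over (probability, wealth) pairs."""
    return sum(p * math.log(w) for p, w in outcomes)

# Hypothetical inputs, chosen only for illustration.
wealth = 25_000
premium = 300
loss = 10_000
p_loss = 0.03        # probability the insured event happens
p_unpaid = 0.10      # assumed probability the insurer does not pay a claim
dispute_cost = 500   # assumed cost of fighting the insurer about it

uninsured = [
    (1 - p_loss, wealth),
    (p_loss, wealth - loss),
]

insured = [
    (1 - p_loss, wealth - premium),                 # no loss
    (p_loss * (1 - p_unpaid), wealth - premium),    # loss, claim paid in full
    # The extra term: loss paid out of pocket plus the cost of the dispute.
    (p_loss * p_unpaid, wealth - premium - loss - dispute_cost),
]

buy_insurance = expected_log_wealth(insured) > expected_log_wealth(uninsured)
print(buy_insurance)
```

Raising `p_unpaid` or `dispute_cost` shifts weight onto the worst outcome, so at some point the insured branch stops looking better than going without.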
It is under no such assumption! If you have sufficient wealth you will leave something even if you die early, by virtue of already having the wealth.
If it’s easier, think of it as the child guarding the parent’s money and deciding whether to place a hedging bet on their parent’s death or not—using said parent’s money. Using the same Kelly formula we’ll find there is some parental wealth at which it pays more to let it compound instead of using it to pay for premia.
Even so, at some level of wealth you’ll leave more behind by saving up the premium and having your children inherit the compound interest instead. That point is found through the Kelly criterion.
(The Kelly criterion is indeed equal to concave utility, but the insurance company is so wealthy that individual life insurance payouts sit on the nearly linear early part of the utility curve, whereas for most individuals they do not.)
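To make the "some level of wealth" point concrete, here is a sketch that searches for the wealth at which insuring and not insuring give the same expected log-wealth. It treats the insured event as a fixed financial loss that the policy covers in full, which is a simplification of life insurance, and all the numbers are hypothetical.

```python
import math

# Hypothetical inputs, chosen only for illustration.
p_loss = 0.03     # probability of the insured event in the period
loss = 10_000     # financial loss if it happens and we are uninsured
premium = 400     # priced above the expected payout of 300

def insurance_gain(wealth):
    """Expected log-wealth with insurance minus expected log-wealth without."""
    insured = math.log(wealth - premium)
    uninsured = (1 - p_loss) * math.log(wealth) + p_loss * math.log(wealth - loss)
    return insured - uninsured

# Bisection: the gain is positive just above the loss size, negative for the very rich.
lo, hi = loss + premium + 1, 10_000_000
assert insurance_gain(lo) > 0 > insurance_gain(hi)
for _ in range(100):
    mid = (lo + hi) / 2
    if insurance_gain(mid) > 0:
        lo = mid
    else:
        hi = mid

threshold = (lo + hi) / 2
print(f"insurance stops paying off above roughly {threshold:,.0f}")
```

Below the threshold the premium buys more growth than it costs; above it, keeping the premium and letting it compound comes out ahead.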
I just wouldn’t use the word “Kelly”, I’d talk about “maximizing expected log money”.
Ah, sure. Dear child has many names. Another common name for it is “the E log X strategy” but that tends to not be as recognisable to people.
you say “this is how to mathematically determine if you should buy insurance”.
Ah, I see your point. That is true. I’d argue this isolated E log X approach is still better than vibes, but I’ll think about ways to rephrase to not make such a strong claim.
what do you mean when you say this is what Kelly instructs?
Kelly allocations only require taking actions that maximise the expectation of the joint distribution of log-wealth. It doesn’t matter how many bets are used to construct that joint distribution, nor when during the period they were entered.
If you don’t know at the start of the period which bets you will enter during the period, you have to make a forecast, as with anything unknown about the future. But this is not a problem within the Kelly optimisation, which assumes the joint distribution of outcomes already exists.
This is also how correlated risk is worked into a Kelly-based decision.
Simultaneous (correlated or independent) bets are only a problem in so far as we fail to construct a joint distribution of outcomes for those simultaneous bets. Which, yeah, sure, dimensionality makes itself known, but there’s no fundamental problem there that isn’t solved the same way as in the unidimensional case.
Edit: In more laymanny terms, Kelly requires that, for each potential combination of simultaneous bets you are going to enter during the period, you estimate the probability distribution of wealth outcomes (and this probability distribution should account for any correlations) after the period has passed. Given that, Kelly tells you to choose the set of bets (and sizes in each) that maximise the expected log of wealth outcomes.
Kelly is a function of actions and their associated probability distributions of outcomes. The actions can be complex compound actions such as entering simultaneous bets—Kelly does not care, as long as it gets its outcome probability distribution for each action.
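A minimal sketch of that idea for two simultaneous, positively correlated even-money bets: the joint distribution is given up front, and a brute-force grid search picks the pair of stake fractions with the highest expected log-wealth. The joint probabilities and the grid are made up for illustration; a real tool would use a proper optimiser.

```python
import itertools
import math

# Hypothetical joint distribution over (bet A wins, bet B wins).
# Each bet wins with marginal probability 0.55, and the two are positively correlated.
joint = {
    (True, True): 0.40,
    (True, False): 0.15,
    (False, True): 0.15,
    (False, False): 0.30,
}

def expected_log_wealth(frac_a, frac_b):
    """Expected log of end-of-period wealth, starting from wealth 1."""
    total = 0.0
    for (win_a, win_b), prob in joint.items():
        wealth = 1 + (frac_a if win_a else -frac_a) + (frac_b if win_b else -frac_b)
        if wealth <= 0:
            return -math.inf   # ruin in any outcome rules the action out
        total += prob * math.log(wealth)
    return total

# Candidate actions: every pair of stake fractions from 0 % to 50 % in 1 % steps.
grid = [i / 100 for i in range(51)]
best = max(itertools.product(grid, grid), key=lambda f: expected_log_wealth(*f))
print(best)
```

Taken alone, either bet would get a Kelly stake of 10 % of wealth. Because the bets tend to win and lose together, the joint optimum stakes less than that on each.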
I’m confused by the calculator.
The probability should be given as 0.03 -- that might reduce your confusion!
Kelly is derived under a framework that assumes bets are offered one at a time.
If I understand your point correctly, I disagree. Kelly instructs us to choose the course of action that maximises log-wealth in period t+1 assuming a particular joint distribution of outcomes. This course of action can by all means be a complicated portfolio of simultaneous bets.
Of course, the insurance calculator does not offer you the interface to enter a periodful of simultaneous bets! That takes a dedicated tool. The calculator can only tell you the ROI of insurance; it does not compare this ROI to alternative, more complex portfolios which may well outperform the insurance alone.
If you get caught in a flood your whole neighborhood probably does too
This is where reinsurance and other non-traditional instruments of risk trading enter the picture. Your insurance company can offer flood insurance because they insure their portfolio with reinsurers, or hedge with catastrophe bonds, etc.
The net effect of the current practices of the industry is that fire insurance becomes slightly more expensive to pay for flood insurance.
I have a hobby horse that I think people misunderstand the justifications for Kelly, and my sense is that you do too
I don’t think I disagree strongly with much of what you say in that article, although I admit I haven’t read it that thoroughly. It seems like you’re making three points:
Kelly is not dependent on log utility—we agree.
Simultaneous, independent bets lower the risk and applying the Kelly criterion properly to that situation results in greater allocations than the common, naive application—we agree.
If one donates one’s winnings then one’s bets no longer compound and the expected profit is a better guide than expected log wealth—we agree.
In A World of Chance, Brenner, Brenner, and Brown look at this same question from a historical perspective, and (IIRC) conclude that gambling is about as damaging as alcohol, both for individuals and society. In other words, it should be legal (it gives the majority a relatively safe good time) but somewhat controlled (some cannot handle it and then it is very bad).
Do these more recent numbers corroborate that comparison to alcohol?
In my region of the world “butter knife” means a wooden utensil with round edges so it never even struck me that it could be sharp!