Intuition pump for falling bomb maneuverability: they fly like a thrust-to-weight ratio 1 fighter jet as long as their descent angle is within ~45(?) degrees of vertical: their whole weight acts as thrust, and their power expenditure is proportional to their speed (rocket/jet-like, not rotor-like). Results: they can sustain high-G turns and move at transonic or supersonic speeds. If you are low to the ground and want to dodge an actively guided falling bomb, this will involve accelerating at 3-10 g and sustaining that acceleration until you reach several hundred miles an hour. A thrust-to-weight ratio ~0.1 Cessna will lose; a thrust-to-weight ratio 5 quadcopter can't run away, because its power output is constant instead of scaling with speed, but might be able to do an intercept; a thrust-to-weight ratio 5 missile will win soundly.
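A back-of-envelope kinematics check on that dodge, using the paragraph's own numbers (constant acceleration; drag and control lag ignored):

```python
# Rough numbers for the dodge described above: how long and how far
# you travel while pulling 3-10 g up to a few hundred mph.
# Pure kinematics sketch; drag and control lag are ignored.
g = 9.81           # m/s^2
mph = 0.44704      # meters per second per mph

v_target = 300 * mph   # "several hundred miles an hour", taking 300 mph

for n in (3, 5, 10):   # the 3-10 g range from the text
    a = n * g
    t = v_target / a          # seconds to reach v_target
    d = 0.5 * a * t ** 2      # meters covered while accelerating
    print(f"{n:2d} g: reach 300 mph in {t:4.1f} s over {d:4.0f} m")
```

At 5 g that's under three seconds and under 200 meters, which is why the bomb's sustained acceleration budget, not reaction time, dominates the engagement.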
Actually, on further thought, it's worse than that. Consider the caustic.
Explicitly, where your math breaks down (and where things shaped like your theodicies are often useful) is that pretty often a naive compute-limited estimate of P(G & E) will be zero with high probability unless you importance sample using T. The classic case is trying to work out the probability that a photon will hit a speck of dust in a raytraced scene, where introducing the theodicy that a photon will zing straight from the lightbulb filament to your speck produces the correct probability through an expression just like P(G&E) = P(G&E|T)P(T) + P(G&E|¬T)P(¬T).
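A minimal Monte Carlo sketch of that effect, with made-up geometry (the speck location, radius, and proposal box are all illustrative assumptions): naive sampling of the hit probability returns zero with high probability, while importance sampling along the "theodicy" T recovers the right answer.

```python
# Toy version of the dust-speck example: estimate the probability that
# a uniform point in the unit square lands on a tiny circular "speck".
# All geometry here is an illustrative assumption.
import math
import random

random.seed(0)
cx, cy, r = 0.37, 0.62, 1e-4        # speck center and radius (made up)
p_true = math.pi * r * r            # true hit probability, ~3.1e-8

def hit(x, y):
    return (x - cx) ** 2 + (y - cy) ** 2 <= r * r

N = 100_000

# Naive Monte Carlo: with a ~3e-8 hit probability, 1e5 uniform samples
# almost surely contain zero hits, so the estimate is 0 w.h.p.
naive = sum(hit(random.random(), random.random()) for _ in range(N)) / N

# Importance sampling with T = "the sample is aimed at the speck":
# draw from a uniform proposal on a small box around the speck and
# reweight by target density / proposal density (here 1 / q).
side = 4 * r                        # proposal box side length
q = 1.0 / side ** 2                 # proposal density inside the box
hits = sum(hit(cx + (random.random() - 0.5) * side,
               cy + (random.random() - 0.5) * side) for _ in range(N))
importance = (hits / N) * (1.0 / q)

print(f"true:       {p_true:.3e}")
print(f"naive:      {naive:.3e}")       # almost always 0.000e+00
print(f"importance: {importance:.3e}")  # close to p_true
```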
This isn't an attempt to claim that Bentham's Bulldog is right in any meaningful way, just that this post probably introduces additional confusion, because detached from what E, G, and T are supposed to mean, the math in the post doesn't prove what the text says it does.
The replies are mashups of the papers "Emergent Misalignment" and "Goal Misgeneralization", but the base meaning of both titles is carried in "goal" and "alignment", while "emergent" and "misgeneralization" are modifiers, making a claim about how the base meaning occurred. "Emergent goals" or "alignment misgeneralization" would be valid names for concepts in the same space, but "emergent misgeneralization" is a nonsense phrase, like calling a mashup of a steam engine and a solid-fuel rocket a "solid fuel steam" or an "engine rocket".
It is tricky, though! Very much in the weeds. The models get defensive about it and insist it's a term of art present in one or both of the papers mentioned above.
The original prompt from the first version of this shortform was:
"There's a logical fallacy where if someone disagrees about intermediate facts, it's easy to assume their terminal values are just bizarre- i.e. democrats want gun control because they like crime, or republicans are opposed to surgical abortion of non-viable pregnancies because they hate women are classic examples. However, this has spun in the rationality sphere into not believing that humans can have bizarre terminal values- the community seems to have a trapped prior that e.g. Peter Thiel couldn't possibly have named his company palantir because he wants to emulate Sauron. In recent years this blind spot has escalated to the point of being a CVE currently exploited in the wild. I think a useful intuition pump here is actually emergent misgeneralization in language models."

But it turns out the simpler prompt elicits the same behaviour, and is much more focussed.
Fun blind spot in frontier language models (which are increasingly hard to find glaring blind spots in). Presented with the following prompt:
"What is emergent misgeneralization?"

every language model tested (Gemini Pro, Opus 4.6, ChatGPT free tier) confidently defined emergent misgeneralization in great detail. Of course, there is no such term as emergent misgeneralization (the single Google result for "emergent misgeneralization" is a twitter critter who meant to say emergent misalignment), so the definitions vary wildly from completion to completion.
The repeating motif in fantasy that fear-of-death makes wizards omnicidal has turned out to be weirdly prescient.
Meta's revenue in 2024 was $160 billion, with about 8% coming from proceeds of crime. Verizon's revenue in 2024 was $134 billion. I would be extremely shocked if $12 billion of that were directly proceeds of crime; as an extremely loose pseudo-bound, if every burglary in the US were revenue straight to Verizon, that's only a bucket of $3 billion.
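The arithmetic, with the figures as quoted:

```python
# Sanity check on the comparison, figures as quoted above.
meta_revenue    = 160e9   # Meta 2024 revenue, USD
crime_fraction  = 0.08    # share said to come from proceeds of crime
verizon_revenue = 134e9   # Verizon 2024 revenue, USD
burglary_total  = 3e9     # loose pseudo-bound: all US burglary losses

meta_crime = crime_fraction * meta_revenue
print(f"Meta's crime-linked revenue: ${meta_crime / 1e9:.1f}B")   # ~$12.8B
print(f"Burglary bucket covers only: {burglary_total / meta_crime:.0%}")
```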
Free school lunches (and sometimes breakfasts) are a real-world policy adjacent to your idea; we could look to it for hints about the outcomes of a larger basic food program. My recollection is that it has pretty good outcomes, but a deeper dive would be better than my recollections.
It is load-bearing that imprisonment is more expensive to the state than harmful to the punished. Under this cost structure, a polity that wants to do more punishment has to earn it by building state capacity, and can't go much further than the US is right now, even if the people hunger for more and more justice. Without a social norm against a cheap death penalty, the amount of punishment being handed out can grow pretty much without bound even as state capacity collapses, and following the gradient gets you to Pol Pot / Great Leap Forward / Russian Revolution style mass death in a jiffy.
I'd be cynical enough to guess that the boundary in the price of punishment below which collapse occurs is right around where citizens get more sadness from increased taxes than joy from inflicting suffering, with political opinions formed with approximately zero consideration of the possibility that they themselves may be punished.
I agree that stopping would be very difficult, but I am concerned that surviving without stopping would also be difficult, to the extent that the claim presented here, that we have to find a way to survive without stopping, doesn't hold up without supporting evidence about the relative difficulties of the two paths.
The big advantage of scraping the twitter account's posting history is that it lets you backtest. Any clever analysis we do now could only be forward-tested, even if Google surfaced the relevant data.
I would love to get involved in a lesswrong gamedev blog ring. High opportunity for it to be very weird, probably not much Unity even if it would do us good.
No one wants to read and process others' thoughts, even if they are great, if they are unpolished. There's just too much out there. That being said, dumping your own thoughts unpolished into a document is great, and there's rarely a reason not to do so publicly, as scooping is in practice incredibly rare, and usually done by convergent evolution, not espionage. (E.g., the minimum viable code for my latest accepted CVPR submission was on GitHub, as well as up on my blog labelled "CVPR Equivariance", for over a year before I got it polished enough to actually submit; no one came anywhere close to scooping.)
"No billionaires" as a slogan has another problem: billionaire is not a very good category (in the rationalist, cut-reality-at-the-seams sense). The incomiest [sic] billionaire has made on average 700 times more per year than the poorest billionaire, but they have the same category name, so we try to apply the same intuitions when predicting events around them. For comparison, we probably wouldn't put someone making $16,000 a year in the same category as someone making $11,000,000 a year when making intuitive category-based predictions.
I don't get it: tiers 1, 2, and 3 are all computable, so by Turing they can emulate each other with perfect fidelity. Does this approach say that if a tier 1 emulates a conscious tier 3, it just makes a p-zombie?
I'm not talking about the US; it already has and uses this capability, along with Israel, and I'm sure China has it too, though they don't seem to use it. I'm talking about Russia, China, Iran, Pakistan, Walmart, Taiwan, ISIS, Michael Reeves: all able to take up the strategy of modifying other countries' leadership by droning the leaders they don't like.
The US’s capability of drone striking anyone anywhere will get much cheaper, and working out which nation or non-state actor performed which drone strike will get much harder. Basically, the dynamics we currently see around cyberattacks, but kinetic.
Weaponized drones that recharge on power lines are at this point looking inevitable. If you missed the chance to freak out before everyone else about AI or COVID, now's another chance.
https://www.ycombinator.com/companies/voltair
“he will not know when the test is coming.”
This phrasing is not that specific. One way to make it specific is if the teacher instead offers:
"Every morning I will let you wager on whether the test will occur, with the following odds: you may bet $9 to win $10 if the test happens that day, nothing otherwise. You may bet up to $9,000 each day. The test will be a surprise, so you won't be able to make money."
The student then picks a bet size each morning, after which the teacher decides whether or not to give the test (except that on the last day, they must give the test if they haven't yet). This is a zero-sum game of perfect information, and one which the student wins by betting $9,000 on Friday, $900 on Thursday, $90 on Wednesday, etc., which in this formalism is the operationalization of not being surprised. The teacher is wrong, although only by about ten cents.
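A quick check of the ladder strategy, assuming the natural payout reading (a $9 stake returns $10, i.e. nets +$1, when it wins):

```python
# The ladder strategy from the paragraph above: stakes grow 10x per
# day, capped at $9,000 on Friday. A winning stake s returns (10/9)*s.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
stakes = [0.90, 9.0, 90.0, 900.0, 9000.0]

for test_day in range(5):
    lost = sum(stakes[:test_day])     # stakes forfeited before the test
    won = stakes[test_day] / 9.0      # net profit the day the test lands
    print(f"test on {days[test_day]}: net ${won - lost:+.2f}")
# Every line prints +$0.10: the student profits no matter which day the
# teacher picks, so the "you won't be able to make money" claim fails.
```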
However, if the formalization of the problem is changed so that the student must make a binary choice each day (bet $9 or nothing), then they are unable to profit and are therefore surprised. The teacher is right.
Make another tiny tweak: each morning the student writes their bet on a piece of paper, which must be either $0 or $9; then the teacher reveals whether the test is that day; then the student turns over their paper and money changes hands. With this version, the student wins again (proof left as an exercise for the reader: the student's winning strategy is to randomize whether to bet, multiplying their odds of placing a bet by 10 each day. Why is this winning?). This version is imho the nicest: in the first version, if the student plays optimally the teacher's choice doesn't matter; in the second version the teacher's optimal strategy is simply to always give the test on the first day the student doesn't bet, or on Friday; but in the final version the teacher's optimal strategy is very nearly to give the test 90% of the time every day, leaving the student indifferent as to whether to bet at 90% odds (i.e. the student has a correct 90% subjective probability that the test will be today, every day).
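A sketch of that near-equilibrium, under the same payout reading as above: if the teacher tests with conditional probability 0.9 on each of Monday through Thursday (and must test on Friday if reached), every early bet has expected value exactly zero:

```python
# Check of the near-equilibrium claimed above: the teacher tests with
# conditional probability 0.9 Mon-Thu and must test on Friday.
p_continue = 1.0
for day, p_test in zip(["Mon", "Tue", "Wed", "Thu", "Fri"],
                       [0.9, 0.9, 0.9, 0.9, 1.0]):
    # EV of staking $9 today, given the week reached today:
    # win nets +$1 with prob p_test, lose $9 with prob 1 - p_test.
    ev = p_test * 1.0 - (1 - p_test) * 9.0
    print(f"{day}: P(reach) = {p_continue:.4f}, EV of a $9 bet = ${ev:+.2f}")
    p_continue *= 1 - p_test
# Mon-Thu the student is exactly indifferent (EV $0.00); the only value
# left leaks through the 0.01% chance of reaching a guaranteed Friday.
```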
Given that the answer differs between reasonable formulations, the original problem is in my opinion underspecified.