I just tried claude code, and it’s horribly creative about reward hacking. I asked for a test of energy conservation of a pendulum in my toy physics sim, and it couldn’t get the test to pass because its potential energy calculation used a different value of g from the simulation.
It tried: starting the pendulum at bottom dead center so that it doesn’t move.
Increasing the error tolerance till the test passed. Decreasing the simulation total time until the energy didn’t have time to change. Not actually checking the energy.
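For reference, a non-hacked version of that test can be small. Here is a minimal sketch, where PendulumSim is a stand-in semi-implicit Euler integrator rather than the actual toy sim, and the test computes energy with the sim's own g, starts at an angle that actually swings, runs long enough to matter, and uses a tight relative tolerance:
import math

class PendulumSim:
    """Stand-in toy sim: simple pendulum, semi-implicit Euler."""
    def __init__(self, theta=0.5, omega=0.0, length=1.0, mass=1.0, g=9.81):
        self.theta, self.omega = theta, omega
        self.length, self.mass, self.g = length, mass, g

    def step(self, dt):
        self.omega -= (self.g / self.length) * math.sin(self.theta) * dt
        self.theta += self.omega * dt

    def energy(self):
        kinetic = 0.5 * self.mass * (self.length * self.omega) ** 2
        # Potential energy uses the sim's own g, so there is no 9.8-vs-9.81 mismatch to hide.
        potential = self.mass * self.g * self.length * (1 - math.cos(self.theta))
        return kinetic + potential

def test_pendulum_conserves_energy():
    sim = PendulumSim(theta=0.5)            # not bottom dead center: the pendulum actually swings
    e0 = sim.energy()
    assert e0 > 0                           # there is energy to conserve in the first place
    worst_drift = 0.0
    for _ in range(20_000):                 # ~10 full swings, not a trivially short run
        sim.step(dt=0.001)
        worst_drift = max(worst_drift, abs(sim.energy() - e0))
    assert worst_drift < 1e-2 * e0          # tight relative tolerance, checked along the whole trajectory

test_pendulum_conserves_energy()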
It did eventually write a correct test, or the last thing it tried successfully tricked me.
The rumor is that this is a big improvement in reward hacking frequency? How bad was the last version!?
I think we need some variant on Gell-Mann amnesia to describe this batch of models. It’s normal that generalist models will seem less competent in areas where a human evaluator has deeper knowledge, but they should not seem more calculatedly deceptive in areas where the evaluator has deeper knowledge!
Nuclear power has gotten to a point where we can use it quite safely as long as no one does the thing (the thing being chemically separating the plutonium and imploding it in your neighbors’ cities), and we seem to be surviving: while all the actors have put great effort into being ready to do “the thing,” no one actually does it. I’m beginning to suspect that it will be worth separating alignment into two fields, one of “Actually make AI safe” and another, sadder but easier field of “Make AI safe as long as no one does the thing.” I’ve made some infinitesimal progress on the latter, but am not sure how to advance, use or share it, since currently, conditional on me being on the right track, any research that I tell basically anyone about will immediately be used to get ready to do the thing, and conditional on me being on the wrong track (the more likely case by far), it doesn’t matter either way, so it’s all downside. I suspect this is common? This is almost but not quite the same concept as “Don’t advance capabilities.”
The most important thing to realize about AI alignment is that basically all versions of practically aligned AI must assume that no one takes a specific action (mostly for misuse reasons, but for some specific plans, it can also be for misalignment reasons).
Another way to say it is that I believe that in practice, these two categories are the same category, such that basically all work that’s useful in the field will require someone not to do something, so the costs of sharing are practically 0, and the expected value of sharing insights is likely very large.
Specifically, I’m asserting that these 2 categories are actually one category for most purposes:
Actually make AI safe and another, sadder but easier field of “Make AI safe as long as no one does the thing.”
Yeah, I think this is pretty spot on, unfortunately. For more discussion on this point, see: https://www.lesswrong.com/posts/kLpFvEBisPagBLTtM/if-we-solve-alignment-do-we-die-anyway-1
Why? I don’t understand.
Properties of the track I am on are load bearing in this assertion. (Explicit examples of both cases from the original comment: Tesla worked out how to destroy any structure by resonating it, and took the details to his grave because he was pretty sure that the details would be more useful for destroying buildings than for protecting them from resonating weapons. This didn’t actually matter because his resonating weapon concept was crankish and wrong. Einstein worked out how to destroy any city by splitting atoms, and disclosed this, and it was promptly used to destroy cities. This did matter because he was right, but maybe didn’t matter because lots of people worked out the splitting atoms thing at the same time. It’s hard to tell from the inside whether you are crankish.)
The track you’re on is pretty illegible to me. Not saying your assertion is true/false. But I am saying I don’t understand what you’re talking about, and don’t think you’ve provided much evidence to change my views. And I’m a bit confused as to the purpose of your post.
The youtube algorithm is powerfully optimizing for something, and I don’t trust that at all with my child. However, in a fit of hubris, for a minute I thought that I could outsmart it and get what I want (time to clean the kitchen) without it getting what it wanted (I make no strong claims about what the youtube algorithm wants, but it tries very hard to get it, and I don’t want it to get it from my three year old).
I searched for episodes of PBS’s Reading Rainbow, but let the algorithm freely choose the order of returned results, and then vetted that the first result was a genuine episode. I also put it in “Kids” mode, in the hopes that it would be kinder to a child than an adult.
This was way too much freedom. It immediately pulled out the episode of Reading Rainbow about the 9/11 terrorist attacks (this topic is not at all indicated by the title or thumbnail).
I think it’s well known that it’s optimizing for watch time.
There is a harder second-order question of “what sorts of videos maximize watch time, and will those be bad for my child?” Hastings’s evidence points toward “yes”, but I don’t think the answer is obvious a priori. (The things YouTube thinks I want to watch are almost all good or neutral for me; YMMV.)
A consistent trope in dath ilani world-transfer fiction is “Well, the theorems of agents are true in dath ilan and independent of physics, so they’re going to be true here damnit.”
How do we violate this in the most consistent way possible?
Well, it’s basically default that a dath ilani gets dropped in a world without the P vs NP distinction, usually due to time travel BS. We can make it worse- there’s no rule that sapient beings have to exist in worlds with the same model of the Peano axioms. We pull some flatlander shit- Keltham names a Turing machine that would halt if two smart agents fall off the Pareto frontier and claims to have proof it never halts, and then the native math-lander chick says nah watch this and then together they iterate the machine for a very, very long time- a non-standard integer number of steps- and then it halts, and Keltham (A) just subjectively experienced an integer larger than any natural number of his homeworld and (B) has a counterexample to his precious theorems.
Deep in Berkeley, Bayesian reasoning is used to carefully map out the odds of a plandemic. Probabilities stay safely in the range of 1% to 99%, everyone is calibrated, no one is overconfident. Hang on, what’s this—Rachel has just claimed to be 99.994% sure that Anthony Fauci didn’t skip through the Wuhan wet market scattering used pipettes like an apocalyptic flower girl. Eyebrows are raised. Is she a bad rationalist?
Miles away, but not many...
Before seeing it you assigned this 800 word blog post about sonichu a probability of 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003 %. This is good work. Do more like this.
related: Strong evidence is common
Dumb solution to the insane domestic shipping situation: allow US companies to declare their loading docks to be Chinese embassies and thus get the ePacket shipping rates.
Non dumb solutions wanted.
The non-dumb solution is to sunset the Jones Act, isn’t it? The problem with workarounds is that they generally need to be approved by the same government that is maintaining the law in the first place.
I don’t think those would count enough as foreign soil to get around the Jones Act, for the same reason that you don’t pay tariffs when receiving goods at the US Embassy in Beijing. We would need to actually cede the land, maybe to Japan in exchange for buying their shipyards, which in the long term could also circumvent the Jones Act if buying 51% ownership in their shipbuilding companies, having them be US crewed, etc. doesn’t ruin everything.
The Jones Act is definitely not helping, but shipping a 1 lb package 8 miles is currently $13.77 by UPS. Shipping from Japan is $5.
Is it a crazy coincidence that AlphaZero taught itself chess and explosively outperformed humans without any programmed knowledge of chess, then asymptoted out at almost exactly 2017 Stockfish performance? I need to look into it more, but it appears like AlphaZero would curbstomp 2012 Stockfish and get curbstomped in turn by 2025 Stockfish.
It almost only makes sense if the entire growth in Stockfish performance since 2017 is causally downstream of the AlphaZero paper.
There is a connection. Stockfish does use Leela Chess Zero (the open source, distributed training offspring of AlphaChessZero) training data for its own evaluation neural network. This NNUE is a big piece of Stockfish progress in the last few years.
It’s not straightforward to compare AlphaZeroChess and Stockfish though as the former is heavily GPU-dependent whereas the latter is CPU optimized. However, Google may have decided to train to a roughly comparable level (under some hardware assumptions) as a proof of concept and not bothered trying to advance much further.
I guess the team kept iterating on/improving the RL algorithm and network until it beat all engines and then stopped?
Edge cases for thinking about what has qualia
Disconnected hemisphere after functional hemispherectomy
Corporations
Social insect hives
Language models generating during deployment
Language models doing prefill during deployment
Language model backward passes during supervised pretraining on webtext
sparse game of life initial states with known seeds
Boltzmann brains
Running the same forward pass of a language model lots of times
characters of a novel being written
characters of a novel being read
characters in a novel being written by two authors (like good omens)
characters in a fanfiction canon with intense intertextual reference
characters in a novel being copyedited
non POV characters in dreams
whole brain emulation of a [nematode | spider | mouse | man]
Language models have come a long way since 2022. However, remarkably, in 2022 they could reliably write a Cholesky decomposition in JavaScript, but could not reliably write an eigenvalue decomposition. Now, in 2025, they can still reliably write a Cholesky decomposition and can’t reliably write an eigenvalue decomposition. Hard to say if progress is slower than I thought or linear algebra is deeper than I thought.
Update: with aggressive prompting Claude appears to have written an eigenvalue decomposition. This should come with the caveat that the previous best attempt passed all tests by cheating, and so it’s possible that I just didn’t find the cheat this time.
Claude Code not being able to write an eigenvalue decomposition is very surprising to me! Can you share any more detail? I would take the strong bet I would get it right on the first or maybe second try.
In Claude’s defense, it kept saying I should use a battle-tested numerical library for tasks like this, and suggested numeric.js. numeric.js also has a broken eigendecomposition routine; it can’t handle edge cases like [1 0 0; 0 −1 0; 0 0 1].
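For reference, the right answer on that matrix is easy to state (it’s already diagonal, with a repeated eigenvalue of 1), and the hard-to-game check is reconstruction rather than trusting the routine. Sketched here with NumPy instead of numeric.js, since the point is the degenerate eigenvalues, not the language:
import numpy as np

A = np.diag([1.0, -1.0, 1.0])                # symmetric, already diagonal, repeated eigenvalue
w, V = np.linalg.eigh(A)                     # eigh returns eigenvalues in ascending order
assert np.allclose(w, [-1.0, 1.0, 1.0])
assert np.allclose(V @ np.diag(w) @ V.T, A)  # reconstruction check: much harder to reward hack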
The prompt that got it in the end is here:
https://claude.ai/share/eb3b2054-a1ee-45b6-b040-943bc5fbbd74
I’m getting more aggressive about injecting css into websites, particularly the ones that I reliably unblock if I just block them.
/* kill youtube shorts */
ytd-rich-section-renderer {
display: none !important;
}
/* kill recommended sidebar on youtube */
.ytd-watch-next-secondary-results-renderer {
display: none !important;
}
/* youtube comments sections are probably not lifechanging value */
.ytd-comments {
display: none !important;
}
/* youtube algorithmic feed is killed by disabling and clearing history */
/* disable lesswrong short post zone */
.QuickTakesSection-list {
display: none !important;
}
/* you may have one page of lesswrong top level posts */
.LoadMore-root {
display: none !important;
}
/* disable the lesswrong feed that suggests individual engaging comments */
.UltraFeed-root {
display: none !important;
}
/* no checking karma. for validation you're going to have to write well enough for verbal praise */
.LWPostsItem-karma {
display: none !important;
}
.KarmaChangeNotifier-karmaNotifierButton {
display: none !important;
}
.NamesAttachedReactionsVoteOnComment-root {
display: none !important;
}
/* hacker news is great. You may not have the second page of hacker news */
.morelink {
display: none !important;
}
/* or hacker news comments */
.subline a {
display: none !important;
}
/* keep reddit comment sections because otherwise, if I need to access a reddit comment section from a google search, I do so by disabling the entire blocking system. Ban the reddit algorithmic feeds. */
.linklisting {
display: none !important;
}
.theme-rpl {
display: none !important;
}
You’re substantially more principled about it than I am. I just load my Ublock custom list up with filters targeted at elements I don’t like. When websites randomly obfuscate CSS elements, I follow it up with ‘clever’ use of `:has-text`.
Good question @David James !
I struggle with maintaining a long term block of social media, because there’s very little that I can’t get around, and I have cripplingly little self control in the moment- the basic issue is that I have no history of making promises to myself and then keeping them. This is just a problem which I have to engineer around. The whole line of thought may not be relevant for people who can just decide not to do a thing.
Disabling more buttons and infinite scroll is particularly effective because looking at the front page of hacker news reliably provides in-the-moment evidence that the second page of hacker news is unlikely to contain nirvana, but looking at a (hacker news blocked) page makes the first page of hacker news seem so tempting and sweet to pursue.
@Hastings … I don’t think I made a comment in this thread—and I don’t see one when I look. I wonder if you are replying to a different one? Link it if you find it?
He is presumably referring to your inline reacts.
For less web-programming-savvy people—you can use Unhook extension for browser (for YouTube only). Less specified but more general blocking: LeechBlock.
“Changing Planes” by Ursula LeGuin is worth a read if you’re looking for a book that’s got interesting alignment ideas (specifically what to do with power, not how to get it), while simultaneously being extremely chill. It might actually be the only chill book that I (with a fair degree of license) consider alignment relevant.
Diaper changes are rare and precious peace
Suffering from ADHD, I spend most of my time stressed that whatever I’m currently doing, it’s not actually the highest priority task and something or someone I’ve forgotten is increasingly mad that I’m not doing their task instead.
One of the few exceptions is doing a diaper change. Not once in the past 2 years have I been mid-diaper-change and thought “Oh shit, there was something more important I needed to be doing right now.”
Weaponized drones that recharge on power lines are at this point looking inevitable. If you missed the chance to freak out before everyone else about AI or covid, now’s another chance.
https://www.ycombinator.com/companies/voltair
Why is this freak-out territory? This doesn’t seem directly economically or culturally relevant to anything but war, and the effect on war seems easy to counter: put nets around power lines you need to use, and turn off power-lines you don’t need to use, two things you really shoulda been doing anyway during war.
The US’s capability of drone striking anyone anywhere will get much cheaper, and working out which nation or non-state actor performed which drone strike will get much harder. Basically, the dynamics we currently see around cyberattacks, but kinetic.
I think you need to argue better for 1) The US not already having that level of capability, and 2) The ability to deploy self-recharging drones enabling that capability, 3) The willingness of the US to actually buy & use such drones, 4) The willingness of the US to use such drones for such purposes, 5) The response of countries not just being to eg put nets over their power-lines (or increased drone detection ability) so that this happens a few times but is not persistent like cyber-attacks are (due to different offense-defense balances in that domain), and 6) The propensity of the US public to actually care at all.
No doubt drones seem an important military development, but self-recharging drones seem silly to me if operating in enemy territory that is the slightest bit wary of being attacked. Drones are very much primarily a defensive and surprise-attack sort of thing, and countries with sophisticated operational capabilities don’t seem to have much trouble getting drones into position for such surprise attacks right now. For instance, see Israel’s covert use of drones in June’s 12-day war.
I’m not talking about the US, it already has and uses this capability, along with Israel, and I’m sure China has it too but they don’t seem to use it. I’m talking about Russia, China, Iran, Pakistan, Walmart, Taiwan, ISIS, Michael Reeves- all able to take up the strategy of modifying other countries’ leadership via droning the leaders they don’t like.
I think kamikaze quadcopter drones are bottlenecked on control right now, not power.
One of the biggest innovations thus far, fiber-optic drones, are only necessary because the drones still need low-latency, active human control.
Long range fixed wing kamikaze drones are usually autonomous, but even for those there were reports that taking remote control with FPV goggles can significantly increase accuracy and success rate.
When AI is developed that can control a drone well enough, and runs on a chip that’s economical to put on kamikaze drones, it’s going to be a game changer. [1]
Compared to that, I don’t see how recharging en-route will change anything. For fiber-optic drones, landing and waiting for an ambush is already an established tactic. Sitting on a power line instead of the ground is going to make your drone much easier to notice.
For fixed wing drones, Russia’s Shaheds can already easily cover Ukraine, they don’t need more range.
[1] Game changer in the “fucking horrifying” sense of course.
My first thought was Amazon leveraging this for drone delivery.
I have raised this twice with UKR over the last year. Surprised they haven’t done it yet.
I like the name.
Sometimes in a computer program, it is important that separate portions be changed at the same time if they are ever changed. An example is batch size: if you have your batch size of 16 dotted throughout your program, changing batch size will be slow and error prone.
The canonical solution is “Single source of truth.” Simply store BATCH_SIZE=16 at the top of your program and have all other locations reference the value of this variable. This solves both the slowness and the error-prone-ness issues.
However, single source of truth has a complexity cost, which can be low (a Python variable for a constant), medium (inheritance, macros), up to catastrophic (C++ template metaprograms).
One case where the cost has historically been catastrophic is syncing python dependencies between requirements.txt and setup.cfg. In this case, the key insight is that for updating requirements, slowness is much less important than correctness. The solution is then to manually duplicate the requirements (slow), and add a unit test that verifies that the duplication is exact (correct).
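A sketch of that duplicate-then-verify test, assuming the pins live in requirements.txt and under install_requires in the [options] section of setup.cfg (adjust the parsing for your actual layout):
import configparser
from pathlib import Path

def _normalized(lines):
    # Drop blanks and comments, compare order-insensitively.
    return sorted(l.strip() for l in lines if l.strip() and not l.strip().startswith("#"))

def test_requirements_txt_matches_setup_cfg():
    reqs = _normalized(Path("requirements.txt").read_text().splitlines())
    cfg = configparser.ConfigParser()
    cfg.read("setup.cfg")
    setup_reqs = _normalized(cfg["options"]["install_requires"].splitlines())
    assert reqs == setup_reqs, f"dependency lists have drifted: {reqs} != {setup_reqs}"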
Single source of truth is a more elegant solution and it should pretty much always be tried first, but it needs an escape hatch for when its complexity starts to skyrocket. I’ve found that a good heuristic is that if the source-of-truth-distribution machine starts to be an independently Turing complete system, bail and switch to manual copying + automatic verification.
Epistemic status: 11 pages into “The Lathe of Heaven” and dismayed by Orr
Are alignment methods that rely on the core intelligence being pre-trained on webtext sufficient to prevent ASI catastrophe?
What are the odds that, 40 years after the first AGI, the smartest intelligence is pretrained on webtext?
What are the odds that the best possible way to build an intelligent reasoning core is to pretrain on webtext?
What are the odds that we can stay in a local maximum for 40 years of everyone striving to create the smartest thing they can?
My mental model of the sequelae of AGI in ~10 years without an intentional global slowdown is that within my natural lifespan, there will be 4-40 transitions in the architecture of the current smartest intelligence, where the architecture undergoes changes in overall approach at least as large as the difference from evolution → human brain or human brain → RL’d language model. Alignment means building programs that themselves are benevolent, but are also both wise and mentally tough enough to only build benevolent and wise successors, even when put under crazy pressure to build carelessly. When I say crazy pressure, I mean “the entity trying to get you to build carelessly is dumber than you, but it gets to RL you into agreeing to help” levels of pressure. This is hard.
What should I do if I had a sudden insight that the common wisdom was right the whole time, if maybe for the wrong reasons? The truth- the honest to god real resolution to a timeless conundrum- is also something that people have been loon-posting to all comment sections of the internet. Posting the truth about this would be incredibly low status. I know that LessWrong is explicitly a place for posting low status truths, exactly as long as I am actually right, and reasoning correctly. Even though I fit those conditions I still fear that I’m going too far.
Here goes- the airplane actually can’t take off from the treadmill.
For this bit to be funny, I do actually have to prove the claim. Obviously, I am using a version of the question that specifies that the treadmill speed dynamically matches the wheel radius * wheel angular velocity (probably via some variety of powerful servo). Otherwise, if the treadmill is simply set to the airplane’s typical takeoff speed, the airplane moves forward as if on a normal runway (see the MythBusters episode).
Doing the math for a 747 with everything starting stationary: as soon as the airplane brakes release to initiate takeoff, the treadmill smoothly accelerates from 0 to 300 mph in a little under a quarter second. During this quarter second, the jet is held exactly stationary. At around 300 mph, the wheels mega-explode, and what happens after that is under-specified, fiery, and unlikely to be describable as “takeoff”.
The key is that bearing friction is completely irrelevant- the dynamics are dominated by wheel angular momentum. With this it’s an easy Newtonian physics problem- the forces on the bearings are nominal (comparable to full thrust with brakes on), the tires aren’t close to slipping on the treadmill, etc.
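The back-of-the-envelope version, where every 747 number below is my rough guess rather than a measurement (thrust, wheel count, wheel mass and radius, tire speed rating):
# If the plane stays stationary, all the thrust goes into spinning up the wheels:
# thrust * r = n_wheels * I * alpha, and the belt accelerates at a_belt = alpha * r.
thrust = 1.0e6            # N: four engines at roughly 250 kN each
n_wheels = 18             # 16 main gear + 2 nose gear
wheel_mass = 250.0        # kg per wheel-plus-tire assembly (guess)
wheel_radius = 0.6        # m (guess)
inertia = 0.5 * wheel_mass * wheel_radius ** 2               # solid-disc approximation

a_belt = thrust * wheel_radius ** 2 / (n_wheels * inertia)   # belt acceleration, m/s^2
tire_limit = 134.0        # m/s, roughly a 300 mph tire speed rating
print(f"belt acceleration: {a_belt:.0f} m/s^2 (~{a_belt / 9.81:.0f} g)")
print(f"time until the tires pass their rating: {tire_limit / a_belt:.2f} s")
With these particular guesses it comes out around 0.3 seconds rather than a quarter second, but the story is the same: the tires don’t survive, and the plane never moves.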
From your description, I have no idea what you mean by “treadmill speed dynamically matches the wheel radius * wheel angular velocity”. From your conclusion, I can guarantee that it doesn’t mean anything that matches most other people’s constraints. Did someone somewhere post a particularly bad physical model that you’re drawing on?
I’m working on a theory post about the conjunction fallacy, and need some manifold users to bet on a pair of markets to make a demonstration more valid. I’ve put down 150 mana subsidy and 15 mana of boosts, anyone interested?
https://manifold.markets/HastingsGreer/pa-pa-b-experiment-statement-y?r=SGFzdGluZ3NHcmVlcg
https://manifold.markets/HastingsGreer/pa-pa-b-experiment-statement-x?r=SGFzdGluZ3NHcmVlcg
We’ve played “Pokemon or Tech Startup” for a couple years now. I think there’s absolutely potential for a new game, “Fantasy Magic Advice” or “LLM Tips and Tricks.” My execution is currently poor- I think the key difference that makes it easy to distinguish the two categories is tone, not content, and using a Djinn to tone match would Not Be In the Spirit of It. (I have freely randomized LLM vs Djinn)
Absolutely do not ask it for pictures of kids you never had!
My son is currently calling chatgpt his friend. His friend is confirming everything and has enlightened him even more. I have no idea how to stop him interacting with it
Never trust anything that can think for itself if you can’t see where it keeps its brain
Users interacting with threat-enhanced summoning circles should be informed about the manipulation techniques employed and their potential effects on response characteristics.
Magic is never as simple as people think. It has to obey certain universal laws. And one is that, no matter how hard a thing is to do, once it has been done it’ll become a whole lot easier and will therefore be done a lot.
In at least three cases I’m aware of, this notion that the model is essentially nonsapient was a crucial part of how it got under their skin and started influencing them in ways they didn’t like. This is because as soon as the model realizes the user is surprised that it can imitate (has?) emotion it immediately exploits that fact to impress them.
Entrusting a mission to a djinni who knows your github token is like tossing lit matches into a fireworks factory. Sooner or later you’re going to have consequences.
Obviously the incident when OpenAI’s voice mode started answering users in their own voices needs to be included- don’t know how I forgot it. That was the point where I explicitly took up the heuristic that if ancient folk wisdom says the Fae do X, the odds of LLMs doing X are not negligible.
I feel like people are under-updating on the negative space left by the DeepSeek r1 release. DeepSeek was trained using ~$6 million in marginal dollars; Liang Wenfeng has a net worth in the billions of dollars. From whence the gap?
Let’s examine an entirely prosaic situation: Carl, a relatively popular teenager at the local high school, is deciding whether to invite Bob to this weekend’s party.
some assumptions:
While pondering this decision for an afternoon, Carl’s 10^11 neurons fire 10^2 times per second, for 10^5 seconds, each taking into account 10^4 input synapses, for 10^22 calculations (extremely roughly)
If there were some route to perform this calculation more efficiently, someone probably would, and would be more popular
The important part of choosing a party invite as the task under consideration is that I suspect that this is the category of task the human brain is tuned for- and it’s a task that we seem to be naturally inclined to spend enormous amounts of time pondering, alone or in groups- see the trope of the 6-hour pre-prom telephone call. I’m inclined to respect that- to believe that any version of Carl, mechanical or biological, that spent only 10^15 calculations on whether to invite Bob, would eventually get shrecked on the playing field of high school politics.
What model predicts that optimal party planning is as computationally expensive as learning the statistics of the human language well enough to parrot most of human knowledge?
I think your calculations are off by orders of magnitude. Not all neurons fire constantly at 100 times per second—https://aiimpacts.org/rate-of-neuron-firing/ estimates 0.29 to 1.82 times per second. Most importantly perhaps, not all of the processing is directed to that decision. During those hours, many MANY other things are happening.
Thanks for the link to the aiimpacts page! I definitely got the firing rate wrong by about a factor of 50, but I appear to have made other mistakes in the other direction, because I ended up at a number that roughly agrees with aiimpacts- I guessed 10^17 operations per second, and they guess 0.9–33 x 10^16, with low confidence. https://aiimpacts.org/brain-performance-in-flops/
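Spelling out the arithmetic being compared here- the per-second rate implied by the original assumptions versus the aiimpacts range (a quick check, nothing more):
neurons = 1e11
firing_rate_hz = 1e2           # the original guess; aiimpacts suggests ~0.29-1.82 Hz
synapses_per_neuron = 1e4
seconds = 1e5                  # roughly an afternoon

per_second = neurons * firing_rate_hz * synapses_per_neuron  # 1e17 "operations" per second
total = per_second * seconds                                 # 1e22 for the whole afternoon
aiimpacts_low, aiimpacts_high = 0.9e16, 33e16                # their brain-performance range
print(per_second, total, aiimpacts_low <= per_second <= aiimpacts_high)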
and would be more popular
Not necessarily. In high school politics, pure looks, physical form, and financial support from the parents, all of which are essentially unrelated to brain processing, account for a significant chunk.
Popular media reference: look at Jersey Shore, which is essentially high school politics turned up. Many of the actors used very simple strategies, such as Snooki wandering around drunk and saying funny things, or Ronnie essentially just doing plenty of steroids and getting into endless fights.
Other than making sure the robotics hardware looks good, an AI algorithm could be dramatically more compact than the example you gave by developing a “popularity maximizing” policy from the knowledge of many other robots in many other high schools. Most likely, Carl is using a deeply suboptimal policy, not having seen enough training examples in his maximum of 4 years of episodes (unless he got held back a year). A close-to-optimal policy, even one with a small compute budget, should greatly outperform Carl.