I agree a lot with the attitude of leadership through service and recognising the work of those who support others. Not sure that I would endorse the whole “hero/sidekick” dynamic, though. In my head it’s more like “a group of adventurers in a fellowship, each of whom has a different role.”
testingthewaters
Thanks for writing out something that I feel very strongly but often have trouble articulating in these spaces. The song made me tear up. Incidentally, my main character/self-representation for my aborted Harry Potter long-form fanfiction was a Hufflepuff.
Follow up to https://vitalik.eth.limo/general/2025/11/07/galaxybrain.html
Here is a galaxy brain argument I see a lot:
“We should do [X], because people who are [bad quality] are trying to do [X] and if they succeed the consequences will be disastrous.”
Usually [X] is some dual-use strategy (acquire wealth and power, lie to their audience, build or use dangerous tech) and [bad quality] is something like being reckless, malicious, psychopathic, etc. Sometimes the consequence is zero sum (they get more power to use to do Bad Things relative to us, the Good People) and sometimes the consequence is negative sum (Bad Things will happen).
As someone pointed out in the Twitter replies to the Mechanize essay, this kind of argument basically justifies selling crack or any other immoral action, provided you can imagine a hypothetical worse person doing the same thing. See also its related cousin, “If I don’t do [X] someone else will do it anyways”, which (as Vitalik points out) assumes a perfectly liquid labour market that usually does not exist except in certain industries.
I leave it to the reader to furnish examples.
Yeah the next level of the question is something like “we can prove something to a small circle of experts, now how do we communicate the reasoning and the implications to policymakers/interested parties/the public in general”
To be honest, this makes me quite worried. Suppose that someone working with mathematical methods proves something of dire importance to society (let’s say he comes up with a definitive formula for measuring the probability of disaster in a given year, or the minimum conditions for AI takeoff). How will this be communicated to other mathematicians, much less the public?
Great review and post, leaves me with a lot more hope for positive, non-coercive, and non-guilting/brow-beating change in beliefs. I read the book before reading your review and agree with your summary, and I would go so far as thanking you for raising/summarising points made in the book that I didn’t get during my own read-through. At this point I have a pretty firm belief that (as they say in Inception) positive motivation is stronger than negative motivation, at least for the purposes of long-term, intentional activities like cultivating an open attitude to facts and reason in the self.
See also this paper about plasticity as dual to empowerment https://arxiv.org/pdf/2505.10361v2
Um, I really like a lot of your writing. But I think the parts of your post that are in bold paint a very different picture to the parts that aren’t in bold.
That would be a pleasant fantasy for people who cannot abide the notion that history depends on small changes or that people can really be different from other people.
I think both of those are true, but it does not follow that history is made of individuals solving individual math problems and pushing out papers which get stacked into the intellectual Tower of Babel. History, as far as I can see, is made out of systems or ensembles of people moving around in different configurations.
Yudkowsky couldn’t do what he did without ET Jaynes, who in turn relied on the progenitors of probability and rationality, including Thomas Bayes and William of Ockham. But he was also influenced “sideways” by the people he learned from and defined himself against: the people in SL4, the people he called idiots, the venture capitalists he once idolised for their competence, Peter Thiel, Demis Hassabis, and his family. They shape (at the very least) his emotional worldview, which then shapes how he takes in information and integrates it at a deep and fundamental level. This is true insofar as it is true for any human who lives in a society. When I write anything I can feel the hands of writers past and present shaping my action space. They shape both what I write about and how I choose to write.
So yes, if he was gone everything would be different. But it would also be the same: people would love and fight and struggle and cooperate. The sameness of trends manifests at a higher level of coarse-graining, the level where the systemic forces and the long dreams and Molochian demons live. And none of this diminishes what he did, does, will do, or could have done. It’s just the way things are, because we can’t run randomised controlled trials on society.
No worries at all, I know I’ve had my fair share of bitter moments around AI as well. I hope you have a nice rest of your day :)
For what it’s worth, I’m not white and I come primarily from an AI ethics background; my formal training is in the humanities. I do think it’s sad that people only fret about bias the moment it affects them, however, and I would rather the issue be taken seriously from the start.
There appears to be a distaste for, or disregard of, AI ethics research (here mostly referring to bias and discrimination research) on LW. Generally the idea is that such research misses the point, or is not focused on the correct kind of misalignment (i.e. the existential kind). I think AI ethics research is important (beyond its real-world implications) for the same reason that studying RL reward hacking in video game settings is: in both cases we are showing that models learn unintended priorities, behaviours, and tendencies from the training process. Actually understanding how these tendencies form during training will be important for improving our understanding of SL and RL more generally.
In video games this is made literal by every entity having a central coordinate. Their body is merely a shell wrapped around the point-self and a channel for the will of the external power (the player).
A postmortem for the Economic Safety movement (fiction):
After eminent economist Mr. Senyek warned in 1991 that a hypothetical future “economic tsunami” could cause systemic risks to the American-led global financial order as a whole, researchers and think tanks quickly rallied to the cause of Economic Safety. They reasoned that in order to anticipate the risks of this hypothetical “Economic Tsunami”, they needed access to the frontier of financial trading. Within several years Economic Safety advocates joined eminent firms like JP Morgan and Bear Stearns, with Mr. Senyek providing introductions to particularly promising young economists.
Disillusioned with the domination of the financial system by large corporations with no sense of social obligation, a group of billionaire investors and traders started OpenFinance, with the goal of creating a hedge fund that would benefit the public instead of a small circle of millionaires and the ultrawealthy. Convinced of the need to acquire a trading edge against the big firms, OpenFinance pioneered the use of CDOs and MBS products to achieve unheard-of levels of leverage and record profits. Despite stating that they would use their earnings to benefit society, they decided that the dream of systemic overhaul would only be achieved by becoming the dominant financial player, incorporating a for-profit arm to that end and raising major sums from Morgan Stanley to fund their General Partnership Trading (GPT) system. The system increased access to financial products and services by allowing the general public to invest in CDOs and MBSes, democratising the returns of financial trading, but was criticised for creating the very systemic risk its founders sought to avoid.
In 2004, a team of traders at OpenFinance (often shortened to OpenFi) accused OpenFi leadership of being reckless and insufficiently concerned about Economic Safety. They decided to start a new hedge fund known as Anthropocentric Trading, which would offer services better aligned to their principles. A fundraising war for talent and capital to form new investment funds ensued, with both firms acquiring investors from the Middle East and courting governments as part of bids to reshape the global economic order. Anthropocentric Trading admitted to leveraging heavily with the same products and tactics as OpenFi, reasoning that it needed to stay competitive in a multipolar economic race. It is now 2007...
Furthermore, going hard also imposes opportunity costs and literal costs on future you even if you have all your priorities perfectly lined up and know exactly what should be worked on at any time. If you destabilise yourself enough trying to “go for the goal” your net impact might ultimately be negative (not naming any names here...).
Some books you might like to read:
Seeing Like a State by James C Scott (I’ve read most of it, I liked it)
Bullshit Jobs, The Dawn of Everything, most books by David Graeber (I’ve read and liked long extracts of his work)
The End: Hitler’s Germany 1944–45 by Sir Ian Kershaw (I’ve read all of it and found it very valuable as a complete picture of a society melting down)
Open Letters by Vaclav Havel (I’ve read a lot of it, and I like it a lot. He was the last president of Czechoslovakia and the first president of the Czech Republic, and a famous dissident under communism; his writing sketches out both what he found soul-destroying about that system and what he thinks are the principles of good societies)
System Effects: Complexity in Political and Social Life by Robert Jervis (I’m reading this now, very good case studies about non-obvious phenomena in international relations)
Broken Code: Inside Facebook and the fight to expose its toxic secrets by Jeff Horwitz (Very good book about how social media platforms like Facebook shape and are shaped by modern civilisation, I read all of it)
All of these books to various degrees tackle the things you are describing from a holistic perspective. Hope this helps.
It loads past conversations (or parts of them) into context, so it could change behaviour.
A lesson from the book System Effects: Complexity in Political and Social Life by Robert Jervis, and also from the book The Trading Game: A Confession by Gary Stevenson.
When people talk about planning for the future, there is often a thought chain like this:
All other things being equal, a world with thing/organisation/project X is preferable compared to a world without thing/organisation/project X
Therefore, I should try to make X happen
I will form a theory of change and start to work at making X happen
But of course the moment you start working at making X happen you have already destroyed the premise. There are no longer two equal worlds held in expectation, one with X and one with no X. There is now the world without X (in the past), and the world where you are trying to make X happen (the present). And very often the path to attaining X creates a world much less preferable for you than the world before you started, long before you reach X itself.
For example:
I can see a lucrative trade opportunity where by the end of five months, the price for some commodity will settle at a new, higher point which I can forecast clearly. All other things being equal, if I take this trade I will make a lot of money.
Therefore, I should try to make this trade.
I will take out a large position, and double down if in the interim the price moves in the “wrong” direction.
However, the price can be much more volatile than you expect, especially if you are taking out big positions in a relatively illiquid market. Thus you may find that, three months in, your paper losses are so large that you reach your pain threshold and back out of the trade for fear that your original prediction was wrong. At the end of the five months, you may have predicted the price correctly, but all you did was lose a large sum of money in the interim.
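The path dependence here can be made concrete with a toy simulation. All the numbers below (prices, position size, pain threshold) are invented for illustration; the point is only that a correct five-month forecast can still lose money if an interim dip breaches the trader’s pain threshold:

```python
# Toy sketch of the trade example above: the forecast is right at month 5,
# but a month-3 drawdown forces an exit at the pain threshold.
# All figures are hypothetical.

entry_price = 100.0
forecast_price = 115.0            # the (correct) five-month forecast
position = 1_000                  # units held; doubling down omitted for simplicity
pain_threshold = -20_000.0        # maximum tolerable paper loss

# Hypothetical month-by-month prices: a deep dip at month 3,
# then the forecast comes true at month 5.
monthly_prices = [98.0, 92.0, 78.0, 105.0, 115.0]

realised_pnl = None
for month, price in enumerate(monthly_prices, start=1):
    paper_pnl = (price - entry_price) * position
    if paper_pnl <= pain_threshold:
        # Pain threshold breached: exit now, locking in the paper loss.
        realised_pnl = paper_pnl
        print(f"Month {month}: exited at {price}, realised loss {paper_pnl:.0f}")
        break
else:
    # Never breached the threshold: held to the end.
    realised_pnl = (monthly_prices[-1] - entry_price) * position

print(f"Final P&L: {realised_pnl:.0f}")
print(f"P&L if held to month 5: {(forecast_price - entry_price) * position:.0f}")
```

With this particular price path the trader exits at month 3 with a 22,000 loss, even though holding to month 5 would have made 15,000: the prediction was right, the path killed it.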
For another example:
All other things being equal, a world with an awareness of potential race dynamics around AGI is preferable compared to a world without such an awareness.
Therefore, I should try to raise awareness of race dynamics.
I will write a piece about race dynamics and make my arguments very persuasive, to increase the world’s awareness of this issue.
Of course, in the process of trying to raise awareness of this issue, you might first create a world where a small subset of the population (mostly policy and AI people) is suddenly very clued-in to the possibility of race dynamics. These people are also in a very good position to create, maintain, and capitalise on those dynamics (whether consciously or not), including using them to raise large amounts of cash. Now suddenly the risk of race dynamics is much larger than before, and the world is in a more precarious state.
There isn’t really a foolproof way to get around this problem. However, one tactic might be to look at your theory of change, and instead of comparing the world state before and after the plan, look at the world state along each step of the path to change, and consciously weigh up the changes and tradeoffs at each step. If one of those steps looks like it would break a moral, social, or pain-related threshold, maybe reconsider that theory of change.
Addendum: I think this is also why systems/ecosystems/plans which rely on establishing positive or negative feedback loops are so powerful. They’ve set things up so that each stage incrementally moves towards the goal, so that even if there are setbacks you have room to fall back instead of breaching a pain threshold.
[I think this comment is too aggressive and I don’t really want to shoulder an argument right now]
With apologies to @Garrett Baker .
It is possible to not be the story’s subject and still be the protagonist of one strand of it. After all, that’s the only truth most people know for ~certain. It’s also possible to not dramatise yourself as the Epicentre of the Immanent World-Tragedy (Woe is me! Woe is me!) and still feel like crap in a way that needs some form of processing/growth to learn to live with. Similarly, you can be well-balanced and feel some form of hope without then making yourself the Epicentre of the Redemption of the World.
I guess what I’m trying to say is that you can feel things very strongly even without distorting your world-model to make it all about your feelings (most of the time, at least).