I dropped out of an MSc in mathematics at a top university in order to focus my time on AI safety.
Knight Lee
I completely agree with this! I think lots of people here are so focused on slowing down AI that they forget the scope of things. According to Remmelt himself, $600 billion+ is being invested yearly into AI. Yet AI safety spending is less than $0.2 billion.
Even if money spent on AI capabilities speeds up capabilities far more efficiently than money spent on AI alignment speeds up alignment, it’s far easier to grow the AI alignment effort twofold, and far harder to make even a dent in the AI capabilities effort! I think any AI researcher who works on AI alignment at all should sleep peacefully at night knowing they are a net positive (barring unpredictably bad luck). We shouldn’t alienate these good people.
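To put rough numbers on that asymmetry (just back-of-the-envelope arithmetic using the figures above, so take the exact values with a grain of salt):

```python
# Back-of-the-envelope arithmetic with the figures quoted above
# (~$600B/year on AI capabilities vs. <$0.2B/year on AI safety).
capabilities_spend = 600e9  # dollars per year
alignment_spend = 0.2e9     # dollars per year

extra = 0.2e9  # suppose an extra $0.2B goes to one side or the other
print(f"Boost to alignment:    {extra / alignment_spend:.0%}")     # 100% (doubles the effort)
print(f"Boost to capabilities: {extra / capabilities_spend:.3%}")  # ~0.033%
```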
Yet I never manage to convince anyone on LessWrong of this!
PS: I admit there are some reasonable world models which disagree with me.
Some people argue that it’s not a race between AI capabilities and AI alignment, but a race between AI capabilities and some mysterious time in the future when we manage to ban all AI development. They think this because they believe AI alignment is very impractical.
I think their world model is somewhat plausible-ish.
But first of all, if this were the case, AI alignment work might still be an indirect net positive by moving the Overton window toward taking AI x-risk seriously rather than laughing it off as a morbid curiosity. It’s hard to make a dent in the hundreds of billions spent on AI capabilities, so the main effect of hundreds of millions spent on AI alignment research will still be normalizing a serious effort against AI x-risk. The US spending a lot on AI alignment is a costly signal to China that AI x-risk is serious, and that US negotiators aren’t just using AI x-risk as an excuse to convince China to give up the AI race.
Second of all, if their world model were really correct, the Earth is probably already doomed. I don’t see a realistic way to ban all AI development in every country in the near future. Even small AI labs like DeepSeek are making formidable AI, so a ban would require absurdly airtight global cooperation. We couldn’t even stop North Korea from getting nukes, which was a far easier problem. In that case, the vast majority of all value in the universe would be found on ocean planets with a single island nation, where there is no AI race between multiple countries (and thus banning AI is far, far easier). Planets like Earth (with many countries) would have a very low rate of survival, and would be a tiny fraction of the value in the universe.
My decision theory is to care more about what to do in scenarios where what I do actually matters, and therefore I don’t worry too much about this doomed scenario.
PS: I’m not 100% convinced Anthropic in particular is a net positive.
Their website only mentions their effort against AI x-risk among a pile of other self-promoting corporate-speak, and while they are making many genuine efforts, those efforts aren’t obviously superior to those of other labs like Google DeepMind.
I find it confusing how many AI labs which seem to care about AI x-risk enough to be a net positive are racing against each other rather than making some cooperative deal (e.g. Anthropic, Google DeepMind, SSI, and probably others I haven’t heard of yet).
I think FDT/UDT only allows you to influence the decisions of other people who also believe in FDT/UDT.[1]
No matter how strongly you cooperate, if the reason you decide to cooperate is FDT/UDT, then that means you still would have defected if you didn’t believe in FDT/UDT, and therefore other people (whose decisions correlate with yours) will still defect just like before, regardless of how strongly FDT/UDT makes you cooperate.
- ^
Assuming there are no complicated simulations or acausal trade commitments.
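As a toy sketch of what I mean (just my own illustration, not anything formal): an FDT-style agent’s cooperation is conditional on the opponent running the same decision procedure, so however strongly it cooperates, it never changes what a causal decision theorist does.

```python
# Toy one-shot prisoner's dilemma. The agent names are hypothetical stand-ins
# for "people who believe FDT/UDT" and "people who don't".

def cdt_agent(opponent):
    # Causal decision theory: defection dominates in a one-shot dilemma,
    # regardless of who the opponent is.
    return "defect"

def fdt_agent(opponent):
    # FDT-style reasoning: cooperate only when the opponent's decision is
    # logically linked to yours, i.e. the opponent runs the same procedure.
    return "cooperate" if opponent is fdt_agent else "defect"

for a, b in [(fdt_agent, fdt_agent), (fdt_agent, cdt_agent), (cdt_agent, cdt_agent)]:
    print(f"{a.__name__}: {a(b):9} | {b.__name__}: {b(a)}")
# fdt_agent cooperates with fdt_agent, but nothing it does ever changes
# cdt_agent's output.
```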
Within our property rights, animals are seen as property rather than as property owners. We may keep them alive out of self-interest, but we only treat them well out of altruism. The rule of law is a mix of
laws protecting animals and plants as property, which covers a rather small set of economically valuable species that aren’t treated very well,
and
laws protecting animals and plants out of altruism, whether it’s animal rights or deontological environmentalism.
I agree you can have degrees of cooperation between 0% and 100%. I just want to say that even powerful species with 0% cooperation among themselves can make others go extinct.
I don’t have proof that a system which cooperates internally like a single agent (i.e. Borg-like) is the most competitive. However it’s only one example of how a powerful selfish agent or system could grow and kill everyone else.
Even if it does turn out that the most competitive system lacks internal cooperation and allows for cooperation between internal agents and external agents (and that’s a big if), there is still no guarantee that external agents will survive. Humans lack full cooperation with one another, and can cooperate with other animals and plants when in conflict with other humans, but we still caused a lot of extinctions and abuses of other species. It is only thanks to our altruism (not our self-interest) that many other creatures are still alive.
Even though symbiosis and cooperation exist in nature, the general rule is still that whenever a more competitive species evolved which lacked any altruism toward other species, less competitive species died out.
If all the resources in the world go towards feeding clones of one person, who is more ruthless and competent than you, there will be no resources left to feed you, and you’ll die.
If the clones of that person fail to cooperate among themselves, that person (and his clones) will be out-competed by someone else whose clones do cooperate among themselves (maybe using ruthless enforcement systems like the ancient Spartan constitution).
Technically, I think you’re correct to say “We are ruled by markets, bureaucracies, social networks and religions. Not by gods or kings.” But I’m obviously talking about a very different kind of system which is more Borg-like and less market-like.
Throughout all of existence, the world has been riddled with the corpses of species which tried their level best to exist but were nonetheless wiped out. There is no guarantee that you and I will be an exception to the rule.
I believe that this argument is wrong because it misunderstands how the world actually works in quite a deep way. In the modern world and over at least the past several thousand years, outcomes are the result of systems of agents interacting, not of the whims of a particularly powerful agent.
We are ruled by markets, bureaucracies, social networks and religions. Not by gods or kings.
I think that’s because powerful humans aren’t able to use their resources to create a zillion clones of themselves which live forever.
I agree that the anthropic filter may be sufficient to explain the unreasonable effectiveness of mathematics, but I don’t think it’s a necessary explanation. I doubt that universes like ours are vastly outnumbered by alternative universes where:
Math isn’t unreasonably effective
Life still evolves
Yet intelligence fails to evolve because there are no patterns to predict
The anthropic filter is necessary for explaining why Earth has water (when most planets don’t), and may be necessary for explaining why the universe seems fine-tuned for life. But it probably isn’t necessary for explaining the unreasonable effectiveness of mathematics.
Wigner argues this is “unreasonable” because there is no logical reason why the universe should obey laws that conform to man-made mathematical structures.
I think Eugene Wigner is misunderstanding something. The causality isn’t:
“The math which humans make/discover” --determines--> “The structure of mathematics” --determines--> “How the world works”
Instead, the causality is:
“The math which humans make/discover” <--determines-- “The structure of mathematics” --determines--> “How the world works”
It’s true that “there is no logical reason why the universe should obey laws that conform to man-made mathematical structures,” but there’s a very logical reason why the universe should obey laws that man-made mathematical structures also obey.
For example, math and logic can simply be defined as the rules which many different phenomena (regardless of origin) follow. The difference between math and logic is that math is very complex logic (e.g. a large number can be represented as a binary string of TRUE and FALSE values, math operations can be built out of AND, OR, and NOT logical statements, and so math statements are essentially just complex logical statements).
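As a minimal sketch of that claim (my own illustration), here is integer addition built out of nothing but AND, OR, and NOT:

```python
# Ripple-carry addition built purely from AND, OR, and NOT, illustrating the
# claim that math operations are just complex logical statements.

def AND(a, b): return a and b
def OR(a, b): return a or b
def NOT(a): return not a

def XOR(a, b):
    # XOR expressed using only AND, OR, NOT
    return AND(OR(a, b), NOT(AND(a, b)))

def add(x_bits, y_bits):
    """Add two equal-length little-endian bit lists using only the gates above."""
    carry, out = False, []
    for x, y in zip(x_bits, y_bits):
        out.append(XOR(XOR(x, y), carry))              # sum bit
        carry = OR(AND(x, y), AND(carry, XOR(x, y)))   # carry out
    return out + [carry]

def to_bits(n, width):
    return [bool((n >> i) & 1) for i in range(width)]

def from_bits(bits):
    return sum(int(b) << i for i, b in enumerate(bits))

print(from_bits(add(to_bits(13, 8), to_bits(29, 8))))  # 42
```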
As for why the same logic applies to different things, the answer is probably as elusive as “why does the universe exist? Why do math and logic exist?” Not every explanation has an explanation.
Isn’t the most upvoted curated post right now about winning? “A case for courage, when speaking of AI danger” is talking about strategy, not technical research.
If you’re looking for people interested in personal strategies for individuals (e.g. earning to give), I think most of them are on the Effective Altruism Forum rather than LessWrong. The network effect means that everyone interested in a topic tends to cluster in one forum, even if they are given two choices initially.
Another speculative explanation is that maybe the upvote system allows the group of people interested in one particular topic (e.g. technical research, e.g. conceptual theorization) to upvote every post on that topic without running out of upvotes. This rewards people for repeatedly writing posts on the most popular topics, since it’s much easier to get net positive upvotes that way.
PS: I agree that earning to give is reasonable
I’m considering this myself right now :)
I mostly agree with you that hiring experts and having a great impact is feasible. Many of the technical alignment researchers who lament “money isn’t what we need, what we need is to be going in the right direction instead of having so much fake research!” fail to realize that their own salaries also come from the flawed but nonetheless vital funding sources. If it weren’t for the flawed funding sources, they would have nothing at all.
Some of them might be wealthy enough to fund themselves, but that’s effectively still making money to hire experts (the expert is themselves).
And yes, some people use AI safety careers as a stepping stone to AI capabilities careers. But realistically, the whole world spends less than $0.2 billion on AI safety and hundreds of billions on AI capabilities, so AI safety salaries are negligible here. One might argue that the non-monetary moral motivation of working on AI safety has caused people to end up working on AI capabilities, but in that case increasing AI safety salaries should reduce this flow rather than increase it.
But Raemon is so right about the great danger of being a net negative. Don’t follow an “ends justify the means” strategy like Sam Bankman-Fried, and beware of your ego convincing you that AI is safer so long as you’re the guy in charge (like Sam Altman or Elon Musk). These biases are insidious, because we are machines programmed by evolution, not to seek truth for the sake of truth, but to
Arrive at the truth when it increases inclusive fitness
Arrive at beliefs which get us to do evil while honestly believing we are doing good (when it increases inclusive fitness)
Arrive at said beliefs, despite wholly believing we are seeking the truth
Haha you’re right, in another comment I was saying
55% of Americans surveyed agree that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Only 12% disagree.
To be honest, I’m extremely confused. Somehow, AI Notkilleveryoneism… is both a tiny minority and a majority at the same time.
I think the real problem here is raising public awareness about how many people are already on team ‘AI Notkilleveryoneism’ rather than team ‘AI accelerationist’. This is a ‘common knowledge’ problem from game theory—the majority needs to know that they’re in the majority,
That makes sense, it seems to explain things. The median AI expert also assigns a 5% to 10% chance to extinction, which is huge.
I’m still not in favour of stigmatizing AI developers, especially right now. Whether AI Notkilleveryoneism is a real minority or an imagined minority, if it gets into a moral-duel with AI developers, it will lose status, and it will be harder for it to grow (by convincing people to agree with it, or by convincing people who privately agree to come out of the closet).
People tend to follow “the experts” instead of their very uncertain intuitions about whether something is dangerous. With global warming, the experts were climatologists. With cigarette toxicity, the experts were doctors. But with AI risk, you were saying that,
Thousands of people signed the 2023 CAIS statement on AI risk, including almost every leading AI scientist, AI company CEO, AI researcher, AI safety expert, etc.
It sounds like the expertise people look to when deciding “whether AI risk is serious or sci-fi” comes from leading AI scientists, and even AI company CEOs. Very unfortunately, we may depend on our good relations with them… :(
That’s a very good point, and I didn’t really analyze the comparison.
I guess maybe meat eating isn’t the best comparison.
The closest comparison might be researchers developing some other technology which maybe 2/3 of people see as a net negative, e.g. nuclear weapons, autonomous weapons, methods for extracting fossil fuel, tobacco, etc.
But no campaign ever really tried to stigmatize these researchers. Every single campaign against these technologies has targeted the companies, CEOs, or politicians leading them, without really any attack on the researchers. Attacking them is sort of untested.
I completely agree this discussion should be moved outside your post. But the counterintuitive mechanics of LessWrong mean a derailing discussion may actually increase the visibility and upvotes of your original message (by bumping it in the “recent discussion”).
(It’s probably still bad if it’s high up in the comment section.)
It’s too bad you can only delete comment threads, you can’t move them to the bottom or make them collapsed by default.
That’s a very good point, and these examples really change my intuition from “I can’t see this being a good idea” to “this might make sense, this might not, it’s complicated.” And my earlier disagreement mostly came from my intuition.
I still have disagreements, but just to clarify, I now agree your idea deserves more attention than it’s getting.
My remaining disagreement is that I think stigmatization only reaches the extreme level of “these people are literally evil and vile” after the majority of people already agree.
In places in India where the majority of people are already vegetarians, and already feel that eating meat is wrong, the social punishment of meat eaters does seem to deter them.
But in places where most people don’t think eating meat is wrong, prematurely calling meat eaters evil may backfire. This is because you’ve created a “moral-duel” where you force outside observers to either think the meat-eater is the bad guy, or you’re the bad guy (or stupid guy). This “moral-duel” drains the moral standing of both sides.
If you’re near the endgame, and 90% of people already are vegetarians, then this moral-duel will first deplete the meat-eater’s moral standing, and may solidify vegetarianism.
But if you’re at the beginning, when only 1% of people support your movement, you desperately want to invest your support and credibility into further growing your support and credibility, rather than burning it in a moral-duel against the meat-eating majority the way militant vegans did.
Nurturing credibility is especially important for AI Notkilleveryoneism, where the main obstacle is a lack of credibility and “this sounds like science fiction.”
Finally, at least only go after the AI lab CEOs, as they have relatively less moral standing, compared to the rank and file researchers.
E.g. in this quicktake Mikhail Samin appealed to researchers as friends asking them to stop “deferring” to their CEO.
Even for nuclear weapons, biological weapons, chemical weapons, and landmines, it was hard to punish the scientists researching them. Even for the death penalty, it was hard to punish the firing squad soldiers. It’s easier to stick it to the leaders. In an influential book by early feminist Lady Constance Lytton, she repeatedly described the policemen (who fought the movement) and even prison guards as very good people, and focused the blame on the leaders.
PS: I read your post, it was a fascinating read. I agree with the direction of it and I agree the factors you mention are significant, but it might not go quite as far as you describe?
One silly sci-fi idea is this. You might have a few “trigger pills” which are smaller than a blood cell, and travel through the bloodstream. You can observe them travel through the body using medical imaging techniques (e.g. PET), and they are designed to be very observable.
You wait until one of them is at the right location, and send very precise x-rays at it from all directions. The x-ray intensity is only high where those beams converge. A mechanism in the trigger pill responds to this ionizing radiation (or heating?), and it anchors to the location using a chemical glue or physical mechanisms (hooks, string, etc.).
Once the trigger pill is anchored in place, another drug can be taken which only activates when it contacts the trigger pill. (Which might activate yet another drug, if you really want to amplify the effect of this tiny trigger pill.)
This results in a ton of drug activity in that area, without needing invasive surgery.
If you want it to become a bigger and more permanent implant, you might make it grow over time (by adding another chemical), deliberately forming a blood clot. Medical imaging may make sure the trigger pill is in a small expendable blood vessel (you detect the pill moving slower with more twists and turns). It might be designed so that yet another chemical can cover it up or destroy it, in case you need to start over at a new location.
It might be radioactive if it’s trying to treat cancer.
It might be magnetically activated if you want real-time control of drug intensity.
Speaking of magnetically activating it, maybe even the anchoring is triggered by a magnetic field rather than x-rays. It won’t be aimed as precisely, so you can only have one trigger pill at a time, and may have to wait really long before it travels to the right area (the human body is pretty big compared to any small target).
I guess they succeeded in changing many people’s opinions. The right-wing reaction is against left-wing people’s opinions. The DEI curriculum is somewhere in between opinions and policies.
I think the main effect of people having further-left opinions is still to make policies further left, rather than further right due to counter-reaction. And this is despite the topic being much more moralistic and polarizing than AI x-risk.
Trump 2.0 being more pro-Israel could be due to him being more extreme in all directions (perhaps due to new staff members, vice president, I don’t know), rather than due to pro-Palestinian protests.
The counter-reaction is against the protesters, not the cause itself. The Vietnam War protests also created a counter-reaction against the protesters, despite successfully ending the war.
I suspect that for a lot of these pressure campaigns which work, the target has a tendency to pretend he isn’t backing down due to the campaign (but for other reasons), or to act like he’s not budging at all until finally giving in. The target doesn’t want people to think that pressure campaigns work on him; the target wants people to think that any pressure on him will only get a counter-reaction out of him, in order to discourage others from pressuring him.
You’re probably right about the courts though, I didn’t know that.
I agree that there are more anti-abortion efforts due to Roe v. Wade, but I disagree that these efforts actually overshot to the point where restrictions on abortion are even harsher than they would be if Roe v. Wade had never happened. I still think it moved the Overton window such that even conservatives feel abortion is kind of normal, maybe bad, but not literally like killing a baby.
The people angry against affirmative action have a strong feeling that different races should get the same treatment e.g. when applying to university. I don’t think any of them overshot into wanting to bring back segregation or slavery.
Oops, “efforts which empirically appear to work” was referring to how the book If Anyone Builds It, Everyone Dies attracted many big-name endorsements from people who weren’t known for endorsing AI x-risk concerns until now.
I’m personally against this as a matter of principle, and I also don’t think it’ll work.
Moral stigmatizing only works against a captive audience. It doesn’t work against people who can very easily ignore you.
You’re more likely to stop eating meat if a kind understanding vegetarian/vegan talks to you and makes you connect with her story of how she stopped eating meat. You’re more likely to simply ignore a militant one who calls you a murderer.
Moral stigmatizing failed to stop nuclear weapon developers, even though many of them were the same kind of “nerd” as AI researchers.
People see Robert Oppenheimer saying “Now, I am become Death, the destroyer of worlds” as some morally deep stuff. “The scientific community ostracized [Edward] Teller,” not because he was very eager to build bigger bombs (like the hydrogen bomb and his proposed Sundial), but because he made Oppenheimer lose his security clearance by saying bad stuff about him.
Which game do you choose to play? The game of dispassionate discussion, where the truth is on your side? Or the game of Twitter-like motivated reasoning, where your side looks much more low status than the AI lab people, and the status quo is certainly not on your side?
Imagine how badly we’ll lose the argument if people on our side are calling them evil and murderous and they’re talking like a sensible average Joe trying to have a conversation with us.
Moral stigmatization seems to backfire rather than help for militant vegans, because signalling hostility is a bad strategy when you’re the underdog going against the mainstream. It’s an extremely big ask for ordinary people to show hostility towards other ordinary people who no one else is hostile towards. It’s even difficult for ordinary people to be associated with a movement which shows such hostility. Most people just want to move on with their lives.
I think you’re underestimating the power of backlashes to aggressive activism. And I say this, despite the fact just a few minutes ago I was arguing to others that they’re overestimating the power of backlashes.
The most promising path to slowing down AI is government regulation, not individuals ceasing to do AI research.
- Think about animal cruelty. Government regulation has succeeded on this many times. Trying to shame people who work in factory farms into stopping has never worked, and wise activists don’t even consider doing this.
- Think about paying workers more. Raising the minimum wage works. Shaming companies into feeling guilty doesn’t. Even going on strike doesn’t work as well as minimum wage laws.
- Despite the fact that half of the employees refusing to work is like 10 times more powerful than non-employees holding a sign saying “you’re evil” (and especially a tiny minority of society holding those signs).
- Though then again, moral condemnation is a source of government regulation.
Disclaimer: not an expert, just a guy on the internet
Strong disagree, but strong upvote because it’s “big if true.” Thank you for proposing a big crazy idea that you believe will work. I’ve done that a number of times, and I’ve been downvoted into the ground without explanation, instead of being given any encouraging “here’s why I don’t think this will work, but thank you.”
I don’t believe that in a world without pro-Palestinian protests, Trump would be noticeably less pro-Israel.
I think in such a world, even the Democrats would be more comfortable supporting Israel without reservations and caveats.
I think the protests and pressure against the Vietnam War forced even Republican administrations to give in and end the war. This is despite crackdowns on protests similar to those against pro-Palestinian protests.
I think some of the Supreme Court justices appointed under Trump aren’t that extreme and refused to give in to his pressure.
But even if it’s true that the Trump administration is making these structural changes, it still doesn’t feel intuitive to me that, e.g., a stronger pro-abortion policy under Democrats would cause Trump to get elected, which would cause structural changes, which would cause a weaker pro-abortion policy in the future. The influence is diluted through each of these steps, such that the resulting effect is probably pretty small compared to the straightforward effect: a stronger policy today makes the default policy for the future stronger in the same direction.
The world is complex, but unless there is some unusual reason to expect an effort to backfire and have literally the opposite effect in the long run, it’s rational to expect efforts which empirically appear to work, to work. It feels mysterious to expect many things to be “net negatives” based on an inside view.
I agree
I agree certain kinds of actions can fail to obtain desired results, and still have backlash.
If you have “activism” which is violent or physically threatening enough (maybe extremists in pro-Palestinian protests), it does create backlash to the point of being a significant net negative.
Even more consequential are the violent actions by Hamas in reaction to Israeli mistreatment of Palestinians. This actually does cause even more mistreatment, so much so that most of the mistreatment may be caused by it.
But this is violence we are talking about, not activism. The nonviolent protesters are still a net positive towards their cause.
Edit: I do think this proposal of vilifying AI labs could potentially be a net negative.
I agree, but I don’t think individual woke activists writing books and sending them to policymakers can directly increase the perception that “there is too much wokeness,” even if no policymakers listen to them.
They only increase the perception that “there is too much wokeness” by way of successfully changing opinions and policies.
The perception that “there is too much wokeness” depends on
Actual woke opinions and policies by the government and people
Anti-woke activism which convinces conservatives that “the government and left-wingers” are far more woke than they actually are
Not pro-woke activism (in the absence of actual woke opinions and policies)
So the only way activists can be a net negative is if making policymakers more woke (e.g. more pro-abortion) can causally make future policymakers even less woke than they would be otherwise.
This is possible if it makes people feel “there is too much wokeness” and elect Trump. But for a single subtopic of wokeness e.g. pro-abortion, it’s unlikely to singlehandedly determine whether Trump is elected, and therefore making policymakers more pro-abortion in particular, probably has a positive influence on whether future policymakers are pro-abortion (by moving the Overton window on this specific topic).
This is probably even more true for strategic/scientific disagreements rather than moral disagreements: if clinical trial regulations were stricter during a Democrat administration, they probably will remain stricter during the next Republican administration. It’s very hard to believe that the rational prediction could be “making the regulations stronger will cause the expected future regulations to be weaker.”
You don’t hear about the zillions of policies which Trump did not reverse (or turn upside down). You don’t hear about the zillions of scientific positions held by Democrat decisionmakers which Trump did not question (or invert).
I actually didn’t see that glaring example! Very good point.
That said, my feeling is Trump et al. weren’t reacting against any specific woke activism, but against very woke policies (and opinions) which resulted from the activism.
Although they reversed very many Democrat policies, I don’t think they reversed them so hard that a stronger Democrat policy would result in a stronger policy in the opposite direction under the Trump administration. The Overton window effect may still be stronger than the reverse-psychology effect.
In a counterfactual world where one of these woke policies/opinions was weaker among Democrats (e.g. the right to abortion), that specific cause would probably do even worse under Trump (abortion might be banned). Trump’s policies are still positively correlated with public opinion. He mostly held back from banning abortion and cutting medical benefits because he knew these liberal policies were popular, but he aggressively attacked immigration (and foreign aid) because those liberal policies were less popular. Despite appearances, he’s not actually maximizing the reversal of liberal policies.
The one counter-reaction is that, in aggregate, all the woke policies and opinions may have made Trump popular enough to get elected? But I doubt that pausing AI etc. will be so politically significant that it’ll determine who wins the election.
PS: I changed my mind on net negatives. Net negative activism may be possible when it makes the cause (e.g. AI Notkilleveryoneism) become partisan and snap into one side of the political aisle? But even Elon Musk supporting it hasn’t caused that to happen?
Yeah, I think arguably the biggest thing to judge AI labs on is whether they are pushing the government in favour of regulation or against it. In general, the only way for a business in a misregulated industry to do good is to lobby in favour of better regulation (rather than against it).
It’s inefficient and outright futile for activists to demand that individual businesses unilaterally do the right thing, get outcompeted, go out of business, and have to fire all their employees; it’s so much better if the activists focus on the government instead. Not only is it extraordinarily hard for one business to make this self-sacrifice, but even if one does, the problem will remain almost just as bad. This applies to every misregulated industry, but for AI in particular, “doing the right thing” seems the most antithetical to commercial viability.
It’s disappointing that I don’t see Anthropic pushing the government extremely urgently on AI x-risk, whether it’s regulation or even x-risk spending. I think at one point they even mentioned the importance of the US winning the AI race against China. But at least they’re not against more regulation, and they seem more in favour of it than other AI labs? At least they’re not openly downplaying the risk? It’s hard to say.