307th

Karma: 408

307th 6 Sep 2025 10:43 UTC
4 points
1
in reply to: Knight Lee’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
Wow we have a lot of the same thinking!
I’ve also felt like people who think we’re doomed are basically spending a lot of their effort on sabotaging one of our best bets in the case that we are not doomed, with no clear path to victory in the case where they are correct (how would Anthropic slowing down lead to a global stop?)
And yeah I’m also concerned about competition between DeepMind/Anthropic/SSI/OpenAI—in theory they should all be aligned with each other but as far as I can see they aren’t acting like it.
As an aside, I think the extreme pro-slowdown view is something of a vocal minority. I met some Pause AI organizers IRL and brought up the points I made in my original comment, expecting pushback, but they agreed, saying they were focused on neutrally enforced slowdowns, e.g. government action.

307th 4 Sep 2025 19:52 UTC
3 points
0
in reply to: the gears to ascension’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
My point was that even though we already have an extremely reliable recipe for getting an LLM to understand grammar and syntax, we are not anywhere near a theoretical guarantee for that. The ask for a theoretical guarantee seems impossible to me, even on much easier things that we already know modern AI can do.
When someone asks for an alignment guarantee I’d like them to demonstrate what they mean by showing a guarantee for some simpler thing—like a syntax guarantee for LLMs. I’m not familiar with SLT but I’ll believe it when I see it.

307th 4 Sep 2025 19:44 UTC
1 point
0
in reply to: Remmelt’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
> Just a note here that I’m appreciating our conversation :) We clearly have very different views right now on what is strategically needed but digging your considered and considerate responses.
Thank you! Same here :)
> How do you account for the problem here that Nvidia’s and downstream suppliers’ investment in GPU hardware innovation and production capacity also went up as a result of the post-ChatGPT race (to the bottom) between tech companies on developing and releasing their LLM versions?
>
> I frankly don’t know how to model this somewhat soundly. It’s damn complex.
I think it’s definitely true that AI-specific compute is further along than it would be if there hadn’t been the LLM boom happening. I think the relationship is unaffected though—earlier LLM development means faster timelines but slower takeoff.
Personally I think slower takeoff is more important than slower timelines, because that means we get more time to work with and understand these proto-AGI systems. On the other hand to people who see alignment as more of a theoretical problem that is unrelated to any specific AI system, slower timelines are good because they give theory people more time to work and takeoff speeds are relatively unimportant.
But I do think the latter view is very misguided. I can imagine a setup for training an LLM in a way that makes it both generally intelligent and aligned; I can’t imagine a recipe for alignment that works outside of any particular AI paradigm, or that invents its own paradigm while simultaneously aligning it. I think the reason a lot of theory-pilled people such as people at MIRI become doomers is that they try to make that general recipe and predictably fail.
> This is not a very particular view – in terms of the possible lines of reasoning and/or people with epistemically diverse worldviews that end up arriving at this conclusion. I’d be happy to discuss the reasoning I’m working from, in the time that you have.
I think I’d like to have a discussion about whether practical alignment can work at some point, but I think it’s a bit outside the scope of the current convo. (I’m referring to the two groups here as ‘practical’ and ‘theoretical’ as a rough way to divide things up).
Above and beyond the argument over whether practical or theoretical alignment can work I think there should be some norm where both sides give the other some credit. Because in practice I doubt we’ll convince each other, but we should still be able to co-operate to some degree.
E.g. for myself I think theoretical approaches that are unrelated to the current AI paradigm are totally doomed, but I support theoretical approaches getting funding because who knows, maybe they’re right and I’m wrong.
And on the other side, given that having people at frontier AI labs who care about AI risk is absolutely vital for practical alignment, I take anti-frontier lab rhetoric as breaking a truce between the two groups in a way that makes AI risk worse. Even if this approach seems doomed to you, I think if you put some probability on you being wrong about it being doomed then the cost-benefit analysis should still come up robustly positive for AI-risk-aware people working at frontier labs (including on capabilities).
This is a bit outside the scope of your essay since you focused on leaders at Anthropic who it’s definitely fair to say have advanced timelines by some significant amount. But for the marginal worker at a frontier lab who might be discouraged from joining due to X-risk concerns, I think the impact on timelines is very small and the possible impact on AI risk is relatively much larger.

307th 3 Sep 2025 15:06 UTC
3 points
0
in reply to: Remmelt’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
> They made a point at the time of expressing concern about AI risk. But what was the difference they made here?
I think you’re right that releasing GPT-3 clearly accelerated timelines with no direct safety benefit, although I think there are indirect safety benefits of AI-risk-aware companies leading the frontier.
You could credibly accuse me of shifting the goalposts here, but in GPT-3 and GPT-4’s case I think the sooner they came out the better. Part of the reason the counterfactual world where OpenAI/Anthropic/DeepMind had never been founded and LLMs had never been scaled up seems so bad to me is that not only do none of the leading AI companies care about AI risk, but also once LLMs do get scaled up, everything will happen much faster because Moore’s law will be further along.
> It does not hinge though on just that view. There are people with very different worldviews (e.g. Yudkowsky, me, Gebru) who strongly disagree on fundamental points – yet still concluded that trying to catch up on ‘safety’ with current AI companies competing to release increasingly unscoped and complex models used to increasingly automate tasks is not tractable in practice.
Gebru thinks there is no existential risk from AI so I don’t really think she counts here. I think your response somewhat confirms my point—maybe people vary on how optimistic they are about alternative theoretical approaches, but the common thread is strong pessimism about the pragmatic alignment work frontier labs are best positioned to do.

> I’m noticing that you are starting from the assumption that it is a tractably solvable problem – particularly by “people who work closely with cutting edge AI and who are using the modern deep learning paradigm”.
> A question worth looking into: how can we know whether the long-term problem is actually solvable? Is there a sound basis for believing that there is any algorithm we could build in that would actually keep controlling a continuously learning and self-manufacturing ‘AGI’ to not cause the extinction of humans (over at least hundreds of years, above some soundly guaranteeable and acceptably high probability floor)?
I agree you won’t get such a guarantee, just like we don’t have a guarantee that an LLM will learn grammar or syntax. What we can get is something that in practice works reliably. The reason I think it’s possible is that a corrigible and non-murderous AGI is a coherent target that we can aim at and that AIs already understand. That doesn’t mean we’re guaranteed success, mind you, but it seems pretty clearly possible to me.

307th 3 Sep 2025 14:59 UTC
5 points
0
in reply to: the gears to ascension’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
> I actually doubt there are other general learning techniques out there in math space at all, because I think we’re already just doing “approximation of bayesian updating on circuits”
Interesting perspective! I think I agree with this in practice although not in theory (I imagine there are some other ways to make it work, I just think they’re very impractical compared to deep learning).
> I don’t think I can make reliably true claims about Anthropic’s effects with the amount of information I have, but their effects seem suspiciously business-success-seeking to me, in a way that seems like it isn’t prepared to overcome the financial incentives I think are what mostly kill us anyway.
Part of my frustration is that I agree there are tons of difficult pressures on people at frontier AI companies, and I think sometimes they bow to these pressures. They hedge about AI risk, they shortchange safety efforts, they unnecessarily encourage race dynamics. I view them as being in a vitally important and very difficult position where some mistakes are inevitable, and I view this as just another type of mistake that should be watched for and fixed.
But instead, these mistakes are used as just another rock to throw—any time they do something wrong, real or imagined, people use this as a black mark against them that proves they’re corrupt or evil. I think that’s both untrue and profoundly unhelpful.

307th 2 Sep 2025 13:36 UTC
64 points
30
on: Anthropic’s leading researchers acted as moderate accelerationists
I am in the camp that thinks that it is very good for people concerned about AI risk to be working at the frontier of development. I think it’s good to criticize frontier labs who care and pressure them but I really wish it wasn’t made with the unhelpful and untrue assertion that it would be better if Anthropic hadn’t been founded or supported.
The problem, as I argued in this post, is that people way overvalue accelerating timelines and seem willing to make tremendous sacrifices just to slow things down a small amount. If you advocate that people concerned about AI risk avoid working on AI capabilities, the first order effect of this is filtering AI capability researchers so that they care less about AI risk. Slowing progress down is a smaller, second order effect. But many people seem to take it for granted that completely ceding frontier AI work to people who don’t care about AI risk would be preferable because it would slow down timelines! This seems insane to me. How much time would possibly need to be saved for that to be worth it?
To try to get to our crux: I’ve found that caring significantly about accelerating timelines seems to hinge on a very particular view of alignment where pragmatic approaches by frontier labs are very unlikely to succeed, whereas some alternative theoretical work that is unrelated to modern AI has a high chance of success. I think we can see that here:
> I skip details of technical safety agendas because these carry little to no weight. As far as I see, there was no groundbreaking safety progress at or before Anthropic that can justify the speed-up that their researchers caused. I also think their minimum necessary aim is intractable (controlling ‘AGI’ enough, in time or ever, to stay safe^[4]).
I have the opposite view—successful alignment work is most likely to come out of people who work closely with cutting edge AI and who are using the modern deep learning paradigm. Because of this I think it’s great that so many leading AI companies care about AI risk, and I think we would be in a far worse spot if we were in a counterfactual world where OpenAI/DeepMind/Anthropic had never been founded and LLMs had (somehow) not been scaled up yet.

307th 19 Jun 2025 23:25 UTC
4 points
0
in reply to: Mass_Driver’s comment on: So You Want to Work at a Frontier AI Lab
By “it will look like normal deep learning work” I don’t mean it will be exactly the same as mainstream capabilities work—e.g. RLHF was both “normal deep learning work” and also notably different from all other RL at the time. Same goes for constitutional AI.
What seems promising to me is paying close attention to how we’re training the models and how they behave, thinking about their psychology and how the training influences that psychology, reasoning about how that will change in the next generation.
> It seems odd and unlikely to me that the same kind of work (normal deep learning) that looks like it causes a series of major problems (power-seeking, black boxes, emergent goals) when you do a moderate amount of it would wind up solving all of those same problems when you do a lot of it, but I’m not enough of a technical expert to be sure that that’s wrong.
What are we comparing deep learning to here? Black box: 100% granted.
But for the other problems—power-seeking and emergent goals—I think they will be a problem with any AI system and in fact they are much better in deep learning than I would have expected. Deep learning is basically short sighted and interpolative rather than extrapolative, which means that when you train it on some set of goals, it by default tries to pursue those goals in a short sighted way that makes sense. If you train it on poorly formed goals, you can still get bad behaviour, and as it gets smarter we’ll have more issues, but LLMs are a very good base to start from—they’re highly capable, understand natural language, and aren’t power seeking.
In contrast, the doomed theoretical approaches I have in mind are things like provably safe AI. With these approaches you have two problems: 1), a whole new way of doing AI which won’t work, and 2), the theoretical advantage—that if you can precisely specify what your alignment target is, it will optimize for it—is in fact a terrible disadvantage, since you won’t be able to precisely specify your alignment target.
> Because there are independent, non-technical reasons for people to want to believe that normal deep learning will solve alignment (it means they get to take fun, high-pay, high-status jobs at AI developers without feeling guilty about it)
This is what I mean about selective cynicism! I’ve heard the exact same argument about theoretical alignment work—“mainstream deep learning is very competitive and hard; alignment work means you get a fun nonprofit research job”—and I don’t find it convincing in either case.

307th 19 Jun 2025 22:13 UTC
3 points
0
in reply to: Mass_Driver’s comment on: So You Want to Work at a Frontier AI Lab
> In order to do useful superalignment research, I suspect you sometimes need to warn about or at least openly discuss the serious threats that are posed by increasingly advanced AI, but the business model of frontier labs depends on pretending that none of those threats are actually serious.
I think this is overly cynical. Demis Hassabis, Sam Altman, and Dario Amodei all signed the statement on AI risk:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

They don’t talk about it all the time but if someone wants to discuss the serious threats internally, there is plenty of external precedent for them to do so.

> frontier labs are only pretending to try to solve alignment

This is probably the main driver of our disagreement. I think hands-off theoretical approaches are pretty much guaranteed to fail, and that successful alignment will look like normal deep learning work. I’d guess you feel the opposite (correct me if I’m wrong), which would explain why it looks to you like they aren’t really trying and it looks to me like they are.

307th 19 Jun 2025 20:53 UTC
4 points
0
in reply to: Mass_Driver’s comment on: So You Want to Work at a Frontier AI Lab
I think if you do concede that superalignment is tractable at a frontier lab, it is pretty clear that joining and working on alignment will have far more benefits than any speedup. You could construct probabilities such that that’s not true, I just don’t think those probabilities would be realistic.
I also think that people who argue against working in a frontier lab are burying the lede. It is often phrased as a common sense proposition anyone who believes in the possibility of X-risk should agree with. Then you get into the discussion and it turns out that the entire argument is premised on extremely controversial priors that most people who believe in X-risk from AI do not agree with. I don’t mind debating those priors but it seems like a different conversation—rather than “don’t work at a frontier lab” your headline should be “frontier labs will fail at alignment while nonprofits can succeed, here’s why”.

307th 12 Jun 2025 13:15 UTC
−3 points
−10
in reply to: MondSemmel’s comment on: So You Want to Work at a Frontier AI Lab
The frontier labs have certainly succeeded at aligning their models. LLMs have achieved a level of alignment people wouldn’t have dreamed of 10 years ago.
Now labs are running into issues with the reasoning models, but this doesn’t at all seem insurmountable.

307th 12 Jun 2025 2:29 UTC
11 points
−4
on: So You Want to Work at a Frontier AI Lab
I really disagree with this piece and others like it. I think there’s a selectively applied fatalism about frontier labs that is entirely unwarranted. Some examples of this selective fatalism:

> Each lab’s emphasis on alignment varies, but none are on track to solve the hard problems, or to prevent these machines from growing irretrievably incompatible with human life.

The entire argument for avoiding frontier labs falls apart if you admit even a 20% likelihood that frontier labs will create aligned superintelligence, because that 20% likelihood implies that a motivated person joining could push it upwards, which would then be an incomprehensibly beneficial and heroic thing for that person to do.

> I don’t expect the marginal extra researcher to substantially improve these odds, even if they manage to resist the oppressive weight of subtle and unsubtle incentives.

Why not? And, why would they have to substantially improve these odds? Pushing the odds from 20% to 20.01% would be an incredible accomplishment for one person.

> The claim: Working within a lab can position a safety-conscious individual to influence the course of that lab’s decisions.
> My assessment: I admit I have a hard time steelmanning this case. It seems straightforwardly true that no individual entering the field right now will be meaningfully positioned to slow the development of superhuman AI from inside a lab.
A group is composed of people. The specific beliefs of the people in that group will be important for deciding what that group does.

If you shake off the fatalism and look at things clearly, you should realize: joining a frontier lab is an incredible opportunity to make things go better. If anyone has the skills to go for it I highly recommend they gather their courage and do so.

307th 9 Jan 2025 14:08 UTC
6 points
1
on: Activation space interpretability may be doomed
Nice post! I think these are good criticisms that don’t justify the title. Points 1 through 4 are all (specific, plausible) examples of ways we may interpret the activation space incorrectly. This is worth keeping in mind, and I agree that just looking at the activation space of a single layer isn’t enough, but it still seems like a very good place to start.
A layer’s activation is a relatively simple space, constructed by the model, that contains all the information that the model needs to make its prediction. This makes it a great place to look if you’re trying to understand how the model’s thinking.

307th 4 Nov 2023 14:24 UTC
18 points
15
in reply to: Ben Pace’s comment on: Integrity in AI Governance and Advocacy
There are all kinds of benefits to acting with good faith, and people should not feel licensed to abandon good faith dialogue just because they’re SUPER confident and this issue is REALLY IMPORTANT.

When something is really serious it becomes even more important to do boring +EV things like “remember that you can be wrong sometimes” and “don’t take people’s quotes out of context, misrepresent their position, and run smear campaigns on them; and definitely don’t make that your primary contribution to the conversation”.

Like, for Connor & people who support him (not saying this is you Ben): don’t you think it’s a little bit suspicious that you ended up in a place where you concluded that the very best use of your time in helping with AI risk was tweet-dunking and infighting among the AI safety community?

307th 25 Oct 2023 11:22 UTC
4 points
0
in reply to: Neel Nanda’s comment on: Lying is Cowardice, not Strategy
I don’t expect most people to agree with that point, but I do believe it. It ends up depending on a lot of premises, so expanding on my view there in full would be a whole post of its own. But to try to give a short version:

There are a lot of specific reasons I think having people working in AI capabilities is so strongly +EV. But I don’t expect people to agree with those specific views. The reason I think it’s obvious is that even when I make massive concessions to the anti-capabilities people, these organizations… still seem +EV? Let’s make a bunch of concessions:

1. Alignment will be solved by theoretical work unrelated to capabilities. It can be done just as well at an alignment-only organization with limited funding as it can at a major AGI org with far more funding.

2. If alignment is solved, that automatically means future ASI will be built using this alignment technique, regardless of whether leading AI orgs actually care about alignment at all. You just publish a paper saying “alignment solution, pls use this Meta” and Meta will definitely do it.

3. Alignment will take a significant amount of time—probably decades.

4. ASI is now imminent; these orgs have reduced timelines to ASI by 1-5 years.

5. Our best chance of survival is a total stop, which none of the CEOs of these orgs support.

Even given all five of these premises… Demis Hassabis, Dario Amodei, and Sam Altman have all increased the chance of a total stop, by a lot. By more than almost anyone else on the planet, in fact. Yes, even though they don’t think it’s a good idea right now and have said as much (I think? haven’t followed all of their statements on AI pause).

That is, the chance of a total stop is clearly higher in this world than in the counterfactual one where any of Demis/Dario/Sam didn’t go into AI capabilities, because a CEO of a leading AI organization saying “yeah I think AI could maybe kill us all” is something that by default would not happen. As I said before, most people in the field of AI don’t take AI risk seriously; this was even more true back when they first entered the field. The default scenario is one where people at NVIDIA and Google Brain and Meta are reassuring the public that AI risk isn’t real.

So in other words, they are still increasing our chances of survival, even under that incredibly uncharitable set of assumptions.

Of course, you could cook these assumptions even more in order to make them -EV—if you think that a total stop isn’t feasible, but still believe all of the other four premises, then they’re -EV. Or you could say “yeah, we need a total stop now, because they’ve advanced timelines, but if these orgs didn’t exist then we totally would have solved alignment before Meta made a big transformer model and trained it on a lot of text; so even though they’ve raised the chances of a total stop they’re still a net negative.” Or you could say “the real counterfactual about Sam Altman isn’t if he didn’t enter the field. The real counterfactual is the one where he totally agreed with all of my incredibly specific views and acted based on those.”

I.e. if you’re looking for excuses to be allowed to believe that these orgs are bad, you’ll find them. But that’s always the case. Under real worldviews—even under Connor’s worldview, where he thinks a total stop is both plausible and necessary—OAI/DM/Anthropic are all helping with AI risk. Which means that their beneficiality is incredibly robust, because again, I think many of the assumptions I outlined above are false & incredibly uncharitable to AGI orgs.

307th 24 Oct 2023 21:39 UTC
14 points
14
in reply to: rotatingpaguro’s comment on: Lying is Cowardice, not Strategy
Yeah, fair enough.

But I don’t think that would be a sensible position. The correct counterfactual is in fact the one where Google Brain, Meta, and NVIDIA led the field. Like, if DM + OpenAI + Anthropic didn’t exist—something he has publicly wished for—that is in fact the most likely situation we would find. We certainly wouldn’t find CEOs who advocate for a total stop on AI.

307th 24 Oct 2023 21:27 UTC
2 points
1
in reply to: 307th’s comment on: Lying is Cowardice, not Strategy
(Ninth, I am aware of the irony of calling for more civil discourse in a highly inflammatory comment. Mea culpa)

307th 24 Oct 2023 21:02 UTC
67 points
19
on: Lying is Cowardice, not Strategy
I believe you’re wrong on your model of AI risk and you have abandoned the niceness/civilization norms that act to protect you from the downside of having false beliefs and help you navigate your way out of them. When people explain why they disagree with you, you accuse them of lying for personal gain rather than introspect about their arguments deeply enough to get your way out of the hole you’re in.

First, this is a minor point where you’re wrong, but it’s also a sufficiently obvious point that it should hopefully make clear how wrong your world model is: the AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I’d like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bring up how CEOs of leading AI companies acknowledge AI risk as a talking point, so I’d hope that on some level you’re aware that your success in public advocacy would be massively reduced in the counterfactual case where the leading AI orgs are Google Brain, Meta, and NVIDIA, and their leaders were saying “AI risk? Sounds like sci-fi nonsense!”

The fact that people disagree with your preferred method of reducing AI risk does not mean that they are EVIL LIARS who are MAKING YOUR JOB HARDER and DOOMING US ALL.

Second, the reason that a total stop is portrayed as an extreme position is because it is. You can think a total stop is correct while acknowledging that it is obviously an extreme course of action that would require TREMENDOUS international co-ordination and would have to last across multiple different governments. You would need both Republicans and Democrats in America behind it, because both will be in power across the duration of your indefinite stop, and ditto for the leadership of every other country. It would require military action to be taken against people who violate the agreement. This total stop would not just impact AI, because you would need insanely strong regulations on compute—it would impact everyone’s day to day life. The level of compute you’d have to restrict would only escalate as time went on due to Moore’s law. And you and others talk about carrying this on for decades. This is an incredibly extreme position that requires pretty much everyone in the world to agree AI risk is both real and imminent, which they don’t. Leading to...

Third: most people—both AI researchers and the general public—are not seriously concerned about AI risk. No, I don’t believe your handful of sketchy polls. On the research side, whether it’s on the machine learning subreddit, on ML specific discords, or within Yoshua Bengio’s own research organization^[1], the consensus in any area that isn’t specifically selected for worrying about AI risk is always that it’s not a serious concern. And on the public side, hopefully everyone realizes that awareness & agreement on AI risk is far below where climate change is.
Your advocacy regularly assumes that there is a broad consensus among both researchers and the public that AI risk is a serious concern. Which makes sense because this is the only way you can think a total stop is at all plausible. But bad news: there is nowhere close to such a consensus. And if you think developing one is important, you should wake up every morning & end every day praising Sam Altman, Dario Amodei, and Demis Hassabis for raising the profile of AI risk to such an extent; but instead you attack them, out of a misguided belief that somehow, if not for them, AI progress wouldn’t happen.

Which leads us to number four: No, you can’t get a total stop on AI progress through individual withdrawal. You and others in the stop AI movement regularly use the premise that if only OpenAI + Anthropic + DeepMind would just stop, AI would never get developed and we could all live happily ever after, so therefore they are KILLING US ALL.

This is false. Actually, there are many people and organizations that do not believe AI risk is a serious concern and only see AI as a technology with massive potential economic benefits; as long as this is the case, AI progress will continue. This is not a prisoner’s dilemma where if only all the people worried about AI risk would “co-operate” (by ceasing AI work) AI would stop. Even if they all stopped tomorrow, progress would continue.

If you want to say they should stop anyway because that would slow timelines, I would like to point out that that is completely different from a total stop and cannot be justified by praising the virtues of a total stop. Moreover, it has the absolutely massive drawback that now AI is getting built by a group of people who were selected for not caring about AI risk.

Advocating for individual withdrawal by talking about how good a total, globally agreed upon stop would be is deceptive—or, if I wanted to use your phrasing, I could say that doing so is LYING, presumably FOR PERSONAL GAIN and you’re going to GET US ALL KILLED you EVIL PERSON. Or I guess I could just not do all that and just explain why I disagree with you—I wonder which method is better?

Fifth, you can’t get a total stop on AI progress at all, and that’s why no one will advocate for one. This follows from points two and three and four. Even if somehow everyone agreed that AI risk was a serious issue a total stop would still not happen the same way that people believing in climate change did not cause us to abandon gasoline.
Sixth, if you want to advocate for a total stop, that’s your prerogative, but you don’t get to choose that that’s the only way. In theory there is nothing wrong with advocating for a total stop even though it is completely doomed. After all, nothing will come of it and maybe you’ll raise awareness of AI risk while you’re doing it.
The problem is that you are dead set on torching other alignment plans to the ground all for the sake of your nonworkable idea. Obviously you are going after AI capabilities people all the time but here you are also going against people who simply advocate for positions less stringent than yours. Everyone needs to fall in line and advocate for your particular line of action that will never happen and if they don’t they are liars and going to kill us all. This is where your abdication from normal conversational norms makes your wrong beliefs actively harmful.
Leading to point number seven, we should talk about AI risk without constantly accusing each other of killing us all. What? But if I believe Connor’s actions are bad for AI risk surely that means I should be honest and say he’s killing us all, right? No, the same conversational norms that work for discussing a tax reform apply just as much here. You’re more likely to get a good tax reform if you talk it out in a civil manner, and the same goes for AI risk. I reject the idea that being hysterical and making drastic accusations actually helps things, I reject the idea that the long term thinking and planning that works best for literally every other issue suddenly has to be abandoned in AI risk because the stakes are so high, I reject the idea that the only possible solution is paralysis.

Eighth, yes, working in AI capabilities is absolutely a reasonable alignment plan that raises odds of success immensely. I know, you’re so overconfident on this point that even reading this will trigger you to dismiss my comment. And yet it’s still true—and what’s more, obviously so. I don’t know how you and others egged each other into the position that it doesn’t matter whether the people working on AI care about AI risk, but it’s insane.
1. ^
  From a recent interview:
  
  D’Agostino: How did your colleagues at Mila react to your reckoning about your life’s work?
  Bengio: The most frequent reaction here at Mila was from people who were mostly worried about the current harms of AI—issues related to discrimination and human rights. They were afraid that talking about these future, science-fiction-sounding risks would detract from the discussion of the injustice that is going on—the concentration of power and the lack of diversity and of voice for minorities or people in other countries that are on the receiving end of whatever we do.

307th 22 Oct 2023 15:03 UTC
34 points
7
on: AI Safety is Dropping the Ball on Clown Attacks, and Mind Control in General
This post is fun but I think it’s worth pointing out that basically nothing in it is true.

-“Clown attacks” are not a common or particularly effective form of persuasion
-They are certainly not a zero day exploit; having a low status person say X because you don’t want people to believe X has been available to humans for our entire evolutionary history
-Zero day exploits in general are not a thing you have to worry about; it isn’t an analogy that applies to humans because we’re far more robust than software. A zero day exploit on an operating system can give you total control of it; a ‘zero day exploit’ like junk food can make you consume 5% more calories per day than you otherwise would.
-AI companies have not devoted significant effort to human thought steering, unless you mean “try to drive engagement on a social media website”; they are too busy working on AI.
-AI companies are not going to try to weaponize “human thought steering” against AI safety
-Reading the sequences wouldn’t protect you from mind control if it did exist
-Attempts at manipulation certainly do exist but it will mostly be mass manipulation aimed at driving engagement and selling you things based off of your browser history, rather than a nefarious actor targeting AI safety in particular

307th 21 Oct 2023 17:29 UTC
3 points
0
in reply to: habryka’s comment on: I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines
Seems like we mostly agree and our difference is based on timelines. I agree the effect is more of a long term one, although I wouldn’t say decades. OpenAI was founded in 2015 and raised the profile of AI risk in 2022, so in the counterfactual case where Sam Altman was dissuaded from founding OpenAI due to timeline concerns, AI risk would have had much lower public credibility less than a decade later.

Public recognition as a researcher does seem to favour longer periods of time though; the biggest names are all people who’ve been in the field for multiple decades, so you have a point there.

307th 21 Oct 2023 17:01 UTC
1 point
2
in reply to: habryka’s comment on: I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines
I think we’re talking past each other a bit. I’m saying that people sympathetic to AI risk will be discouraged from publishing AI capability work, and publishing AI capability work is exactly why Stuart Russell and Yoshua Bengio have credibility. Because publishing AI capability work is so strongly discouraged, any new professors of AI will to some degree be selected for not caring about AI risk, which was not the case when Russell or Bengio entered the field.