I upvoted. While I disagree with most of the reasoning, it seems relatively clear to me that going against community opinion is the main reason for the downvotes. Consider this: if an author well known for his work in forecasting had preregistered that he was going to write a bunch of not fully fleshed out arguments for or against updating in a particular direction, most people here would have encouraged him to publish it. I don’t think there has ever been a consistent standard of “only publish highly thought-out arguments” here, and we should not engage in isolated demands for rigor, even if the topic is somewhat dicey.
This conversation uses “underdog” in different ways, giving rise to confusion. Yes, the point of an underdog story is indeed that the underdog wins, but this just makes the heroes of the story more awesome. Ultimately, you empathize with somebody who is super strong.
The OP, however, describes a phenomenon where groups see themselves as weaker and in fact unlikely to win. cousin_it attributes this to weakness being seen as desirable due to Christianity. Socrates is a good counterexample, but the 300 are less so.
It is unclear to me that the described phenomenon exists to the degree assumed. If two equally powerful countries or sports teams battle each other, each group of supporters will, on average, believe their side is likelier to win.
Thanks for posting! Which model was used for this eval: gpt-5-thinking, gpt-thinking-high, or another one? I think it would be good to specify (or update) this for future evaluation reports.
>It’s not a defined sentence to say, “everyone has equal moral worth”
>make sure they can construct their ideology
These seem like excessive and unusual demands in the context of such a discussion. I concede there is some argument to be had for defining the terms, since they are inherently vague, but this is not a philosophical treatise, where that would feel more appropriate. This feels similar to how, in arguments about AGI, some folks argue that you should not use the word intelligence (as in intelligence explosion) since it is undefined. Moral worth, just like intelligence, seems like a useful enough concept to apply without needing to define it. To wit, John originally indicated disgust at people less ambitious than him and used the words “so-called fellow humans”, and at that depth of analysis it feels congruent for other people to intuit that he assigns less moral worth to these people and to vaguely argue against it.
Great study!
A strong motivating aspect of the study is measuring AI R&D acceleration. I am somewhat wary of using this methodology to find negative evidence for this kind of acceleration happening at labs:
I can’t help but believe that using AI agents productively is a skill question, despite the graphs in the paper showing no learning effects. One kind of company filled with people who know a lot about how to prompt AIs, and about their limitations, is an AI lab. Even if most developers see no speedup, lab researchers could still be accelerated.
The mean speedup/slowdown can be a difficult metric: the heavy tail of research impact plus feedback loops around AI R&D mean that just one subgroup with a high positive speedup could have a big impact (a toy illustration follows below).
A recent account by an employee who left OpenAI also makes the dev experience sound pretty dissimilar. Summarizing: OAI repos are large (which matches the study setting), but people do not have a great understanding of the full repo (since it is a large monorepo and a lot of new people keep joining), and there do not seem to be uniform code guidelines.
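To illustrate the point about means and heavy tails, here is a minimal toy sketch in Python. All the numbers are invented for illustration and are not taken from the study; the only point is that an unremarkable mean speedup over developers is compatible with a much larger impact-weighted speedup once impact is heavy-tailed and the speedup is concentrated in the high-impact subgroup.

```python
import random

random.seed(0)

# Toy numbers, invented for illustration (not from the study):
# 100 projects with heavy-tailed (Pareto) impact, sorted so the last
# ten are the highest-impact ones.
impacts = sorted(random.paretovariate(1.5) for _ in range(100))

# Suppose only the subgroup working on the top ten projects has learned
# to use agents well (3x speedup); everyone else is slowed down by 10%.
speedups = [0.9] * 90 + [3.0] * 10

mean_speedup = sum(speedups) / len(speedups)
impact_weighted = sum(i * s for i, s in zip(impacts, speedups)) / sum(impacts)

print(f"mean speedup over developers: {mean_speedup:.2f}")    # 1.11, looks unremarkable
print(f"impact-weighted speedup:      {impact_weighted:.2f}")  # substantially higher
```

Whether this matters in practice depends on whether the sped-up subgroup actually overlaps with the high-impact work, which the aggregate numbers alone cannot tell us.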
I expect this to be a good but not perfect analogy for how an AI-related catastrophic event could trigger political change. My understanding is that a crucial part of public discourse was, as other commenters allude to, a perceived taboo against being anti-war, such that even reputable center-left mainstream sources did not in fact doubt the evidence for Iraq’s alleged WMD. A key component is likely the moral dimension of the debate (“are you suggesting we should not do anything about 9/11?”) that prevents people from speaking out.
I expect an AI-related fiasco to carry less of this moral load, and instead think that scenarios like the Wenzhou train accident or the 2018 bridge collapse in Italy are more analogous, in that the catastrophe is a clear accident that, while perhaps caused by recklessness, was not caused by a clearly evil entity. The wiki article on the bridge collapse makes it sound like there was a lot of blaming going on in the aftermath, but mentions no effort to invest more in infrastructure.
Congratulations on the first video! The production quality is really impressive.
One challenge with broader public communication is that, as you allude to in the post, most people have not been exposed to much of AI discourse in the way that e.g. members of this forum or 80k employees have. Do you have any approach to figuring out how to most effectively communicate with such an audience, such as what background level to assume, how doom-y to sound, and so on?
I don’t see the awfulness, although tbh I have not read the original reactions. If you are not desensitized to what this community would consider irresponsible AI development speed, responding with “You are building and releasing an AI that can do THAT?!” is rather understandable. It is somewhat unfortunate that it is the safety testing people who get the flak (if this impression is accurate), though.
This is a good post, but it applies unrealistic standards and therefore draws conclusions that are too strong.
>And at least OpenAI and Anthropic have been caught lying about their motivations:
Just face it: it is very normal for big companies to lie. That does make many of their press and public-facing statements untrustworthy, but it is not very predictive of their general value system and therefore of their actions. Plus, Anthropic, unlike most labs, did in fact support a version of SB 1047. That has to count for something.
>There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”.
In a similar vein, humans do not act or feel rationally in light of their beliefs, and changing your behavior completely in response to an event that is years off is just not in the cards for the vast majority of folks. So do not be surprised that there is a missing mood, just like it is not surprising that people who genuinely believe in the end of humanity due to climate change do not adjust their behavior accordingly. Having said that, I did sense a general increase in anxiety when o3 was announced; perhaps that was the point where it started to feel real for many folks.
Either way, I really want to stress that concluding much about folks’ beliefs from these reactions is very tenuous, just like concluding that a researcher must not really care about AI safety because, instead of working a bit more, they watch some TV in the evening.
For the sake of correctness/completeness: the chemical compound purchase was not done by ARC, but by another, unspecified red team.
There are now enough cases in Europe to get a faint idea of local transmission dynamics. Of the 11 known cases so far in Germany, only one has transmitted the virus. Israel has 4 cases, with one local transmission attributable to them. Portugal has 13 cases, with 12 of those being local transmissions, all attributable to the same index case.
If we can collect similar information from other countries, we could get reasonable estimates of R_eff (a crude back-of-the-envelope sketch follows below the source link).
Source for Germany: https://docs.google.com/spreadsheets/u/0/d/1BA2GoeVMhC_dCcnl5qtR-fxpCwVH6xu8T3LxHUss1Gw/htmlview
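As a deliberately naive sketch of what such an estimate could look like, here is the arithmetic on just the counts quoted above. Assumptions, all of them simplifications: the single German transmitter caused exactly one secondary case (the actual number is not stated above), every reported case has already had its full opportunity to transmit (which ignores right-censoring of recent cases and biases the estimate downward), and imported and locally acquired cases are pooled together.

```python
# Crude R_eff guess from the cluster counts quoted above.
clusters = {
    # country: (total reported cases, locally transmitted secondary cases)
    "Germany":  (11, 1),   # "only one has transmitted" -> assume exactly 1 secondary case
    "Israel":   (4, 1),
    "Portugal": (13, 12),  # 12 secondaries, all from the same index case
}

for country, (cases, secondary) in clusters.items():
    print(f"{country}: crude R_eff ~ {secondary / cases:.2f}")

total_cases = sum(cases for cases, _ in clusters.values())
total_secondary = sum(secondary for _, secondary in clusters.values())
print(f"Pooled crude R_eff ~ {total_secondary / total_cases:.2f}")  # 14/28 = 0.5
```

A proper estimate would need generation-time data and would have to separate index cases from their secondaries (Portugal’s 12 secondary cases, for instance, inflate the denominator here), so treat this as an illustration of the kind of calculation rather than as a number.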
This looks exciting! I wonder about the proposed training setup: if one model produces the thoughts, and another one takes those as input in its prompts, are we actually learning anything about the internal state of either model? What is the advantage (beyond scalability) of this training setup over just using the second model to produce continuations conditional on the thoughts?
>For an unreasonable narrow interpretation that only counts those for whom the medicine was already sitting in a warehouse waiting for approval, and treat that shortage as a ‘whoops, making things is hard and takes time’ rather than a directly caused effect, the FDA is going to directly murder about 20,000 people in the United States.
I disagree with this, in that my lowest count of FDA-related deaths is approximately zero, or at least less than 100, for the exact reason that Gurkenglas mentioned below. The post already recognizes that there is a manufacturing bottleneck, so if we assume that the FDA approval process has not caused the bottleneck, the 180,000 pills available by the end of the year will be given out later, with an admittedly non-zero loss of efficiency.
Why should we assume that the FDA approval has not caused that bottleneck? Because we should assume that if a prediction market knows the drug is getting approved, Pfizer knows it as well, and will manufacture as quickly as possible once they know the efficacy numbers. Sales of the drug will be supply-constrained, not demand-constrained, even through 2022. 50 million doses (the projected capacity for 2022) is not enough to cover the developed world.
Weak evidence: Molnupiravir was approved in the UK earlier this month, but is still not available. I am not certain whether that is entirely due to the UK’s decision to further test it in a trial for vaccinated people (which indeed will cost many lives).
Caveats: I am not saying that the FDA would act differently if there was no manufacturing bottleneck, just that in this case the cost of the slow FDA decision is much smaller.
Pfizer might also have some uncertainty because they do not know whether the FDA will approve the use of the drug for vaccinated people, which indeed might reduce upfront investment. But that is not directly related to the delay of the approval itself.
Seems like Austria quickly came around to your viewpoint: today they announced mandatory vaccination starting in February, and in the meantime a lockdown for everybody. Personally, I would be fairly disappointed in their legal system if mandatory vaccination is allowed to stand, as the more sensible solution (mandatory vaccination and boosters for 65+, as in France) would do the trick as well.
https://www.politico.eu/article/austria-mandatory-coronavirus-vaccination-february/
Yes, absolutely. But that is not my definition, just the one that (as I understand it) DiAngelo gives.
I would argue that DiAngelo’s and the progressive left’s definition of racism is incongruent and contradictory. On the one hand, it is defined by consequences alone: “Beliefs and actions are racist if they lead to minorities’ continued disadvantage compared to Whites.” Regardless of the connotation and baggage of the word, this is a useful concept.
However, this also means that pretty much everything you do is racist if you actually follow the definition: You do not want to attend a diversity seminar, forget about race and just do your work? By not addressing racist structures, you are enforcing them, and that is therefore racist. You merely want to read a fantasy novel before going to bed? Well, that keeps society the way it is, and therefore contributes to racism. Sounds extreme, but I contend that this is the logical consequence of that definition. And as an aside, a white CEO publicly using the n-word, and thereby being fired and replaced by a person of color, would not be racist by that definition.
Hey, we sent out our first batch of responses on Friday, could you kindly check your spam folder?
Thanks for publishing this!
My main disagreement is about a missing consideration: the shrinking time to get alignment right. Despite us finding out that frontier models are less misaligned by default than most here would have predicted, the bigger problem to me is that we have made barely any progress toward crossing the remaining alignment gap. As a concrete example: LLMs will in conversation display a great understanding of and agreement with human values, but in agentic settings (the Claude 4 system card’s examples of blackmail) act quite differently. More importantly, on the research side: to my knowledge, there has neither been a recognized breakthrough nor generally recognized smooth progress towards actually getting values into LLMs.
Similarly, a top consideration for me that AFAICT is not on your list: the geopolitical move towards right-wing populism (particularly in the USA) seems to reduce the chances of sensible governance quite severely.
This seems basically true to me if we are comparing against early 2025 vibes, but not against e.g. 2023 vibes (“I think vibes-wise I am a bit less worried about AI than I was a couple of years ago”). Hard to provide evidence for this, but I’d gesture at the relatively smooth progress between the release of ChatGPT and now, which I’d summarize as “AI is not hitting a wall, at the very most a little speedbump”.
This is an interesting angle, and feels important. The baseline prior should imo be: governing more entities with near-100% effectiveness is harder than governing fewer. While I agree that conditional on having lots of companies it is likelier that some governance structure exists, the primary question seems to be whether we get a close-to-zero miss rate for “deploying dangerous AGI”. And that seems much harder to do when you have 20 to 30 companies in a race dynamic, rather than 3. Having said that, I agree with your other point about AI infrastructure becoming really expensive, and that the exact implications of this are poorly understood.
I think about two thirds of this perceived effect is due to LLMs not having many goals at all, rather than to them having human-compatible goals.