Chris_Leong

Karma: 7,693

Chris_Leong 4 Sep 2025 0:44 UTC
2 points
0
on: Scaling AI Safety in Europe: From Local Groups to International Coordination
Great post!
I really appreciate proposals that are both pragmatic and ambitious; and this post is both!
I guess the closest thing there is to a CEA for AI Safety is Kairos. However, they decided to focus explicitly on student groups^[1].
1. ^
  SPAR isn’t limited to students, but it is very much in line with this by providing, “research mentorship for early-career individuals in AI safety”.

Chris_Leong 18 Aug 2025 22:40 UTC
4 points
0
on: My Interview With Cade Metz on His Reporting About Lighthaven
I think he’s clearly had a narrative he wanted to spin and he’s being very defensive here.

If I wanted to steelman his position, I would do so as follows (low-confidence and written fairly quickly):
1. I expect he believes his framing and that he feels fairly confident in it because most of the people he respects also adopt this framing.
2. In so far as his own personal views make it into the article, I expect he believes that he’s engaging in a socially acceptable amount of editorializing. In fact, I expect he believes that editorializing the article in this way is more socially responsible than not, likely due to the role of journalism being something along the lines of “critiquing power”.
3. Further, whilst I expect he wouldn’t universally endorse “being socially acceptable among journalists” as guaranteeing that something is moral, he’d likely defend it as a strongly reliable heuristic, such that it would take pretty strong arguments to justify departing from this.
4. Whilst he likely endorses some degree of objectivity (in terms of getting facts correct), I expect that he also sees neutrality as overrated by old school journalists. I expect he believes that it limits the ability of journalists to steer the world towards positive outcomes. That is, he treats neutrality more as a consideration that can be overridden than as a rule.

Chris_Leong 18 Aug 2025 22:24 UTC
1 point
0
in reply to: AnthonyC’s comment on: My Interview With Cade Metz on His Reporting About Lighthaven
I almost agree-voted this, then read the comments below and disagree-voted it instead.

Chris_Leong 15 Aug 2025 0:08 UTC
3 points
1
on: Exploring the “Anti-TESCREAL” Ideology and the Roots of (Anti-)Progress
Fascinating work. I’m keen to hear more about the belief set of this opposing cluster.

Chris_Leong 14 Aug 2025 17:39 UTC
2 points
0
in reply to: JBlack’s comment on: Three Quotes on Transformative Technology
You’re misunderstanding the language game.

Chris_Leong 14 Aug 2025 16:43 UTC
2 points
0
on: Mech Interp Wiki Page and Why You Should Edit Wikipedia
Do you think Wiki pages might be less important with LLMs these days? Also, I just don’t end up on Wiki pages as often. I’m wondering if Google stopped prioritizing them so heavily.

Chris_Leong 6 Aug 2025 13:57 UTC
LW: 2 AF: 1
0
AF
on: The Open Agency Model
Is there any chance you could define what you mean by “open agency”? Do you essentially mean “distributed agency”?

Chris_Leong 4 Aug 2025 20:56 UTC

2 points

on: Chris_Leong’s Shortform

Placeholder for an experimental art project — Under construction 🚧^[1]

Anything can be art, it might just be bad art — Millie Florence

Art in the Age of the Internet

The medium is the message — Marshall McLuhan, Media Theorist

Hypertext is not a technology, it is a way of thinking — ChatGPT 5^[2]

Writing is the process of reducing a tapestry of interconnections to a narrow sequence. This is, in a sense, illicit. This is a wrongful compression of what should spread out, and today’s computers, they’ve betrayed that — Ted Nelson, founder of Project Xanadu^[3]^[4]

𝕯𝖔𝖔𝖒 $^{؟}$

𝒽𝑜𝓌 𝓉𝑜 𝒷𝑒𝑔𝒾𝓃? 𝓌𝒽𝒶𝓉 𝒶𝒷𝑜𝓊𝓉 𝒶𝓉 𝕿𝖍𝖊 𝕰𝖓𝖉?^[5]

𝕿𝖍𝖊 𝕰𝖓𝖉? 𝕚𝕤 𝕚𝕥 𝕣𝕖𝕒𝕝𝕝𝕪 𝕿𝖍𝖊 𝕰𝖓𝖉?

𝓎𝑒𝓈. 𝒾𝓉 𝒾𝓈 𝕿𝖍𝖊 𝕰𝖓𝖉. 𝑜𝓇 𝓂𝒶𝓎𝒷𝑒 𝒯𝒽ℯ 𝐵ℯℊ𝒾𝓃𝓃𝒾𝓃ℊ.
𝓌𝒽𝒶𝓉𝑒𝓋𝑒𝓇 𝓉𝒽𝑒 𝒸𝒶𝓈𝑒, 𝒾𝓉 𝒾𝓈 𝒶𝓃 𝑒𝓃𝒹.^[6]

𝗠𝗶𝘁𝗶𝗴𝗮𝘁𝗶𝗻𝗴 𝘁𝗵𝗲 𝗿𝗶𝘀𝗸 𝗼𝗳 𝗲𝘅𝘁𝗶𝗻𝗰𝘁𝗶𝗼𝗻 𝗳𝗿𝗼𝗺 𝗔𝗜 𝘀𝗵𝗼𝘂𝗹𝗱 𝗯𝗲 𝗮 𝗴𝗹𝗼𝗯𝗮𝗹 𝗽𝗿𝗶𝗼𝗿𝗶𝘁𝘆 𝗮𝗹𝗼𝗻𝗴𝘀𝗶𝗱𝗲 𝗼𝘁𝗵𝗲𝗿 𝘀𝗼𝗰𝗶𝗲𝘁𝗮𝗹-𝘀𝗰𝗮𝗹𝗲 𝗿𝗶𝘀𝗸𝘀 𝘀𝘂𝗰𝗵 𝗮𝘀 𝗽𝗮𝗻𝗱𝗲𝗺𝗶𝗰𝘀 𝗮𝗻𝗱 𝗻𝘂𝗰𝗹𝗲𝗮𝗿 𝘄𝗮𝗿.

Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, Dario Amodei, Bill Gates, Ilya Sutskever…

There’s No Rule That Says We’ll Make It — Rob Miles

More

MIRI announces new “Death With Dignity” strategy, April 2nd, 2022

Well, let’s be frank here. MIRI didn’t solve AGI alignment and at least knows that it didn’t. Paul Christiano’s incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world. Chris Olah’s transparency work, at current rates of progress, will at best let somebody at DeepMind give a highly speculative warning about how the current set of enormous inscrutable tensors, inside a system that was recompiled three weeks ago and has now been training by gradient descent for 20 days, might possibly be planning to start trying to deceive its operators.

Management will then ask what they’re supposed to do about that.

Whoever detected the warning sign will say that there isn’t anything known they can do about that. Just because you can see the system might be planning to kill you, doesn’t mean that there’s any known way to build a system that won’t do that. Management will then decide not to shut down the project—because it’s not certain that the intention was really there or that the AGI will really follow through, because other AGI projects are hard on their heels, because if all those gloomy prophecies are true then there’s nothing anybody can do about it anyways. Pretty soon that troublesome error signal will vanish.

When Earth’s prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.

That’s why I would suggest reframing the problem—especially on an emotional level—to helping humanity die with dignity, or rather, since even this goal is realistically unattainable at this point, die with slightly more dignity than would otherwise be counterfactually obtained...

Three Quotes on Transformative Technology

But the moral considerations, Doctor...

Did you and the other scientists not stop to consider the implications of what you were creating? — Roger Robb

When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb— Oppenheimer

❦

There are moments in the history of science, where you have a group of scientists look at their creation and just say, you know: ’What have we done?… Maybe it’s great, maybe it’s bad, but what have we done? — Sam Altman

❦

Urgent: get collectively wiser—Yoshua Bengio, AI “Godfather”

✒️ Selected Quotes:

We stand at a crucial moment in the history of our species. Fueled by technological progress, our power has grown so great that for the first time in humanity’s long history, we have the capacity to destroy ourselves—severing our entire future and everything we could become.

Yet humanity’s wisdom has grown only falteringly, if at all, and lags dangerously behind. Humanity lacks the maturity, coordination and foresight necessary to avoid making mistakes from which we could never recover. As the gap between our power and our wisdom grows, our future is subject to an ever-increasing level of risk. This situation is unsustainable. So over the next few centuries, humanity will be tested: it will either act decisively to protect itself and its long-term potential, or, in all likelihood, this will be lost forever — Toby Ord, The Precipice

We have created a Star Wars civilization, with Stone Age emotions, medieval institutions, and godlike technology — Edward O. Wilson, The Social Conquest of Earth

❦

Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct — Nick Bostrom, Founder of the Future of Humanity Institute, Superintelligence

❦

If we continue to accumulate only power and not wisdom, we will surely destroy ourselves — Carl Sagan, Pale Blue Dot

Never has humanity had such power over itself, yet nothing ensures that it will be used wisely, particularly when we consider how it is currently being used…There is a tendency to believe that every increase in power means “an increase of ‘progress’ itself ”, an advance in “security, usefulness, welfare and vigour; …an assimilation of new values into the stream of culture”, as if reality, goodness and truth automatically flow from technological and economic power as such. — Pope Francis, Laudato si’

❦

The fundamental test is how wisely we will guide this transformation – how we minimize the risks and maximize the potential for good — António Guterres, Secretary-General of the United Nations

❦

Our future is a race between the growing power of our technology and the wisdom with which we use it. Let’s make sure that wisdom wins — Stephen Hawking, Brief Answers to the Big Questions

❦

A digital painting of a toddler sitting in a sandbox at dusk, wearing a crumpled paper crown and holding a glowing scepter topped with a miniature Earth. The child looks fascinated and innocent. Surrounding them are plastic toy tanks, a rocket, a dump truck, and a small shovel, softly illuminated by the warm glow of the scepter. The background is dark, evoking a sense of quiet gravity.

❤️‍🔥 Desires

𝓈𝑜𝓂𝑒𝓉𝒾𝓂𝑒𝓈 𝐼 𝒿𝓊𝓈𝓉 𝓌𝒶𝓃𝓉 𝓉𝑜 𝓂𝒶𝓀ℯ 𝒜𝓇𝓉

𝕥𝕙𝕖𝕟 𝕞𝕒𝕜𝕖 𝕚𝕥

𝒷𝓊𝓉 𝓉𝒽ℯ 𝓌𝑜𝓇𝓁𝒹 𝒩𝐸𝐸𝒟𝒮 𝒮𝒶𝓋𝒾𝓃ℊ...

𝕪𝕠𝕦 𝕔𝕒𝕟 𝓈𝒶𝓋ℯ 𝕚𝕥?

𝐼… 𝐼 𝒸𝒶𝓃 𝒯𝓇𝓎...

Effective altruism in the garden of ends

No – I will eat, sleep, and drink well to feel alive; so too will I love and dance as well as help.

Hope

❦

Scraps

Ilya Sutskever

“It had taken Sutskever years to be able to put his finger on Altman’s pattern of behavior—how OpenAI’s CEO would tell him one thing, then say another and act as if the difference was an accident. ‘Oh, I must have misspoken,’ Altman would say. Sutskever felt that Altman was dishonest and causing chaos, which would be a problem for any CEO, but especially for one in charge of such potentially civilization-altering technology.”

The Optimist, Keach Hagey

Ilya Sutskever, once widely regarded as perhaps the most brilliant mind at OpenAI, voted in his capacity as a board member last November to remove Sam Altman as CEO. The move was unsuccessful, in part because Sutskever reportedly bowed to pressure from his colleagues and reversed his vote. After those fateful events, Sutskever disappeared from OpenAI’s offices so noticeably that memes began circulating online asking what had happened to him. Finally, in May, Sutskever announced he had stepped down from the company.

Time 100 AI 2024

Twitter

We approach safety and capabilities in tandem, as technical problems to be solved through revolutionary engineering and scientific breakthroughs. We plan to advance capabilities as fast as possible while making sure our safety always remains ahead.
This way, we can scale in peace.
Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures.

Safe Superintelligence Inc.

^
⇢ Note to self: My previous project had too much meta-commentary and this may have undermined the sincerity, so I should probably try to minimise meta-commentary.

⇢ “You’re going to remove this in the final version, right?” — Maybe.
^
“But you can’t quote ChatGPT 😠!”—Internet Troll ÷
^
“I would say the flaw of Xanadu’s UI was treating transclusion as ‘horizontal’ and side-by-side” — Gwern 🙃
^
“StretchText is a hypertext feature that has not gained mass adoption in systems like the World Wide Web… StretchText is similar to outlining, however instead of drilling down lists to greater detail, the current node is replaced with a newer node”—Wikipedia
This ‘stretching’ to increase the amount of writing, or contracting to decrease it gives the feature its name. This is analogous to zooming in to get more detail.

Ted Nelson coined the term c. 1967.

Conceptually, StretchText is similar to existing hypertext systems where a link provides a more descriptive or exhaustive explanation of something, but there is a key difference between a link and a piece of StretchText. A link completely replaces the current piece of hypertext with the destination, whereas StretchText expands or contracts the content in place. Thus, the existing hypertext serves as context.
⇢ “This isn’t a proper implementation of StretchText” — Indeed.
^
In defence of Natural Language DSLs — Connor Leahy
^
Did this conversation really happen? — 穆
^
⇢ “Sooner or later, everything old is new again” — Stephen King
⇢ “Therefore if any man be in Christ, he is a new creature: old things are passed away; behold, all things have become new.” — 2 Corinthians 5:17

Chris_Leong 4 Aug 2025 19:54 UTC
4 points
0
in reply to: Daniel Kokotajlo’s comment on: Daniel Kokotajlo’s Shortform
Redirect the search?

You mean retarget the search as per John Wentworth’s proposal?

Chris_Leong 4 Aug 2025 14:15 UTC
3 points
1
in reply to: cousin_it’s comment on: Saying Goodbye
Big actors already have every advantage, why wouldn’t they be able to defend themselves?

I’m worried that the offense-defense balance leans strongly towards the attacker. What are your thoughts here?

Chris_Leong 4 Aug 2025 9:19 UTC
15 points
12
in reply to: RedMan’s comment on: Saying Goodbye
I agree, this is the obvious solution… as long as you put your fingers in your ears and shout “I can’t hear you, I can’t hear you” whenever the topic of misuse risks comes up...

Otherwise, there are some quite thorny problems. Maybe you’re ultimately correct about open source being the path forward, but it’s far from obvious.

Three Quotes on Transformative Technology

Chris_Leong 1 Aug 2025 22:57 UTC

8 points

3 comments, 1 min read

Chris_Leong 31 Jul 2025 0:06 UTC
2 points
0
in reply to: habryka’s comment on: Chris_Leong’s Shortform
What is the SUV Triad?

Sorry, this is some content that I had in my short-form Why the focus on wise AI advisors?. The SUV Triad is described there.
I was persuaded by Professor David Manly that I didn’t need to argue for Disaster-By-Default in order to justify wise AI advisors and that focusing too much on this aspect would simply cause me to lose people, so I needed somewhere to paste this content.
I just clicked “Remove from Frontpage”. I’m unsure if it does anything for short-form posts though.
Also, the formatting on this is wild, what’s the context for that?
Just experimenting to see what’s possible. Copied it directly from that post, haven’t had time to rethink the formatting yet now that it is its own post. Nowhere near as wild as it gets in the main post though!

Chris_Leong 30 Jul 2025 23:26 UTC

−5 points

on: Chris_Leong’s Shortform

😱✂️💣💣💣💣💣 𝙳 𝙸 𝚂 𝙰 𝚂 𝚃 𝙴 𝚁 - 𝙱 𝚈 - 𝙳 𝙴 𝙵 𝙰 𝚄 𝙻 𝚃 ? - Public Draft

This is a draft post to hold my thoughts on Disaster-By-Default.

I have an intuition that the SUV Triad can be turned into an argument for Disaster-By-Default, so I created this post to explore this possibility.

However, I consider this post experimental in that it may not pan out.

☞ The　𝙳 𝙸 𝚂 𝙰 𝚂 𝚃 𝙴 𝚁 - 𝙱 𝚈 - 𝙳 𝙴 𝙵 𝙰 𝚄 𝙻 𝚃　hypothesis:

AGI leads to some kind of societal scale catastrophe by default

Clarification: This isn’t a claim that it wouldn’t be possible to avoid this fate if humanity decided to wake up and get serious about winning. This is just a claim about what happens by default.

Why might this be true?

Recap — 𝚃 𝙷 𝙴 🅂🅄🅅 𝚃 𝚁 𝙸 𝙰 𝙳 — Old version, to be updated to the latest version just before release

For convenience, I’ve copied the description of the SUV Triad from my post Why the focus on wise AI advisors?

Covered in reverse order:

🅅 𝚄 𝙻 𝙽 𝙴 𝚁 𝙰 𝙱 𝙸 𝙻 𝙸 𝚃 𝚈 – 🌊🚣:

✷ The development of advanced AI technologies will have a massive impact on society given the essentially infinite ways to deploy such a general technology. There are lots of ways this could go well, and lots of ways w~~e all die~~ this could go extremely poorly.

Catastrophic malfunctions, permanent dictatorships, AI-designed pandemics, mass cyberattacks, information warfare, war machines lacking any mercy, the list goes on...

Worse: Someone Will Always Invent A New Way That Disaster Could Strike ... ^[44]^[45]

The kicker: Interaction effects. Technological unemployment breaking politics. Loss of control + automated warfare. AI-enabled theft of specialised biological models...

In more detail...

i) At this stage I’m not claiming any particular timelines.

I believe it’s likely to be ~~absurdly~~ quite fast, but I don’t claim this until we get to 🅂𝙿 𝙴 𝙴 𝙳 😅⏳.

I suspect that often when people doubt this claim, they’ve implicitly assumed that I was talking about the short or medium term, rather than the long term 🤔. After all, the claim that there are many ways that AI could plausibly lead to dramatic benefits or harms over the next 50 or 100 years feels like an extremely robust claim. There are many things that a true artificial general intelligence could do. It’s mainly just a question of how long it takes to develop the technology.

ii) It’s quite likely that at least some of these threats will turn out to be overhyped. That doesn’t defeat this argument! Even in the unlikely event that most of these threats turned out to be paper tigers, as claimed in The Kicker, a single one of these threats going through could cause absurd amounts of damage.

iii) TODO

🅄 𝙽 𝙲 𝙴 𝚁 𝚃 𝙰 𝙸 𝙽 𝚃 𝚈 – 🌅💥:

✷ We have massive disagreement on what to expect from the development of AI, let alone on the best strategy^[46]. Making the wrong call could prove catastrophic.

Strategically: Accelerate or pause? What’s the offence-defence balance? Do we need global unity or to win the arms race? Who (or what) should we be aligning AIs to? Can AI “do our alignment homework” for us?

Situationally^[47]: Will LLMs scale to AGI or are they merely a dead-end? Should we expect AGI in 2027 or is it more than a decade away? Will AI create jobs and cure economic malaise or will it crush, destroy and obliterate them? Will the masses embrace it as the key to their salvation or call for a Butlerian Jihad?

The kicker: We’re facing these issues at an extremely bad time – when both trust and society’s epistemic infrastructure are crumbling. Even if our task were epistemically easy, we might still fail.

In more detail...

i) A lot of this uncertainty just seems inherently really hard to resolve. Predicting the future is hard.

ii) However hard this is to resolve in theory, it’s worse in practice. Instead of an objective search for the truth, these discussions are distorted by many different factors, including money, social status and the need for meaning.

iii) More on the kicker: We’re seeing increasing polarisation, less trust in media and experts^[48] and AI stands to make this worse. This is not where we want to be starting from and who knows how long this might take to resolve?

🅂 𝙿 𝙴 𝙴 𝙳 – 😅⏳:

✷ AI is developing incredibly rapidly… We have limited time to act and to figure out how to act^[49].

No wall! See o3 crushing the ARC challenge, the automation of exponentially longer coding tasks^[50], LLMs demonstrating IMO gold medal maths ability^[51], widespread and rapid benchmark saturation, the Turing Test being passed^[52] and society shrugging its shoulders (‼️)

Worse: We may already be witnessing an AI arms race. Stopping this may simply not be possible – forget about AGI, an inconceivably large prize – many see winning as a matter of survival.

The kicker: It’s entirely plausible that as the stakes increase AI actually accelerates. That we look back at the current rate of progress and laugh about how we used to consider it fast.

In more detail...

i) The speed at which things are happening makes the problem much harder. Humanity does have the ability to deal with civilisational scale challenges. It’s not easy (global co-ordination is incredibly difficult to achieve), but it’s possible. However, dealing with challenges one at a time is a lot easier than dealing with dozens at once. When many arrive together, it’s hard to give each problem the attention it deserves 😅🌫️💣💣💣

ii) Even if timelines aren’t short, we might still be in trouble if the take-off speed is fast. Unfortunately, humanity is not very good at preparing for abstract, speculative-seeming threats ahead of time.

iii) Even if neither timelines nor take-off speeds are fast in an absolute sense, we might still expect disaster if they are fast in a relative sense. Governance—especially global governance—tends to proceed rather slowly. Even though it can happen much faster when there’s a crisis, sometimes problems need to be solved ahead of time and once you’re in them it’s too late. As an example, once an AI induced pandemic is spreading, you may have already lost.

I believe that the following reflections provide a strong, but defeasible reason to believe the Disaster-By-Default hypothesis.

Recap: Reflections on The SUV Triad — Old version, to be updated to the latest version just before release

Reflections—Why the SUV Triad is Fucking Scary

Many of the threats constitute civilisational-level risk by themselves. We could successfully navigate all the other threats, but simply drop the ball once and all that could be for naught.
Even if a threat can’t lead to catastrophe, it can still distract us from those that can. It’s hard to avoid catastrophe when we don’t know where to focus our efforts ⚪🪙⚫.
The speed of development makes this much harder. Even if alignment were easy and governance didn’t require anything special, we could still fail because certain people have decided that we have to race toward AGI as fast as possible.

Controversial: It may even present a reason to expect Disaster-By-Default (draft post) (‼️).

That said, I want to see whether it’s possible to make a more rigorous argument. Here’s the general intuition:

Consider the following model. I wonder whether it would be applicable to the SUV Triad:

Suppose we had a series of ten independent decisions and that we have to get each of them right in order to survive. Assume each is a binary choice. Further assume that these choices are quite confusing so we only have a 60% chance of getting each right. Then we’d only have about a 0.6% chance of survival (0.6^10 ≈ 0.006).
The question I want to explore is whether it would be appropriate to model the SUV Triad with something along these lines.
That said, even if it made sense from an inside view perspective, we’d still have to adjust for the outside view. The hardness of each decision is not independent, but most likely tied to some general factors such as overall problem difficulty, general societal competence and the speed at which we have to face these problems. In other words, treating each probability as independent likely makes the overall probability look more extreme.
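To make the arithmetic concrete, here is a minimal Python sketch of the toy model above. The first function is just the independent case from the text. The second is my own illustrative assumption, not something the argument commits to: a single shared factor (think "general societal competence") puts us in either a good world or a bad world, and the per-decision odds depend on which one we're in. The specific numbers (90% and 30%, equally likely) are hypothetical and were picked only so that the average per-decision success rate stays at 60%.

```python
def independent_survival(p: float, n: int) -> float:
    """Chance of getting all n independent decisions right, each with probability p."""
    return p ** n


def shared_factor_survival(p_good: float, p_bad: float, w_good: float, n: int) -> float:
    """Chance of getting all n decisions right when one shared factor sets the difficulty.

    Hypothetical toy model: with probability w_good every decision succeeds with
    probability p_good, otherwise every decision succeeds with probability p_bad.
    Decisions are independent only once the shared factor is fixed.
    """
    return w_good * p_good ** n + (1 - w_good) * p_bad ** n


if __name__ == "__main__":
    n = 10
    print(f"independent, p=0.6: {independent_survival(0.6, n):.4%}")
    # Same 60% average per decision, split between a 90% world and a 30% world.
    print(f"shared factor:      {shared_factor_survival(0.9, 0.3, 0.5, n):.4%}")
```

With these made-up numbers the independent case comes out at roughly 0.6%, while the shared-factor case comes out at roughly 17%, even though the average chance per decision is the same. That is the sense in which assuming independence makes the result look more extreme than it probably is.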

So the first step would be figuring out whether this model is applicable (not in an absolute sense, but whether or not it’d be appropriate for making a defeasible claim).

How could we evaluate this claim? Here’s one such method:

Define a threshold of catastrophe
Make a list of threats that could meet that bar
Make a list of key choices that we’d have to make in relation to these threats and where making the wrong choice could prove catastrophic
Estimate the difficulty of each choice
Consider the degree of correlation between various threats/choices
Potentially: estimate how many additional such choices there may be that we’ve missed

This model could fail if the number of key choices wasn’t that large, if these choices weren’t hard, or if the choices were strongly correlated.
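A rough way to see how sensitive the model is to the first two failure modes is to tabulate the independent-case survival chance over a few values for the number of key choices and the per-choice success rate. The grid values below are placeholders I picked for illustration, not estimates of anything.

```python
from itertools import product


def survival_grid(counts=(3, 10, 20), probs=(0.6, 0.9, 0.99)):
    """Map (number of choices, per-choice success probability) to p ** n.

    Assumes the choices are independent; the default grid values are
    hypothetical placeholders rather than estimates.
    """
    return {(n, p): p ** n for n, p in product(counts, probs)}


if __name__ == "__main__":
    for (n, p), s in survival_grid().items():
        print(f"choices={n:>2}  p={p:.2f}  survival={s:.4%}")
```

If there are only a handful of key choices, or if each one is easy, the product stays high and the argument loses most of its force. The scary numbers only show up when there are many choices and each is close to a coin flip.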

And if the model were applicable, then we’d have to consider possible defeaters:

“There’s too much uncertainty here” - <todo response>
“Humanity has overcome significant challenges in the past” - <todo response>
“There will almost certainly be a smaller wake-up call before a bigger disaster” - <todo response>

In more detail/further objections...

I suggest ignoring for now. Copied from a different context, so needs to be adapted:

Most notable counterargument: “We most likely encounter smaller wake up calls first. Society wakes up by default”.

Rich and powerful actors will be incentivised to use their influence to downplay the role of AI in any such incidents, argue that we should focus solely on that threat model or even assert that further accelerating capabilities is the best defense. Worse, we’ll likely be in the middle of a US-China arms race where there are national security issues at play that could make slowing things down feel almost inconceivable.

Maybe there is eventually an incident that is too serious to ignore, but by then it will probably be too late. Capabilities increase fast and we should expect a major overhang of elicitable capabilities, so we would need to trigger a stop significantly before the threshold of dangerous capabilities.

“But the AI industry doesn’t want to destroy society. They’re in society” – Look at what happened with “gain of function” research. If it had been prominently accepted that gain of function is bad, then that would have caused a massive loss of status for medical researchers, so they didn’t allow that to happen. The same incentives apply to AI developers.

“Open source/weights models are behind the frontier and it’s possible that society will enforce restrictions on them, even if it’ll be impossible to prevent closed source development from continuing” — Not that far behind and attempting to restrict open-source models will result in massive pushback/subversion. There’s a large community dedicated to open source software, for some it’s essentially a substitute for a religion, for others it’s the basis of their company or their national competitiveness. Even if the entire UN security council agreed, they couldn’t just say, “Stop!” and expect it to be instantly obeyed. Our default expectation should be that capabilities broadly proliferate.

Second most notable counterargument: “AI is aligned by default”

This looked much more plausible before inference time compute took off.

Third most notable counterargument: “We’ve overcome challenges in the past, we should expect that we most likely stumble through”

AI is unique (generality, speed of development, proliferation).

Much more plausible if there was a narrower threat model or if development moved more slowly.

What links here?

Chris_Leong's comment on Chris_Leong’s Shortform by Chris_Leong (10 Feb 2025 16:54 UTC; 9 points)

Chris_Leong 29 Jul 2025 23:03 UTC
2 points
0
in reply to: DresdenHeart’s comment on: DresdenHeart’s Shortform
Yeah, that’s unfortunate.

Catastrophic concerns are slowly becoming more normalised over time. We’ve already made a lot of progress compared to where we used to be and I expect this to continue. This just requires people to keep going anyway.

Chris_Leong 27 Jul 2025 2:04 UTC
0 points
0
in reply to: Chris_Leong’s comment on: Chris_Leong’s Shortform

Chris_Leong 27 Jul 2025 1:55 UTC
0 points
0
in reply to: Chris_Leong’s comment on: Chris_Leong’s Shortform
With quotes from: Yoshua Bengio, Oppenheimer, Toby Ord, Edward Wilson, Nick Bostrom, Carl Sagan, Pope Francis, Antonio Guterres, Stephen Hawking

Chris_Leong 27 Jul 2025 1:54 UTC
0 points
0
on: Chris_Leong’s Shortform
Parent comment for: Why the focus on wise AI advisors?

Chris_Leong 14 Jul 2025 21:54 UTC
4 points
0
in reply to: Gordon Seidoh Worley’s comment on: On actually taking expressions literally: tension as the key to meditation?
Fascinating, but I’m slightly confused about the link here. Any chance you could make the connection more explicit?

Chris_Leong 14 Jul 2025 21:53 UTC
2 points
0
in reply to: Nick_Tarleton’s comment on: On actually taking expressions literally: tension as the key to meditation?
… whilst also creating incredible amounts of stress/trauma at the same time that would wash out any such effect.

(Fascinating theory though)
