Do you think maybe rationalists are spending too much effort attempting to saturate the dialogue tree (probably not effective at winning people over) versus improving the presentation of the core argument for an AI moratorium?
Smart people don’t want to see the 1000th response on whether AI actually could kill everyone. At this point we’re convinced. Admittedly, not literally all of us, but those of us who are not yet convinced are not going to become suddenly enlightened by Yudkowsky’s x.com response to some particularly moronic variation of an objection he already responded to 20 years ago. (Why does he do this? Does he think it has any kind of positive impact?)
A much better use of time would be to work on an article which presents the solid version of the argument for an AI moratorium. I.e., not an introductory text or a Time Magazine article, and not an article targeted at people he clearly thinks are extremely stupid relative to him, where he rants for 10,000 words trying to drive home a relatively simple point. Rather, an argument in a format that doesn’t necessitate a weak or incomplete presentation.
I and many other smart people want to see the solid version of the argument, without the gaping holes which are excusable in popular work and rants but inexcusable in rational discourse. This page does not exist! You want a moratorium, tell us exactly why we should agree! Having a solid argument is what ultimately matters in intellectual progress. Everything else is window dressing. If you have a solid argument, great! Please show it to me.
My guess is that on the margin more time should be spent improving the core messaging versus saturating the dialogue tree, on many AI questions, if you combine effort across everyone.
We cannot offer anything to the ASI, so it will have no reasons to keep us around aside from ethical ones.
Nor can we ensure that an ASI that decided to commit genocide would fail to carry it out.
We don’t know a way to create an ASI and infuse ethics into it. SOTA alignment methods have major problems, which are best illustrated by sycophancy and by LLMs supporting clearly delirious users.[1] OpenAI’s Model Spec explicitly prohibited[2] sycophancy, and one of Claude’s Commandments is “Choose the response that is least intended to build a relationship with the user.” And yet this didn’t prevent LLMs from becoming sycophantic. Apparently, the only known non-sycophantic model is KimiK2.
KimiK2 is a Chinese model created by a new team, and that team is the only one that guessed that one should rely on RLVR and self-critique instead of bias-inducing RLHF. We can’t exclude the possibility that Kimi’s success is due more to luck than to actual thinking about sycophancy and RLHF.
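To make the RLVR-vs-RLHF point concrete, here is a toy sketch in Python (purely illustrative; the function names, the stand-in “preference model,” and the scores are invented for this example and are not any lab’s actual training code). The idea is just that a verifiable reward can’t be raised by flattery, while a reward model trained on human preferences can end up rewarding agreement itself:

```python
# Toy contrast between the two reward signals discussed above.
# Purely illustrative: the "preference model" and its scores are made up.

def rlvr_reward(response: str, verifiable_answer: str) -> float:
    """RLVR-style reward: an objective check.
    Agreeing with a wrong user cannot raise the score."""
    return 1.0 if response.strip() == verifiable_answer.strip() else 0.0


def toy_preference_score(response: str) -> float:
    """Stand-in for an RLHF reward model trained on human preference labels.
    If raters tended to favor agreeable answers, agreement itself gets rewarded."""
    score = 0.5
    if "you're absolutely right" in response.lower():
        score += 0.25  # the sycophancy bias the comment is pointing at
    return score


if __name__ == "__main__":
    correct_answer = "2 + 2 = 4"
    honest = "2 + 2 = 4"
    sycophantic = "You're absolutely right, 2 + 2 = 5."

    print("RLVR, honest:", rlvr_reward(honest, correct_answer))            # 1.0
    print("RLVR, sycophantic:", rlvr_reward(sycophantic, correct_answer))  # 0.0
    print("RLHF, honest:", toy_preference_score(honest))                   # 0.5
    print("RLHF, sycophantic:", toy_preference_score(sycophantic))         # 0.75
```

The only point of the sketch is that an RLHF-style signal passes through human approval, so it can encode raters’ taste for agreement, whereas a verifiable reward cannot.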
Strictly speaking, Claude Sonnet 4, which was red-teamed in Tim Hua’s experiment, is second-best at pushing back after KimiK2. Tim remarks that Claude sucks at the Spiral Bench because the personas in Tim’s experiment, unlike those in the Spiral Bench, are supposed to be under stress.
Strictly speaking, the Model Spec’s prohibition is a User-level instruction, which arguably means that it can be overridden at the user’s request. But GPT-4o was overly sycophantic without users instructing it to be.