I took the survey, and enjoyed it. There was a suggestion to also fill out the Rationalist Organizer Census, 2023. I can’t remember whether I have already filled it out or whether I’m mixing it up with the 2022 Census. Is it new?
Søren Elverlin
Intelligence Amplification
GPT-4 will be unable to contribute to the core cognitive tasks involved in AI programming.
If you ask GPT-4 to generate ideas for how to improve itself, it will always (10/10) suggest things that an AI researcher considers very unlikely to work.
If you ask GPT-4 to evaluate ideas for improvement that are generated by an AI researcher, the feedback will be of no practical use.
Likewise, every suggestion for how to get more data or compute, or be more efficient with data or compute, will be judged by an AI researcher as hopeless.
If you ask GPT-4 to suggest performance improvements to the core part of its own code, every single one of these suggestions will be very weak at best.
If there is an accompanying paper for GPT-4, no possible prompt will make GPT-4 suggest meaningful improvements to this paper.
I assign 95% probability to each of these statements. I do not expect to see the start of a textbook takeoff in August.
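As a rough back-of-the-envelope illustration (my own, and it treats the five specific predictions as independent, which they need not be), the probability that all of them hold would then be

$$0.95^5 \approx 0.77$$

so a failure of at least one of them would not be very surprising even at 95% each.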
The research ethos seems like it could easily be used to justify research that appears to be safety-oriented, but actually advances capabilities.
Have you considered how your interpretability tool can be used to increase capability?
What processes are in place to ensure that you are not making the problem worse?
AI Risk is mentioned first at 19:40.
Bostrom’s “The Vulnerable World Hypothesis” paper is grossly misquoted.
No object-level arguments against AI Risk are presented, nor are there any references to object-level arguments made by anyone.
I’m still upvoting the post, because I find it useful to know how AI Risk (and we) are perceived.
I made my most strident and impolite presentation yet in the AISafety.com Reading Group last night. We were discussing “Conversation with Ernie Davis”, and I attacked this part:
“And once an AI has common sense it will realize that there’s no point in turning the world into paperclips...”
I described this as fundamentally mistaken, and like an argument you’d hear from a person who had not read “Superintelligence”. This is ad hominem, and it pains me. However, I feel like the emperor has no clothes, and calling it out explicitly is important.
Good luck with the in-person AI Safety reading group. It sounds productive and fun.
For the past 2 years, I have been running the Skype-based AISafety.com Reading Group. You can see the material we have covered at https://aisafety.com/reading-group/ . Yesterday, Vadim Kosoy from MIRI gave a great presentation of his Learning-Theoretic Agenda: https://youtu.be/6MkmeADXcZg
Usually, I try to post a summary of the discussion to our Facebook group, but I’ve been unable to get a follow-on discussion going. Your summary/idea above is higher quality than what I post.
Please tell me if you have any ideas for collaboration between our reading groups, or if I can do anything else to help you :).
I prefer “AI Safety” over “AI Alignment” because I associate the first more with Corrigibility, and the second more with Value-alignment.
It is the term “Safe AI” that implies 0% risk, while “AI Safety” seems more similar to “Aircraft Safety” in acknowledging a non-zero risk.
On this subject, here is my two-hour presentation (in 3 parts), going over just about every paragraph in Paul Christiano’s “Where I agree and disagree with Eliezer”:
https://youtu.be/V8R0s8tesM0?si=qrSJP3V_WnoBptkL
According to this image, the performance is generally above the human average:
In the Paul-verse, we should expect that economic interests would quickly cause such models to be used for everything that they can be profitably used for. With better-than-average-human performance, that may well be a doubling of global GDP.
In the Eliezer-verse, the impact of such models on the GDP of the world will remain around $0, due to practical and regulatory constraints, right up until the upper line (“Human (Best)”) is surpassed for 1 particular task.
I strongly support your efforts to improve the EA Forum, and I can see your point that using upvotes as a proxy for appropriateness fails when there is a deliberate effort to push the forum in a better direction.
The inclusion criteria state:
Tasks that are completely beyond the capabilities of current language models are also encouraged
It’s easy to come up with a benchmark that requires a high but unspecified level of intelligence. An extreme example would be to ask for a proof that P!=NP: we have no idea how difficult the task is, though we suspect it requires superintelligence. To be valuable, the difficulty of a benchmark needs to be relatable to meaningful capabilities, such as “The Human Level”.
Most people couldn’t answer questions about cryobiology in Spanish, even though they possess general intelligence. This benchmark seems to consist of random tasks around and above the human level, and I fear progress on this benchmark might be poorly correlated with progress towards AGI.
In the interview with AI Impacts, you said:
...examples of things that I’m optimistic about that they [people at MIRI] are super pessimistic about are like, stuff that looks more like verification...
Are you still optimistic? What do you consider the most promising recent work?
Today, I bought 20 shares in Gamestop / GME. I expect to lose money, and bought them as a hard-to-fake signal about willingness to coordinate and cooperate in the game-theoretic sense. This was inspired by Eliezer Yudkowsky’s post here: https://yudkowsky.medium.com/
In theory, Moloch should take all the resources of someone following this strategy. In practice, Eru looks after her own, so I have the money to spare.
This post clearly spoofs “Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI”, though it changes “default” to “inevitable”.
I think that coups d’état and rebellions are nearly common enough that they could be called the default, though they are certainly not inevitable.
I enjoyed this post. Upvoted.
Recording of the session:
It was crossposted after I commented, and did find a better reception on EA Forum.
I did not mean my comment to imply that the community here does not need to be less wrong. However, I do think that there is a difference between what is appropriate to post here and what is appropriate to post on the EA Forum.
I reject a norm that I ought to be epistemically brave and criticise the piece in any detail. It is totally appropriate to just downvote bad posts and move on. Writing a helpful meta-comment to the poster is a non-obligatory prosocial action.
I would be excited to see Rational Animations try to cover the Hard Problem of Corrigibility: https://arbital.com/p/hard_corrigibility/
I believe that this would be the optimal video to create for the optimization target “reduce probability of AI-Doom”. It seems (barely) plausible that someone really smart could watch the video, make a connection to some obscure subject none of us know about, and then produce a really impactful contribution to solving AI Alignment.
However if we think that utility maximization is difficult to wield without great destruction, then that suggests a disincentive to creating systems with behavior closer to utility-maximization. Not just from the world being destroyed, but from the same dynamic causing more minor divergences from expectations, if the user can’t specify their own utility function well.
A strategically aware utility maximizer would try to figure out what your expectations are, satisfy them while preparing a take-over, and strike decisively without warning. We should not expect to see an intermediate level of “great destruction”.
How much time do you expect between “1 AI clearly on track to Foom” and “First AI to actually Foom”? My weak guess is that Eliezer would say “Probably quite little time”, but your model of the world requires GWP to double over a 4-year period, and I’m guessing that period probably starts later than 2026.
I would be surprised if by 2027, I could point to an AI that for a full year had been on track to Foom, without Foom happening.
Why do you think Anthropic is not replying to MIRI’s challenge?