I took the survey, and enjoyed it. There was a suggestion to also fill out the Rationalist Organizer Census, 2023. I can’t remember if I have already filled it out, or if I’m confusing it with the 2022 Census. Is it new?
Søren Elverlin
Retrospective: Lessons from the Failed Alignment Startup AISafety.com
A long reply to Ben Garfinkel on Scrutinizing Classic AI Risk Arguments
AI Safety Reading Group
Intelligence Amplification
GPT-4 will be unable to contribute to the core cognitive tasks involved in AI programming.
If you ask GPT-4 to generate ideas for how to improve itself, it will always (10/10) suggest things that an AI researcher considers very unlikely to work.
If you ask GPT-4 to evaluate ideas for improvement that are generated by an AI researcher, the feedback will be of no practical use.
Likewise, every suggestion for how to get more data or compute, or be more efficient with data or compute, will be judged by an AI researcher as hopeless.
If you ask GPT-4 to suggest performance improvements to the core part of its own code, every single one of these will be very weak at best.
If there is an accompanying paper for GPT-4, there will be no prompts possible that would make GPT-4 suggest meaningful improvements to this paper.
I assign 95% probability to each of these statements. I expect we will not be seeing the start of a textbook takeoff in August.
The research ethos seems like it could easily be used to justify research that appears to be safety-oriented, but actually advances capabilities.
Have you considered how your interpretability tool can be used to increase capability?
What processes are in place to ensure that you are not making the problem worse?
AI Risk is mentioned first at 19:40.
Bostrom’s “The Vulnerable World Hypothesis” paper is grossly misquoted.
No object-level arguments against AI Risk are presented, nor is there any reference to object-level arguments made by anyone.
I’m still upvoting the post, because I find it useful to know how AI Risk (and we ourselves) are perceived.
I made my most strident and impolite presentation yet in the AISafety.com Reading Group last night. We were discussing “Conversation with Ernie Davis”, and I attacked this part:
“And once an AI has common sense it will realize that there’s no point in turning the world into paperclips...”
I described this as fundamentally mistaken and like an argument you’d hear from a person who had not read “Superintelligence”. This is ad hominem, and it pains me. However, I feel like the emperor has no clothes, and calling it out explicitly is important.
Good luck with the in-person AI Safety reading group. It sounds productive and fun.
For the past 2 years, I have been running the Skype-based AISafety.com Reading Group. You can see the material we have covered at https://aisafety.com/reading-group/ . Yesterday, Vadim Kosoy from MIRI gave a great presentation of his Learning-Theoretic Agenda: https://youtu.be/6MkmeADXcZg
Usually, I try to post a summary of the discussion to our Facebook group, but I’ve been unable to get a follow-on discussion going. Your summary/idea above is higher quality than what I post.
Please tell me if you have any ideas for collaboration between our reading groups, or if I can do anything else to help you :).
I prefer “AI Safety” over “AI Alignment” because I associate the first more with Corrigibility, and the second more with Value-alignment.
It is the term “Safe AI” that implies 0% risk, while “AI Safety” seems more similar to “Aircraft Safety” in acknowledging a non-zero risk.
OpenAI’s Alignment Plan is not S.M.A.R.T.
On this subject, here is my 2-hour presentation (in 3 parts), going over just about every paragraph in Paul Christiano’s “Where I agree and disagree with Eliezer”:
https://youtu.be/V8R0s8tesM0?si=qrSJP3V_WnoBptkL
According to this image, the performance is generally above the human average:
In the Paul-verse, we should expect that economic interests would quickly cause such models to be used for everything that they can be profitably used for. With better-than-average-human performance, that may well be a doubling of global GDP.
In the Eliezer-verse, the impact of such models on the GDP of the world will remain around $0, due to practical and regulatory constraints, right up until the upper line (“Human (Best)”) is surpassed for 1 particular task.
I strongly support your efforts to improve the EA Forum, and I can see your point that using upvotes as a proxy for appropriateness fails when there is a deliberate effort to push the forum in a better direction.
The inclusion criteria state:
Tasks that are completely beyond the capabilities of current language models are also encouraged
It’s easy to come up with a benchmark that requires a high but unspecified level of intelligence. An extreme example would be to ask for a proof that P!=NP — we have no idea about the difficulty of the task, though we suspect that it requires superintelligence. To be valuable, the challenge of a benchmark needs to be possible to relate to meaningful capabilities, such as “The Human Level”.
Most people couldn’t answer questions about cryobiology in Spanish, even though they possess general intelligence. This benchmark seems to consist of random tasks around and above the human level, and I fear progress on this benchmark might be poorly correlated with progress towards AGI.
In the interview with AI Impacts, you said:
...examples of things that I’m optimistic about that they [people at MIRI] are super pessimistic about are like, stuff that looks more like verification...
Are you still optimistic? What do you consider the most promising recent work?
Today, I bought 20 shares in Gamestop / GME. I expect to lose money, and bought them as a hard-to-fake signal about willingness to coordinate and cooperate in the game-theoretic sense. This was inspired by Eliezer Yudkowsky’s post here: https://yudkowsky.medium.com/
In theory, Moloch should take all the resources of someone following this strategy. In practice, Eru looks after her own, so I have the money to spare.
[Question] Searching for post on Community Takeover
This post clearly spoofs “Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI”, though it changes “default” to “inevitable”.
I think that coups d’état and rebellions are nearly common enough that they could be called the default, though they are certainly not inevitable.
I enjoyed this post. Upvoted.
Why do you think Anthropic is not replying to MIRI’s challenge?