I took the survey, and enjoyed it. There was a suggestion to also fill out the Rationalist Organizer Census, 2023. I can’t remember if I have already filled it out, or if I’m confusing it with the 2022 Census. Is it new?
Søren Elverlin
Retrospective: Lessons from the Failed Alignment Startup AISafety.com
A long reply to Ben Garfinkel on Scrutinizing Classic AI Risk Arguments
AI Safety Reading Group
Intelligence Amplification
GPT-4 will be unable to contribute to the core cognitive tasks involved in AI programming.
If you ask GPT-4 to generate ideas for how to improve itself, it will always (10/10) suggest things that an AI researcher considers very unlikely to work.
If you ask GPT-4 to evaluate ideas for improvement that are generated by an AI researcher, the feedback will be of no practical use.
Likewise, every suggestion for how to get more data or compute, or be more efficient with data or compute, will be judged by an AI researcher as hopeless.
If you ask GPT-4 to suggest performance improvements to the core part of its own code, every single one of these will be very weak at best.
If there is an accompanying paper for GPT-4, there will be no prompts possible that would make GPT-4 suggest meaningful improvements to this paper.
I assign 95% probability to each of these statements. I expect we will not be seeing the start of a textbook takeoff in August.
The research ethos seems like it could easily be used to justify research that appears to be safety-oriented, but actually advances capabilities.
Have you considered how your interpretability tool can be used to increase capability?
What processes are in place to ensure that you are not making the problem worse?
AI Risk is mentioned first at 19:40.
Bostrom’s “The Vulnerable World Hypothesis” paper is grossly misquoted.
No object-level arguments against AI Risk are presented, nor is there any reference to object-level arguments made by anyone.
I’m still upvoting the post, because I find it useful to know how AI Risk (and we ourselves) are perceived.
I made my most strident and impolite presentation yet in the AISafety.com Reading Group last night. We were discussing “Conversation with Ernie Davis”, and I attacked this part:
“And once an AI has common sense it will realize that there’s no point in turning the world into paperclips...”
I described this as fundamentally mistaken and like an argument you’d hear from a person who had not read “Superintelligence”. This is ad hominem, and it pains me. However, I feel like the emperor has no clothes, and calling it out explicitly is important.
Good luck with the in-person AI Safety reading group. It sounds productive and fun.
For the past 2 years, I have been running the Skype-based AISafety.com Reading Group. You can see the material we have covered at https://aisafety.com/reading-group/ . Yesterday, Vadim Kosoy from MIRI gave a great presentation of his Learning-Theoretic Agenda: https://youtu.be/6MkmeADXcZg
Usually, I try to post a summary of the discussion to our Facebook group, but I’ve been unable to get a follow-on discussion going. Your summary/idea above is higher quality than what I post.
Please tell me if you have any ideas for collaboration between our reading groups, or if I can do anything else to help you :).
I prefer “AI Safety” over “AI Alignment” because I associate the first more with Corrigibility, and the second more with Value-alignment.
It is the term “Safe AI” that implies 0% risk, while “AI Safety” seems more similar to “Aircraft Safety” in acknowledging a non-zero risk.
OpenAI’s Alignment Plan is not S.M.A.R.T.
On this subject, here is my 2-hour presentation (in 3 parts), going over just about every paragraph in Paul Christiano’s “Where I agree and disagree with Eliezer”:
https://youtu.be/V8R0s8tesM0?si=qrSJP3V_WnoBptkL
According to this image, the performance is generally above the human average:
In the Paul-verse, we should expect that economic interests would quickly cause such models to be used for everything that they can be profitably used for. With better-than-average-human performance, that may well be a doubling of global GDP.
In the Eliezer-verse, the impact of such models on the GDP of the world will remain around $0, due to practical and regulatory constraints, right up until the upper line (“Human (Best)”) is surpassed for 1 particular task.
I strongly support your efforts to improve the EA Forum, and I can see your point that using upvotes as a proxy for appropriateness fails when there is a deliberate effort to push the forum in a better direction.
The inclusion criteria state:
Tasks that are completely beyond the capabilities of current language models are also encouraged
It’s easy to come up with a benchmark that requires a high but unspecified level of intelligence. An extreme example would be to ask for a proof that P!=NP — we have no idea about the difficulty of the task, though we suspect that it requires superintelligence. To be valuable, the challenge of a benchmark needs to be possible to relate to meaningful capabilities, such as “The Human Level”.
Most people couldn’t answer questions about cryobiology in Spanish, even though they possess general intelligence. This benchmark seems to consist of random tasks around and above the human level, and I fear progress on this benchmark might be poorly correlated with progress towards AGI.
In the interview with AI Impacts, you said:
...examples of things that I’m optimistic about that they [people at MIRI] are super pessimistic about are like, stuff that looks more like verification...
Are you still optimistic? What do you consider the most promising recent work?
Today, I bought 20 shares in Gamestop / GME. I expect to lose money, and bought them as a hard-to-fake signal about willingness to coordinate and cooperate in the game-theoretic sense. This was inspired by Eliezer Yudkowsky’s post here: https://yudkowsky.medium.com/
In theory, Moloch should take all the resources of someone following this strategy. In practice, Eru looks after her own, so I have the money to spare.
[Question] Searching for post on Community Takeover
This post clearly spoofs “Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI”, though it changes “default” to “inevitable”.
I think that coups d’état and rebellions are nearly common enough that they could be called the default, though they are certainly not inevitable.
I enjoyed this post. Upvoted.
Why do you think Anthropic is not replying to MIRI’s challenge?