I took the survey, and enjoyed it. There was a suggestion to also fill out the Rationalist Organizer Census, 2023. I can’t remember whether I have already filled it out or whether I’m mixing it up with the 2022 Census. Is it new?
Søren Elverlin
Intelligence Amplification
GPT-4 will be unable to contribute to the core cognitive tasks involved in AI programming.
If you ask GPT-4 to generate ideas for how to improve itself, it will always (10/10) suggest things that an AI researcher considers very unlikely to work.
If you ask GPT-4 to evaluate ideas for improvement that are generated by an AI researcher, the feedback will be of no practical use.
Likewise, every suggestion for how to get more data or compute, or be more efficient with data or compute, will be judged by an AI researcher as hopeless.
If you ask GPT-4 to suggest performance improvements to the core part of its own code, every single one of these suggestions will be very weak at best.
If there is an accompanying paper for GPT-4, no possible prompt will make GPT-4 suggest meaningful improvements to this paper.
I assign 95% probability to each of these statements. I do not expect to see the start of a textbook takeoff in August.
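As a rough back-of-the-envelope illustration (my own, and it treats the five specific predictions as independent, which they need not be), the probability that all of them hold would then be

$$0.95^5 \approx 0.77$$

so a failure of at least one of them would not be very surprising even at 95% each.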
The research ethos seems like it could easily be used to justify research that appears to be safety-oriented, but actually advances capabilities.
Have you considered how your interpretability tool can be used to increase capability?
What processes are in place to ensure that you are not making the problem worse?
AI Risk is mentioned first at 19:40.
Bostrom’s “The Vulnerable World Hypothesis” paper is grossly misquoted.
No object-level arguments against AI Risk are presented, nor are there any references to object-level arguments made by anyone.
I’m still upvoting the post, because I find it useful to know how AI Risk (and we) are perceived.
I made my most strident and impolite presentation yet in the AISafety.com Reading Group last night. We were discussing “Conversation with Ernie Davis”, and I attacked this part:
“And once an AI has common sense it will realize that there’s no point in turning the world into paperclips...”
I described this as fundamentally mistaken, and like an argument you’d hear from a person who had not read “Superintelligence”. This is ad hominem, and it pains me. However, I feel like the emperor has no clothes, and calling it out explicitly is important.
Good luck with the in-person AI Safety reading group. It sounds productive and fun.
For the past 2 years, I have been running the Skype-based AISafety.com Reading Group. You can see the material we have covered at https://aisafety.com/reading-group/ . Yesterday, Vadim Kosoy from MIRI gave a great presentation of his Learning-Theoretic Agenda: https://youtu.be/6MkmeADXcZg
Usually, I try to post a summary of the discussion to our Facebook group, but I’ve been unable to get a follow-on discussion going. Your summary/idea above is higher quality than what I post.
Please tell me if you have any ideas for collaboration between our reading groups, or if I can do anything else to help you :).
I prefer “AI Safety” over “AI Alignment” because I associate the first more with Corrigibility, and the second more with Value-alignment.
It is the term “Safe AI” that implies 0% risk, while “AI Safety” seems more similar to “Aircraft Safety” in acknowledging a non-zero risk.
On this subject, here is my two-hour presentation (in 3 parts), going over just about every paragraph in Paul Christiano’s “Where I agree and disagree with Eliezer”:
https://youtu.be/V8R0s8tesM0?si=qrSJP3V_WnoBptkL
According to this image, the performance is generally above the human average:
In the Paul-verse, we should expect that economic interests would quickly cause such models to be used for everything that they can be profitably used for. With better-than-average-human performance, that may well be a doubling of global GDP.
In the Eliezer-verse, the impact of such models on the GDP of the world will remain around $0, due to practical and regulatory constraints, right up until the upper line (“Human (Best)”) is surpassed for 1 particular task.
I strongly support your efforts to improve the EA Forum, and I can see your point that using upvotes as a proxy for appropriateness fails when there is a deliberate effort to push the forum in a better direction.
The inclusion criteria state:
Tasks that are completely beyond the capabilities of current language models are also encouraged
It’s easy to come up with a benchmark that requires a high but unspecified level of intelligence. An extreme example would be to ask for a proof that P!=NP: we have no idea how difficult the task is, though we suspect it requires superintelligence. To be valuable, the difficulty of a benchmark needs to be relatable to meaningful capabilities, such as “The Human Level”.
Most people couldn’t answer questions about cryobiology in Spanish, even though they possess general intelligence. This benchmark seems to consist of random tasks around and above the human level, and I fear progress on this benchmark might be poorly correlated with progress towards AGI.
In the interview with AI Impacts, you said:
...examples of things that I’m optimistic about that they [people at MIRI] are super pessimistic about are like, stuff that looks more like verification...
Are you still optimistic? What do you consider the most promising recent work?
Today, I bought 20 shares in Gamestop / GME. I expect to lose money, and bought them as a hard-to-fake signal about willingness to coordinate and cooperate in the game-theoretic sense. This was inspired by Eliezer Yudkowsky’s post here: https://yudkowsky.medium.com/
In theory, Moloch should take all the resources of someone following this strategy. In practice, Eru looks after her own, so I have the money to spare.
This post clearly spoofs “Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI”, though it changes “default” to “inevitable”.
I think that coups d’état and rebellions are nearly common enough that they could be called the default, though they are certainly not inevitable.
I enjoyed this post. Upvoted.
Recording of the session:
It was crossposted after I commented, and did find a better reception on EA Forum.
I did not mean my comment to imply that the community here does not need to be less wrong. However, I do think that there is a difference between what is appropriate to post here and what is appropriate to post on the EA Forum.
I reject a norm that I ought to be epistemically brave and criticise the piece in any detail. It is totally appropriate to just downvote bad posts and move on. Writing a helpful meta-comment to the poster is a non-obligatory prosocial action.
I would be excited to see Rational Animations try to cover the Hard Problem of Corrigibility: https://arbital.com/p/hard_corrigibility/
I believe that this would be the optimal video to create for the optimization target “reduce probability of AI-Doom”. It seems (barely) plausible that someone really smart could watch the video, make a connection to some obscure subject none of us know about, and then produce a really impactful contribution to solving AI Alignment.
However if we think that utility maximization is difficult to wield without great destruction, then that suggests a disincentive to creating systems with behavior closer to utility-maximization. Not just from the world being destroyed, but from the same dynamic causing more minor divergences from expectations, if the user can’t specify their own utility function well.
A strategically aware utility maximizer would try to figure out what your expectations are, satisfy them while preparing a take-over, and strike decisively without warning. We should not expect to see an intermediate level of “great destruction”.
How much time do you expect between “1 AI clearly on track to Foom” and “First AI to actually Foom”? My weak guess is that Eliezer would say “Probably quite little time”, but your model of the world requires GWP to double over a 4-year period, and I’m guessing that period probably starts later than 2026.
I would be surprised if by 2027, I could point to an AI that for a full year had been on track to Foom, without Foom happening.
Why do you think Anthropic is not replying to MIRI’s challenge?