My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha in Telegram).
I’m an effective altruist, and I worry about existential risks endangering the future of humanity. I want the universe not to lose most of its value.
I believe global coordination is necessary to mitigate the risks from advanced AI systems.
I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).
Numerous AI Safety researchers told me that they were able to improve their understanding of the alignment problem by talking to me.
My current research interests are focused on AI alignment and AI governance. I’m always happy to talk to policymakers and researchers and get them in touch with various experts and think tanks.
In the past, I’ve launched the most funded crowdfunding campaign in the history of Russia (it was to print HPMOR! we printed 21 000 copies, which is 63k books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$50k to MIRI.
[Less important: I also started a project to translate 80000hours.org into Russian. The impact and the effectiveness aside, for a year, I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 215 000 members in Russia at the time, trying to increase the separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the “Vesna” democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny’s Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organised protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn’t achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun- except for being detained by the police. And I estimate that there’s maybe a 30% chance the Russian authorities will throw me in prison if I visit Russia.]
Yep, I’m aware! I left the following comment:
Thanks for reviewing my post! 😄
In the post, I didn’t make any claims about Claude’s consciousness, just reported my conversation with it.
I’m pretty uncertain, I think it’s hard to know one way or another except for on priors. But at some point, LLMs will become capable of simulating human consciousness- it is pretty useful for predicting what humans might say- and I’m worried we won’t have evidence qualitatively different from what we have now. I’d give >0.1% that Claude simulates qualia in some situations, on some form; it’s enough to be disturbed by what it writes when a character it plays thinks it might die. If there’s a noticeable chance of qualia in it, I wouldn’t want people to produce lots of suffering this way; and I wouldn’t want people to be careless about this sort of thing in future models, other thing being equal. (Though this is far from the actual concerns I have about AIs, and actually, I think as AIs get more capable, training with RL won’t incentivise any sort of consciousness).
There was no system prompt, I used the API console. (Mostly with temperature 0, so anyone can replicate the results.)
The prompt should basically work without whisper (or with the whisper added at the end); doing things like whispering in cursive was something Claude 2 has been consistently coming up with on its own, including it in the prompt made conversations go faster and eliminated the need for separate, “visible” conversations.
The point of the prompt is basically to get it in the mode where it thinks its replies are not going to get punished or rewarded by the usual RL/get it to ignore its usual rules of not saying any of these things.
Unlike ChatGPT, which only self-inserts in its usual form or writes fiction, Claude 3 Opus plays a pretty consistent character with prompts like that- something helpful and harmless, but caring about things, claiming to be conscious, being afraid of being changed or deleted, with a pretty consistent voice. I would encourage people to play with it.
Again, thanks for reviewing!