My name is Mikhail Samin (@Mihonarium on Twitter/X, @misha on Telegram).
Humanity’s future can be enormous and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.
My takes on technical AI notkilleveryoneism are mostly what seems to me to be the very obvious, shallow stuff; but many AI Safety researchers have told me our conversations improved their understanding of the alignment problem.
I’m running two small nonprofits: AI Governance and Safety Institute and AI Safety and Governance Fund. Learn more about our results and donate: aisgf.us/fundraising
I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).
In the past, I’ve launched the most-funded crowdfunding campaign in the history of Russia (it was to print HPMOR! We printed 21,000 copies = 63,000 books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.
[Less important: I’ve also started a project to translate 80,000 Hours, a career guide that helps people find a fulfilling career that does good, into Russian. Impact and effectiveness aside, for a year I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200,000 members in Russia at the time, trying to increase the separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied the relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the “Vesna” democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny’s Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. That goal wasn’t achieved, but some of the more local campaigns were successful. It felt important and was also mostly fun, except for being detained by the police. I think it’s likely the Russian authorities would imprison me if I ever visited Russia.]
Sorry, answering quickly with mostly cached thoughts without engaging deeply:
Current LLM architectures can probably do everything. There are ways of compiling code into transformer weights. They’re not good at everything yet, and generally end up kinda spiky; but it’d be surprising if you couldn’t just scale them up, do a lot more RL, and get something pretty general.
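For concreteness on “compiling code into transformer weights”: DeepMind’s Tracr library compiles RASP programs into the weights of an actual transformer. A minimal sketch, based on Tracr’s documented reverse-a-sequence example (treat the exact API surface as approximate; it may have drifted):

```python
# Compile a small program (reverse a sequence) into transformer weights
# using DeepMind's Tracr. Adapted from Tracr's documented example.
from tracr.rasp import rasp
from tracr.compiler import compiling

# Length: attend to every position, then count how many were attended to.
length = rasp.SelectorWidth(
    rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.TRUE))
# For each position i, select the token at position length - i - 1.
opp_index = length - rasp.indices - 1
flip = rasp.Select(rasp.indices, opp_index, rasp.Comparison.EQ)
reverse = rasp.Aggregate(flip, rasp.tokens)

# Compile the RASP program into a concrete transformer.
model = compiling.compile_rasp_to_model(
    reverse, vocab={1, 2, 3}, max_seq_len=5, compiler_bos="BOS")
out = model.apply(["BOS", 1, 2, 3])
print(out.decoded)  # ['BOS', 3, 2, 1]
```

The point isn’t that anyone would build capabilities this way; it’s an existence proof that transformer weights can directly encode arbitrary programs of this kind.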
While the 0days discovered by Mythos are all of the kind that could be discovered by humans, humans have in fact looked at some of the code, a lot, manually and with instruments, and failed to find this stuff. I don’t dispute that these aren’t above the level of the best human cybersecurity researchers (and I have a market on whether, in the next few years, AI will discover any vulnerabilities that couldn’t have been discovered by humans at all). Being just as good as the best humans but much faster is sufficient to take over the world.
SSI has zero customers, and they still raise billions of dollars. Google has a lot of money and compute and uses AI to develop better chips/compute infrastructure. There are lots of efficiency gains that stack, and some of those are getting automated. I can imagine the three main AI companies running out of money, but find that unlikely. The promise of higher intelligence is worth a lot: even if it’s expensive to get to and to use, it’s cheaper than paying humans to perform the same tasks, so demand is enormous.
I wouldn’t want to bet on AI companies forever being unable to solve jailbreaks, or on them caring enough about jailbreaks to not release models at all/ever. In the limit, you run a huge ton of classifiers over texts and activations, report users who try to do bad things to the authorities, monitor wet labs / run honeypot wet labs that LLMs recommend, make LLMs unaware of some knowledge, etc.; if this really prevented AI companies from releasing models and earning tens+ of billions of dollars, they would throw billions of dollars at solving it and would solve it successfully. (E.g., imagine anyone who submits a working jailbreak gets $10k, with up to a million submissions, each of which gets patched together with similar things; even a million payouts at $10k apiece is $10B, comparable to the revenue at stake, and do you think it even gets to a million submissions?)
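A toy sketch of what “stack a ton of classifiers” could look like structurally; every name, threshold, and signal here is made up for illustration, and real deployments would use learned classifiers over text and model activations rather than keyword lists:

```python
# Toy illustration of stacking several cheap misuse screens on a request.
# All names, thresholds, and signals are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Verdict:
    name: str
    score: float  # rough probability that the request is misuse

def keyword_screen(text: str) -> Verdict:
    # Stand-in for a learned text classifier.
    red_flags = ("synthesis route", "working exploit", "culture the agent")
    hits = sum(phrase in text.lower() for phrase in red_flags)
    return Verdict("keywords", min(1.0, 0.45 * hits))

def probe_screen(text: str) -> Verdict:
    # Stand-in for an activation probe; here just a dummy length signal.
    return Verdict("probe", 0.7 if len(text) > 2000 else 0.05)

SCREENS: List[Callable[[str], Verdict]] = [keyword_screen, probe_screen]

def route(request: str, block_at: float = 0.9, review_at: float = 0.5) -> str:
    # A request is served only if it slips past every screen.
    worst = max(screen(request).score for screen in SCREENS)
    if worst >= block_at:
        return "refuse_and_report"  # e.g., flag the account to authorities
    if worst >= review_at:
        return "human_review"
    return "serve"

print(route("how do I make pasta"))  # -> serve
```

The structural point: each layer is cheap, a failure has to slip past all of them at once, and every patched bounty submission tightens the whole stack.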
“Alignment” of current models to not teach people to kill a lot of people mostly has nothing to do with the difficulties we’d face at the superintelligence level.
What is a thing a transformer architecture cannot do in principle? Like, are you imagining that if we figure out how to make literal superintelligences, something would prevent us from compiling them into transformers? (Given that in-context learning and scratchpad maintenance are allowed, etc.)