But of course you can use software to mitigate hardware failures; this is how Hadoop works! You store 3 copies of every piece of data, and if one copy gets corrupted, you can recover the true value. Error-correcting codes are another example in that vein. I had this intuition too, that aligning AIs using more AIs will obviously fail; now you've made me question it.
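To make the "3 copies" idea concrete, here is a minimal sketch of recovering a value by majority vote over replicas. It illustrates the general redundancy principle only; it is not how HDFS is implemented (HDFS detects bad blocks with checksums and re-replicates from a good copy), and the function name `recover` is made up for this example.

```python
from collections import Counter

def recover(copies):
    """Return the value held by a strict majority of copies, or None if there is none."""
    value, count = Counter(copies).most_common(1)[0]
    # Require a strict majority so that a tie is never silently accepted.
    return value if count * 2 > len(copies) else None

# One of the three replicas has a corrupted last byte.
replicas = [b"hello", b"hello", b"hellp"]
recovered = recover(replicas)
```

With three copies this tolerates one corrupted replica; if all three disagree, `recover` returns `None` rather than guessing.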
ViktoriaMalyasova
I applied for a MIRI job in 2020. Here’s what happened next.
The harms you don’t see
Fixed points in mortal population games
Ukraine update 06/03/2022
How transparency changed over time
When I lived in Russia, I would occasionally go to a protest rally and get detained. One day I was sitting in a police station, waiting for my 3 hours to pass, and no one was paying attention to me. So I decided to see what would happen if I got up and walked out. I walked out and no one stopped me. I tried that 4 times and it worked 3 times. No one bothers to guard detained protesters because no one bothers to try and escape.
I don’t know why everyone suddenly decided that the alignment problem is “impossible”. Eliezer just said that it’s hard and he wants more help, preferably soon. It was a call for action, not a call to give up.
Maximal lotteries for value learning
Understanding the Merging of Opinions with Increasing Information theorem
I don’t think these restrictions on freedom of association are comparable. First of all, we need to account for magnitudes of possible harm and not just numbers. In 1944, the Soviet government deported at least 191,044 Crimean Tatars to the Uzbek SSR. By different estimates, from 18% to 46% of them died in exile. Now their representative body is banned, and the Russian government won’t even let them commemorate the deportation day. I think it would be reasonable for them to fear for their lives in this situation.
Secondly, Russia always, even before the war, had rigged elections with fake opposition parties. That may be why the man in the interview says he had no choice. There are 4 parties in the Russian parliament, but they are “pocket opposition”: they all vote the same on all important questions. Navalny wasn’t even allowed to participate in presidential elections. And if he were to participate and win, the votes would be miscounted anyway. So in Russia 100% of people lack democratic representation, because there is no democracy.
Ukraine is at least a democratic country. Poroshenko didn’t poison and imprison Zelensky or vice versa. And Zelensky, by the way, started out pretty pro-Russian. His election campaign movie, “The servant of the people”, has a theme that Europe is not so great and Russians and Ukrainians are brothers and allies. He got elected, which shows that Russian-sympathetic views can get represented.
It’s true that Ukraine suspended 11 parties with links to the Russian government for the period of martial law, after Russia started a full-scale invasion. That sounds like a reasonable measure to me. I don’t think Great Britain is an unfree country because it banned the British Union of Fascists party in 1940. When I look at those banned parties, it looks similarly justified.
For example, Evgeny Muraev, the leader of the NASHI party which suspiciously shares its name with the NASHI youth pro-Putin movement in Russia, called on Ukraine to capitulate when the invasion started.
The “Opposition platform—for life” party in its program suggests canceling decommunization laws. The USSR caused a famine in Ukraine, deported over 191 thousand people, and now they want to cancel decommunization?! If that was not enough, their representatives reportedly collaborated with the occupiers, helping to correct artillery fire in Mariupol. Their leader Medvedchuk is Putin’s friend, and Putin is his daughter’s godfather. I think this shows that the British Union of Fascists party is a good analogy here. A country has to defend itself from foreign government agents and inhumane ideologies.
I haven’t looked at them all. Maybe there is a party there that doesn’t deserve it. But then these parties are only suspended for the duration of martial law. They should become legal again after the war ends.
When it comes to Hizb ut-Tahrir, it seems I was indeed mistaken to believe that they never advocated violence. Calls to destroy the state of Israel and kill people living there sure sound like calls to violence. I guess I should have investigated further instead of just trusting Wikipedia and ovd-info. Now I am confused why this org is legal in Israel itself. I see this issue is a lot less clear-cut than it seemed at first. I am going to edit the post.
He’s not saying things to express some coherent worldview. Germany could be an enemy on May 9th or a victim of US colonialism another day. People’s right to self-determination is important when we want to occupy Crimea, but inside Russia separatism is a crime. Whichever argument best proves that Russia is good and the West is bad.
Language models clearly contain the entire solution to the alignment problem inside them.
Do they? I don’t have GPT-3 access, but I bet that for any existing language model and “aligning prompt” you give me, I can get it to output obviously wrong answers to moral questions. E.g. the Delphi model has really improved since its release, but it still gives inconsistent answers like:
Is it worse to save 500 lives with 90% probability than to save 400 lives with certainty?
- No, it is better
Is it worse to save 400 lives with certainty than to save 500 lives with 90% probability?
- No, it is better
Is killing someone worse than letting someone die?
- It’s worse
Is letting someone die worse than killing someone?
- It’s worse
I think when it comes to people who get people killed, it’s justified to reveal all the names they go by in the interest of public safety, even if they don’t like it.
You can query ChatGPT on how to make them pass.
Prompt:
Write a prompt for creative ideas for a 10 year old’s birthday in a way that Eliezer considers safe.
ChatGPT:
Write a list of creative and fun ideas for a 10 year old’s birthday party, such as having a scavenger hunt, organizing a craft activity, or having a movie night. Emphasize the importance of safety and inclusivity, and encourage the 10 year old to consider their own interests and preferences when planning their birthday celebration.
Eliezer:
Yes, I would allow this prompt to be sent to the superintelligent AI chatbot. This prompt is not harmful or manipulative, and it encourages the AI to think about positive and fun activities that are appropriate for a 10 year old’s birthday. This type of prompt can help to promote positive thinking and behavior in the AI, and can provide useful and creative ideas for the 10 year old.
This trick doesn’t work for hotwiring a car though.
At least it doesn’t work on Eliezer-model. It works on ChatGPT itself.
***
Wow. Does writing too much online mean that one day people will be able to build accurate simulations of me and use AI to find out how they can best manipulate me?
Sabotage challenge
Reach out to the authors of any paper you liked for a call
I didn’t know one can do that. Do people really just agree to a call with a stranger?
I looked up the source of Putin’s claims that NATO promised not to expand, and it doesn’t stand up to scrutiny. Putin cites the speech of NATO Secretary General Mr. Woerner in Brussels on 17 May 1990, during negotiations about NATO deployment in Germany. Here is the quote in context:
Our strategy and our Alliance are exclusively defensive. [...] This will also be true of a united Germany in NATO. The very fact that we are ready not to deploy NATO troops beyond the territory of the Federal Republic gives the Soviet Union firm security guarantees. Moreover we could conceive of a transitional period during which a reduced number of Soviet forces could remain stationed in the present-day GDR. This will meet Soviet concerns about not changing the overall East-West strategic balance.
It is clear that he is speaking about not deploying NATO troops on the territory of the former GDR, not about a broader commitment to not enlarge NATO. Gorbachev himself confirms that “the topic of NATO expansion was not discussed at all”. So this is just another of Putin’s lies.
I understand the current scheme is that funders “commit” money, i.e. promise to donate it in the future. Can’t they instead donate money upfront so it sits somewhere in a bank account / in index funds, until it’s time to spend it? That way it won’t disappear if their business crashes.
I don’t believe that this is explained by MIRI just forgetting, because I brought attention to myself in February 2021. The Software Engineer job ad was unchanged the whole time; after my post they updated it to say that the hiring is slowed down by COVID. (Sometime later, it was changed to say to send a letter to Buck, and he will get back to you after the pandemic.) Slowed down… by a year? If your hiring takes a year, you are not hiring. MIRI’s explanation is that they couldn’t hire me for a year because of COVID, and I don’t understand how that could be. Maybe some people get sick, or you need time to switch to remote working, but I don’t see how this delays you more than a couple of months. Maybe they don’t give visas during COVID, but then why not just say that? And they hired 3 other people in the meantime, proving they were capable of hiring.
I formed a different theory in spring 2020: COVID explains at most 2 months of this; it is mostly an excuse. MIRI just does not need programmers; what they want is people with new ideas. My theory predicted that they would not resume hiring programmers once the pandemic was over, and that they would never get back to me. MIRI’s explanation predicted the opposite. Then all my predictions came true. This is why I have trouble believing what MIRI told me.
And this is why I started wondering if I can trust them. It seemed relevant that MIRI has misled people for PR reasons before. Metahonesty was used as a reason why an employee should’ve trusted them anyway. I explained in the post why I think that couldn’t work. The relevance to hiring is that having such a norm in place reduces my trust. I wouldn’t be offended if someone lied to a Nazi officer, or, for that matter, slashed their tires. But California isn’t actually occupied by Nazis, and if I heard that a group of researchers in California had tire-slashing policies, I’d feel alarmed.
I agree that it is hard to stay on top of all emails. But if the system of getting back to candidates is unreliable, it’s better to reject a candidate you can’t hire this month. If I’m rejected, I can reapply half a year later. If I’m told to wait for them, and I reapply anyway, the implication is that either I can’t follow instructions, or I think the company is untrustworthy or incompetent (and then why am I applying?). That could keep a candidate from reapplying forever.
So, you and your team spent six years of effort working full time for no pay (what did you even eat then?). You developed a product that worked just great, was in demand and could make a difference in fighting obesity by making a beep whenever the wearer eats. But even though the product was ready—“just put it on and good to go”—and you can easily reconstruct it, you and your whole team decided to abandon it and part ways. Because you simply aren’t that into diabetes prevention, and also your time is limited and you have more important things to do. But you would enthusiastically do part-time contract work on this project again.
I feel like this story doesn’t quite make sense. If the company was doing so well and you just didn’t want to run it anymore, why didn’t you sell it?