You can’t solve AI friendliness in a vacuum. To build a friendly AI, you have to work on the AI and on the code of ethics it should use simultaneously, because they are interdependent. Until you know how the AI models reality most effectively, you can’t know whether your code of ethics is built from conceptual atoms that make sense to the AI. You can try to always prioritize the ethics side and not make the AI any smarter until you have to, but you can’t first make sure that you have an infallible code of ethics and only start building the AI afterwards.
The last time I saw someone suggest building an AI without first solving friendliness completely, he was heavily downvoted. I found that excessive, which is why I posted this. I am positively surprised that my statement got basically no reaction. My memory must have exaggerated things over time, or maybe it was just a fluke.
edit: I now seriously doubt my previous statement. I just got downvoted in a thread where I was explicitly instructed to post contrarian opinions and where the only things that were supposed to get downvotes were spam and trolling, neither of which I posted. Of course, it’s also possible that someone didn’t read the OP and used normal voting rules.
I’m writing a metafictional novel: some of the characters are aware that they are fictional, or rather that they live within a simulation where the laws of physics seem to follow a narrative. Unlike other metafiction stories, however, this isn’t a comedy, and the ontological and practical implications are treated seriously. Also, the main character basically follows timeless decision theory, but since it operates on very different timescales than humans do, this has quite strange implications.
I find working on the background (the setting, characters, and plot) quite easy and captivating, but I hit writer’s block whenever I try to transform my notes into complete chapters. It has reached the point where I have far more notes than actual story.
I use a program I wrote over the last couple of months to improve my productivity and enforce habits in myself via conditioning. Whenever I hear of an interesting productivity trick or a useful habit, I add it to the program. So far, I think it’s working, but there is so much overhead because of the sheer quantity of near-useless tricks that it will take some pruning before it actually becomes a strong net win.
There is a levelling system. Every minute of work gives one experience point, with a bonus if it was done with the pomodoro technique. The program also contains a todo list, which I use for everything. In this list, there is a section on habits, filled with repeating tasks. Each evening, I tick off all the habits I kept that day. For each habit I don’t tick off, I get a small experience drain the next morning. This encourages me to keep every habit, so that I can keep the daily experience drain to a minimum. Avoiding this negative reinforcement works very well as a motivator, and seeing the number for tomorrow’s experience drain go down whenever I tick off a task serves as positive reinforcement as well.
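To give a rough idea of how the pieces fit together, here is a simplified sketch in Python. This is not my actual program; the names, the point rate, and the pomodoro bonus here are made up for illustration:

```python
from dataclasses import dataclass, field

POINTS_PER_MINUTE = 1    # one experience point per minute of work
POMODORO_BONUS = 0.25    # illustrative bonus rate for pomodoro sessions

@dataclass
class Habit:
    name: str
    drain: int               # XP lost the next morning if left unticked
    done_today: bool = False

@dataclass
class Tracker:
    xp: int = 0
    habits: list[Habit] = field(default_factory=list)

    def log_work(self, minutes: int, pomodoro: bool = False) -> None:
        """Award experience for a block of work."""
        gained = minutes * POINTS_PER_MINUTE
        if pomodoro:
            gained = round(gained * (1 + POMODORO_BONUS))
        self.xp += gained

    def pending_drain(self) -> int:
        """The number I watch in the evening: what tomorrow morning will cost."""
        return sum(h.drain for h in self.habits if not h.done_today)

    def end_of_day(self) -> None:
        """Apply the drain for every missed habit, then reset for tomorrow."""
        self.xp -= self.pending_drain()
        for h in self.habits:
            h.done_today = False
```

Ticking off a habit just sets its done_today flag, which immediately lowers the pending_drain number; that visible drop is the positive-reinforcement half of the loop.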
I know, but writing is hard :-( Also, I have made it way too hard for myself. It’s easy to write notes about the personality of a completely non-human character, as long as you can intellectually understand its reasoning. But once I am forced to actually write its dialog, my head just hits a brick wall. The being is very intelligent, and I want this to be rationalist fiction, so I have to think for a very long time just to work out exactly how it would phrase its requests to maximize the probability of compliance. Writing the voices of the narrators (the administrator AIs of the simulation) as they slowly go insane is not easy, either.
Maybe I’m being too much of a perfectionist here. Do you think it’s better to write something trashy first and rewrite it later, or is it more efficient to do it right the first time?
Psychology
I took a few university courses, but ultimately I found it more efficient to just browse Wikipedia for its lists of heuristics and biases. Then of course there is the book ‘Thinking, Fast and Slow’, which is just great.
What other sources can you recommend?
Interesting post!
I have a feeling that there is a deep connection between this and the evaporative cooling effect (moderate members are more likely to leave when a group’s opinion gets too extreme, thereby increasing the ratio of extremists and making the group even more extreme). It seems like there ought to be a single social theory that explains both effects. I can’t quite put my finger on it, though. Any ideas?
Thanks, these look really useful. I will definitely have a look at them.
Yes, it’s pretty similar. I think their idea of making the punishment affect a separate health bar, rather than reducing the experience directly, may actually be better. I should try that out some time. Unlike HabitRPG (I think?), my program is also a todo list, though. I use it for organizing my tasks, and any task that I don’t finish in time costs experience, just like failing a habit. This helps to prevent procrastination.
I’ve heard of NaNoWriMo before. Unfortunately, that would be too much for me to handle. I am not a professional writer; I am just doing this in my free time, and I don’t have that kind of time, although I think it would definitely be worth checking out if it fell during a holiday.
I’m not sure I understand what you mean. Implement what functionality where? I don’t think I’m going to start working for that company just because this feature is interesting :-) As for my own program, I changed it to use a health bar today, but that is of no use to anyone else, since the program is not designed to be easily usable by other people. It always strikes me as terrible that large companies have so many interdependencies that they take months to implement (and verify and test) what took an hour in my primitive program.
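The change itself was tiny. Building on the earlier sketch (again simplified, with made-up numbers), it amounts to redirecting the penalty from experience to a separate health pool:

```python
from dataclasses import dataclass

# Tracker and Habit as defined in the earlier sketch.

@dataclass
class HealthTracker(Tracker):
    health: int = 50        # illustrative starting value
    max_health: int = 50

    def end_of_day(self) -> None:
        # Missed habits and overdue tasks now damage health instead of
        # draining experience, so earned progress is never taken away.
        self.health = max(0, self.health - self.pending_drain())
        for h in self.habits:
            h.done_today = False
```

The appeal of this variant is that the experience total only ever goes up, which keeps the long-term progress number untainted by short-term failures.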
I know, and that is part of what makes this so hard. Thankfully, I have several ways to cheat:
-I can take days thinking of the perfect course of action for what takes seconds in the story.
-The character is a humanoid avatar of a very smart and powerful entity. While it was created with much specialized knowledge, it is still human-like at its core.
But most importantly:
-It’s a story about stories and there is an actual narrator-like entity changing the laws of nature. Sometimes, ‘because this would make for a better story’ is a perfectly valid criterion for choosing actions. The super-human characters are all aware of this and exploit it heavily.
While I think this is a good idea in principle, most of these slogans don’t seem very effective because they suffer from the illusion of transparency. Consider what they must look like to someone viewing this from the outside:
“AI must be friendly” just sounds weird to someone who isn’t used to the lingo of calling AI ‘friendly’. I can’t think of an alternative slogan for this, but there must be a better way to phrase that.
“Ebola must die!” sounds great. It references a concrete risk that people understand and calls for its destruction. I could get behind that.
But I’m afraid that all the other points just sound like something a doomsday cult would say. I know that there is solid evidence behind this, but the people you are trying to convince don’t have that knowledge. If I were unaware of the issues and just saw a few of these banners without knowing the context, I would not be surprised to find “Repent! The end is nigh!” somewhere nearby.
I would recommend that you think of some more slogans like the Ebola one: Mention a concrete risk that is understandable to the public and does not sound far-fetched to the uninformed.
LessWrong’s attitude towards AI research
I would argue that these two goals are identical. Unless humanity dies out first, someone is eventually going to build an AGI. It is likely that this first AI, if it is friendly, will then prevent the emergence of other AGIs that are unfriendly.
Unless, of course, the plan is to delay the inevitable for as long as possible, but that seems very selfish, since faster computers will make it easier to build an unfriendly AI in the future, while the difficulty of solving AGI friendliness will not be substantially reduced.
No, it can’t be done by brute-force alone, but faster hardware means faster feedback and that means more efficient research.
Also, once we have computers that are fast enough to just simulate a human brain, it becomes comparatively easy to hack an AI together by just simulating a human brain and seeing what happens when you change stuff. Besides the ethical concerns, this would also be insanely dangerous.
Yes, I was referring to LessWrong, not AI researchers in general.
I make it a habit to check LessWrong once a day for just a few minutes, along with a number of other websites.
There are so few new posts that I hardly even have to scroll to find where I left off the day before. Considering that the usual quality of the posts here is far above that on other websites, having more posts would definitely be a good thing. Even if it turned out that all the most important things are already being written about, the less interesting posts here would still be interesting enough compared to other websites. This is especially true considering that one can just scan the forums and look only at posts above some threshold rating if one is in a hurry.