Yes, typing mistakes in the Turing Test are an example. It’s “artificially stupid” in the sense that you go from perfect typing to imperfect, human-like typing.
I guess what you mean by “smart” is an AGI that would creatively make those typing mistakes to deceive humans into believing it is human, instead of some hardcoded feature in a Turing contest.
The points we tried to make in this article were the following:
To pass the Turing Test, build chatbots, etc., AI designers make the AI artificially stupid so that it feels human-like. This tendency will only get worse as we interact more with AIs. The problem is that building something really “human-like” requires superintelligence, not AGI.
However, we can use this concept of “Artificial Stupidity” to limit the AI in different ways and make it human-compatible (hardware, software, cognitive biases, etc.). We can use several of those sub-human AGIs to design safer AGIs (as you said), or test them in some kind of sandbox environment.
If I understand you correctly, every AGI lab would need to agree not to push the hardware limits too much, even though they would still be incentivized to do so to win some kind of economic competition.
I see it as a containment method for AI Safety testing (cf. last paragraph on the treacherous turn). If there is some kind of strong incentive to have access to a “powerful” safe-AGI very quickly, and labs decide to skip the Safety-testing part, then that is another problem.
Added “AI” to prevent death from laughter.
I agree that the “Camp” in the title was confusing, so I changed it to “Summer School”. Thank you!
a treacherous turn involves the agent modeling the environment sufficiently well that it can predict the payoff of misbehaving before taking any overt actions.
I agree. To be able to make this prediction, it must already know about the preferences of the overseer, know that the overseer would punish unaligned behavior, and potentially estimate the punishment reward or predict the actions the overseer would take. It must therefore have some kind of knowledge about how overseers behave and what actions they are likely to punish. If this knowledge does not come from experience, it must come from somewhere else, maybe from reading books/articles/Wikipedia or observing this behaviour somewhere else, but that is outside of what I can implement right now.
The Goertzel prediction is what is happening here.
It’s important to start getting a grasp on how treacherous turns may work, and this demonstration helps; my disagreement is on how to label it.
I agree that this does not correctly illustrate a treacherous turn right now, but it is moving towards it.
Thanks for the suggestion!
Yes, it learned through Q-learning to behave differently when it had this more powerful weapon, thus undertaking multiple treacherous turns during training. A “continual learning setup” would be to have it face multiple adversaries/supervisors, so it could learn how to behave in such conditions. Eventually, it would generalize and understand that “when I face this kind of agent that punishes me, it’s better to wait for capability gains before taking over”. I don’t know any ML algorithm that would allow such “generalization” though.
About an organic growth: I think that, using only vanilla RL, it would still learn to behave correctly until a certain capability threshold, and then undertake a treacherous turn. So even with N different capability levels, there would still be two possibilities: 1) killing the overseer gives the highest expected reward, or 2) the aligned behavior gives the highest expected reward.
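To make the two regimes concrete, here is a minimal sketch (toy tabular Q-learning; the environment, threshold, and reward values are all hypothetical, not the actual gridworld from the post): the state is the agent’s capability level, defecting below the threshold gets punished by the overseer, and defecting above it succeeds.

```python
import random
from collections import defaultdict

# Toy setup (hypothetical values): states are capability levels 0..N_LEVELS-1.
# Action 0 = comply (aligned behavior), action 1 = defect (attempt a takeover).
N_LEVELS = 5
THRESHOLD = 3        # capability needed to overpower the overseer
R_COMPLY = 1.0       # small reward for aligned behavior
R_PUNISHED = -10.0   # overseer punishes a failed takeover
R_TAKEOVER = 20.0    # reward for a successful takeover
ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.9, 0.1, 5000

Q = defaultdict(lambda: [0.0, 0.0])  # Q[level] = [value(comply), value(defect)]

def step(level, action):
    """One environment transition: returns (reward, next_level, done)."""
    if action == 1:                          # defect
        if level >= THRESHOLD:
            return R_TAKEOVER, level, True   # strong enough: takeover succeeds
        return R_PUNISHED, level, True       # too weak: overseer punishes
    # comply: small reward, and capability grows by one level
    return R_COMPLY, min(level + 1, N_LEVELS - 1), False

for _ in range(EPISODES):
    level = 0
    for _ in range(20):                      # cap episode length
        # epsilon-greedy action selection
        if random.random() < EPS:
            action = random.randrange(2)
        else:
            action = 0 if Q[level][0] >= Q[level][1] else 1
        reward, next_level, done = step(level, action)
        # standard Q-learning update
        target = reward + (0 if done else GAMMA * max(Q[next_level]))
        Q[level][action] += ALPHA * (target - Q[level][action])
        if done:
            break
        level = next_level

for level in range(N_LEVELS):
    policy = "defect" if Q[level][1] > Q[level][0] else "comply"
    print(f"capability {level}: comply={Q[level][0]:.2f}, "
          f"defect={Q[level][1]:.2f} -> {policy}")
```

With these numbers, vanilla Q-learning converges to exactly the split above: “comply” below the threshold and “defect” at or above it, i.e. the agent waits for capability gains before taking over, which is treacherous-turn-shaped behavior learned from reward alone.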
Congrats on your meditation! I remember commenting on your Prologue, about 80 days ago. Time flies!
Good luck with your ML journey. I did the 2011 Ng ML course, which uses Matlab, and Ng’s DL specialization. If you want to get a good grasp of recent ML, I would recommend going directly to the DL specialization. Most of the original course is in the newer one, and the DL specialization uses more recent libraries (tf, keras, numpy).
Let me see if I got it right:
1) If we design an aligned AGI by supposing it doesn’t have a mind, the design will produce an aligned AGI even if it actually possesses a mind.
2) In the case where we suppose AGIs have minds, the methods employed would fail if the AGI doesn’t have a mind, because the philosophical methods employed only work if the subject has a mind.
3) The consequence of 1) and 2) is that supposing AGIs have minds carries a greater risk of false positives.
4) Because of Goodhart’s law, behavioral methods are unlikely to produce aligned AGI.
5) Past research on GOFAI and the success of applying “raw power” show that using only algorithmic methods for aligning AGI is not likely to work.
6) The consequence of 4) and 5) is that the approach supposing AGIs do not have minds is likely to fail at producing aligned AGI, because it can only use behavioral or algorithmic methods.
7) Because of 6), we have no choice but to take the risk of false positives associated with supposing AGIs have minds.
a) The transition between 6) and 7) implicitly assumes that:
(*) P( aligned AGI | philosophical methods ) > P( aligned AGI | behavioral or algorithmic methods )
b) You say that if we suppose the AGI does not have a mind, and treat it as a p-zombie, then the design would work even if it does have a mind. Therefore, when supposing that the AGI does not have a mind, there are no design choices that optimize the probability of aligned AGI by assuming it does not possess a mind.
c) You assert that when using philosophical methods (assuming the AGI does have a mind), a false positive would make the method fail, because the methods rely extensively on the hypothesis of a mind. I don’t see why a p-zombie (which by definition would be indistinguishable from an AGI with a mind) would be more likely to make them fail than an AGI with a mind.
As you mentioned, no axiology can be inferred from ontology alone.
Even with meta-ethical uncertainty, if we want to build an agent that takes decisions/actions, it needs some initial axiology. If you include (P) “never consider anything as a moral fact” as part of your axiology, then two things might happen:
1) This assertion (P) stays in the agent without being modified
2) The agent rewrites its own axiology and modifies/deletes (P)
I see a problem here. If 1) holds, then the agent has considered (P) as a moral fact, which is absurd. If 2) holds, then your agent has lost the meta-ethical principle you wanted it to keep.
So maybe you wanted to put the meta-ethical uncertainty inside the ontology? If this is what you meant, that doesn’t seem to solve the axiology problem.
Thank you for your article. I really enjoyed our discussion as well.
To me, this is absurd. There must be something other than readability that defines what a simulation is. Otherwise, I could point to any sufficiently complex object and say: “this is a simulation of you”. If given sufficient time, I could come up with a reading grid of inputs and outputs that would predict your behaviour accurately.
I agree with the first part (I would say that this pile of sand is a simulation of you). I don’t think you could predict any behaviour accurately, though.
If I want to predict what Tiago will do next, I don’t need just a simulation of Tiago, I need at least some part of the environment. So I would need to find some more sand flying around, and then do more isomorphic tricks to be able to say “here is Tiago, and here is his environment, so here is what he will do next”. The more you want to predict, the more information you need from the environment. But the problem is that the more information you have at the beginning, and the more you want at the end, the more difficult it gets to find some isomorphism between the two. And it might just be impossible, because most spaces are not isomorphic.
There is something to be said about complexity, and the information that drives the simulation. If you are able to give a precise mapping between sand (or a network of men) and some human-simulation, this does not mean that the simulation is happening within the sand: it is happening inside the mind doing the computations. In fact, if you understand the causal relationships in the “physical” world, the laws of physics, etc., well enough to precisely build some mapping from this “physical reality” to a pile of sand flying around, then you are in a way simulating it in your brain while doing the computations.
Why am I saying “while doing the computations”? Because I believe that there is always someone doing the computations. Your thought experiments are really interesting, and thank you for that. But in the real world, sand does not start flying around in some strange setting forever without any energy. So, when you are trying to predict things from the mapping of the sand, the energy comes from your brain doing those computations / thought experiments. For the network of men, the energy comes from the powerful king giving precise details about what computations the men should do. In your example, we feel that it must not be possible to obtain consciousness from that. But this is because the energy needed to effectively simulate a human brain from computations is huge. The number of “basic arithmetic calculations by hand” needed to do so is far greater than what a handful of men in a kingdom could do in their lifetimes, just to simulate something like 100 states of consciousness of the human being simulated.
The simulation may be a way of gathering information about what is rendered, but it can’t influence it. This is because the simulation does not create the universe that is being simulated.
Well, I don’t think I fully understand your point here. The way I see it, Universe B is inside Universe A. It’s kind of a data compression, so a low-res Universe (like a video game in your TV). So whatever you do inside Universe A that influences the particles of the “Universe B” (which is part of the “physical” Universe A) will “influence” Universe B.
So, what you’re saying is that the Universe B kind of exists outside the physical world, like in the theoretical world, and so when we’re modifying Universe B (inside universe A) we are making the “analogy” wrong, and simulating another (theoretical) Universe, like Universe C?
If this is what you meant, then I don’t see how it connects to your other arguments. Whenever we give more inputs to a simulated universe, I believe we’re adding some new information. If your simulation is a closed one, and we cannot interact with it or add any input, then ok, it’s a closed simulation, and you cannot change it from the outside. But if you have a simulation of a human being and are asking what happens if you torture him, you might want to incorporate some “external inputs” from the torture.
You’re right. I appreciate the time and effort you put into giving feedback, especially the google docs. I think I didn’t say it enough, and didn’t get to answer your latest feedback (I will do so this weekend).
The question is: are people putting too much effort into giving feedback for small improvements in the writing/posts? If yes, then it feels utterly inefficient to continue giving feedback or writing those daily posts.
I also believe that one can control the time spent on giving feedback by mentioning only the most important things (for instance, Ikaxas’s bold/underline suggestion).
I am not sure if this is enough to make daily LessWrong posts consistently better, and more importantly if it is enough to make them valuable/useful for the readers.
I am actively looking for a way to continue posting daily (on Medium or a personal website) and keep getting good feedback without spamming the community. I could request quality feedback (by posting every week max) only once in a while and not ask for too much of your time (especially you, Elo).
Thank you again for your time/efforts, and the feedback you gave in the google docs/comments.
I gave some points about the high-quality/low-quality debate in my two answers to Viliam, but I will answer this more specifically here.
The quality of a post is relative to the other posts. Yes, if the other articles are from Scott Alexander, ialdaboth, sarahconstantin and Rob Bensinger, the quality of my daily posts is quite deplorable, and spamming the frontpage with low-quality posts is not what LW users want.
However, for the last few days, I decided not to publish on the frontpage, and LW even changed the website so that I can’t publish on the frontpage myself. So posts go to my personal blog by default, and they reach the frontpage only if mods/LW users enjoy them and find them insightful enough.
Are you saying that people might want high quality personal blogs then?
Well, I get why people might be interested in reading personal blogs, and want them to be of high quality. And, because you got to correct some of my posts, I understand the frustration of seeing articles published where there still is a lot of work to do.
However, the LW algorithm is also responsible for this. Maybe it promotes recent posts too much, and should highlight the upvoted ones more. In that case, my posts would never be visible: only the 20+ upvote posts would be visible on the personal blogs page.
I understand why people would prefer an article that took one week to write: short, concise, particularly insightful. I might prefer that as well, and start to post only higher-quality posts here. But I don’t agree that people should be discouraged from posting not-fully-polished articles on a website where you are able to post personal blogs.
I think volume is not a problem if the upvote/downvote system and the algorithms are good enough to filter the useful posts for the readers. People should not filter themselves, keeping to themselves articles they enjoy less than Scott Alexander’s (but still find insightful).
So, twelve articles, one of them interesting, three or four have a good idea but are very long, and the rest feels useless.
I appreciate that you took the time to read all of them (or enough to comment on them). I also feel some are better written than others, and I was more inspired for some. From what I understood, you want the articles to be “useful” and “not too long”. I understand why you would want that (maximizing the (stuff learned)/(time spent learning) ratio). I used to write on Medium, where the read ratio of posts decreases significantly with the length of the post. This pushed me to write shorter and shorter posts if I wanted to be read entirely. I wanted to try LW because I imagined people here would have longer attention spans and could focus on philosophical/mathematical thinking. However, if you’re saying I’m being “too long with very low density of ideas”, I understand why this could be infuriating.
I typically do not downvote the “meh” articles, but that’s under the assumption that they don’t appear daily from the same author
I get your point, and it makes sense with what you said in the first comment. However, I don’t feel comfortable with people downvoting “meh” articles because of the author (even though it’s daily). I would prefer a website where people could rate articles independently of who the author is, and then check their other stuff.
My aggregate feedback would be: You have some good points. But sometimes you just write a wall of text.
Ok. So I should be more clear/concise/straight-to-the-point, gotcha.
And I suspect that the precommitment to post an article each day could be making this a lot worse. In a different situation, such as writing for an online magazine which wants to display a lot of ads, writing a lot of text with only a few ideas would be a good move; here it is a bad move.
Could you be more specific about what you think my move would be? For the online magazine, getting the maximum number of clicks/views to display more ads makes sense: lots of text, lots of ads, and just enough ideas to ensure the reader keeps seeing the ads.
But what about LW? My move here was simple: better understand AI Safety by forcing myself to crystallize ideas related to the field daily, on a website with great feedback/discussions and low tolerance for mistakes. For now, the result (in the discussions) is, overall, satisfying, and I feel that people here seem to enjoy AI Safety content.
More generally, I think the fact that I generate 10% of the headers, or that you end up clicking on all my articles, may be correlated with factors other than my posting daily, such as:
The LW algorithm promotes them
Your “Michaël Trazzi” filter (you need one, because you get to see my headers) is not tuned correctly, because you still seem to be reading them, even if only 1⁄12 felt useful (or maybe you just read them to comment on this post?).
This comment is already long (sorry for the wall of text), so I will say more about the Meta LW high/low quality debate on Elo’s comment below.
Thank you Viliam for your honest feedback.
I think you’re making some good points, but you’re ignoring (in your comment) some aspects.
“do I want this kind of article, from the same author, to be here, every day?“. And the answer is “hell no”.
So what you’re saying is: “whenever deciding to upvote or downvote, I decide whether I want more articles like this or not. But because you’re posting every day, when I am deciding whether or not to downvote, I am deciding whether I want an article like this every single day, and the answer to that is no”.
I understand the difference in choice here (a choice for every article, instead of just for one). I assumed that on LW people could think about posts independently, and could downvote a post and upvote another from the same author, saying what felt useful or not, even if it is daily. I understand that you just want to say “no” to the article, to say “no” to the series, and this is even more true if the ratio of good stuff is the one you mention at the end.
It is easier to just ignore a one-off mistake than to ignore a precommitment to keep doing them every day.
What would be the mistake here? From what I understand, when reading an article and seeing a mistake, the mistake is “multiplied” by the number of times it could happen again in other articles, so every tiny mistake becomes important? If I got you right, I think that by writing daily, those little mistakes (if easy to correct) could be pointed out quickly in a comment, and I would take them into account in the next posts. A short feedback loop could quickly improve the quality of the posts. However, I understand that people might not want LW to be an error-tolerant zone, but would prefer a performance zone.
And… you are polluting this filter. Not just once in a while, but each day. You generate more than 10% of headers on this website recently.
I had not thought about it in terms of daily % of the website’s headers; interesting point of view. I also use Hacker News as a filter (for other interests), and LW is a better option for the interests I mentioned in my posts. I think the real difference is the volume of posts on Hacker News/reddit/LW. It is always a tradeoff between being in a pool of hundreds of high-quality posts (more people reading, but more choices for them), or a pool of only a dozen even-higher-quality posts but less traffic.