Hi.
I registered and started posting a while back, but since then have reverted to lurking. Partly due to not having time, but I can also identify with reasons some others have given.
You can have emotions while being rational, and you can be rational while having emotions. They are opposed sometimes, but they do not always have to be. But when there is a conflict between them, rationality (so long as you practice it properly) is more reliable in reaching correct, useful conclusions.
I was about to point out that the fascinating and horrible dynamics of over-the-top threats are covered at length in Strategy of Conflict. But then I realised you’re the one who made that post in the first place. Thanks, I enjoyed that book.
Verifying a proof is quite a bit simpler than coming up with the proof in the first place.
It’s much easier to limit output than input, since the source code of the AI itself provides it with some patchy “input” about what the external world is like. So there is always some input, even if you do not allow human input at run-time.
ETA: I think I misinterpreted your comment. I agree that input should not be unrestricted.
Saying “oh sorry I hurt your feelings” is just plain being nice, which is a good idea whether you are aiming to be rational or not.
It would be great for this rationalist community to be able to discuss any topic, but in a way that insulates the main rationality discussions from off-topic discussions. Perhaps forum software separate from the main format of LessWrong? Are monthly open threads enough for off-topic discussions?
EDIT: The original post now has updated times and links, so refer to that instead.
Here are links to the times suggested, for convenience:
Melbourne: Tuesdays at 9pm (edited to actually coincide with Paris)
I’d suggest posting meeting times using timeanddate.com, to help avoid confusion about time zones and daylight savings.
You, sir, have made a gender assumption.
Add a term granting a large disutility for deaths, and this should do the trick.
What if death isn’t well-defined? What if the AI has the option of cryonically freezing a person to save their life, but then, being frozen, that person does not have any “current” utility function, so the AI can disregard them completely? Situations like this also demonstrate that, more generally, trying to satisfy someone’s utility function may have an unavoidable side-effect of changing their utility function. These side-effects may be complex enough that the person does not foresee them, and it is not possible for the AI to explain them to the person.
I think your “simple hack” is not actually that simple or well-defined.
If Dave holds a consequentialist ethical theory that only values his own life, then yes we are screwed.
If Dave’s consequentialism is about maximizing something external to himself (like the probable state of the universe in the future, regardless of whether he is in it), then his decision has little or no weight if he is a simulation, but massive weight if he is the real Dave. So the expected value of his decision is dominated by the possibility of him being real.
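To make the arithmetic concrete, here is a minimal sketch. The numbers and the function name are made up for illustration, and it assumes a simulated Dave's choice has exactly zero effect on the external world.

```python
def expected_external_value(p_real, value_if_real, value_if_simulated=0.0):
    """Expected external value of a decision, given uncertainty about being real.

    value_if_simulated defaults to 0.0: the assumption that a simulated
    Dave's choice changes nothing in the outside world.
    """
    return p_real * value_if_real + (1.0 - p_real) * value_if_simulated


# Hypothetical numbers: one real Dave among a million simulated copies.
n_simulations = 1_000_000
p_real = 1.0 / (n_simulations + 1)

# Hypothetical payoffs to the external world if the real Dave acts.
keep_boxed = expected_external_value(p_real, value_if_real=+1.0)
release = expected_external_value(p_real, value_if_real=-1.0)

# The "simulated" branch contributes nothing to either option, so the
# comparison is decided entirely by the "real" branch.
best_action = "keep boxed" if keep_boxed > release else "release"
```

However small p_real gets, it scales both options by the same factor, so the ranking of the two actions does not change as long as p_real is above zero.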
Here’s an idea for how a LW-based commercial polling website could operate. Basically it is a variation on PredictionBook with a business model similar to TopCoder.
The website has business clients, and a large number of “forecasters” who have accounts on the website. Clients pay to have their questions added to the website, and forecasters give their probability estimates for whichever questions they like. Once the answer to a question has been verified, each forecaster is financially rewarded using some proper scoring rule. The more money assigned to a question, the higher the incentive for a forecaster to have good discrimination and calibration. Some clever software would also be needed to combine and summarize data in a way that is useful to clients.
The main advantage of this over other prediction markets is that the scoring rule encourages forecasters to give accurate probability estimates.
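As a rough sketch of the reward step, assuming the quadratic (Brier) rule is the proper scoring rule chosen and that each forecaster is paid from a fixed per-question stake. Both are my assumptions, and the function names are hypothetical.

```python
def reward(report, outcome, stake):
    """Payout for one forecaster on one binary question.

    report  -- stated probability that the answer is "yes" (0.0 to 1.0)
    outcome -- 1 if the verified answer is "yes", 0 otherwise
    stake   -- maximum payout the client funds per forecaster (hypothetical)

    Pays stake * (1 - Brier score). This is an affine transform of the
    Brier score, so it is still a proper scoring rule.
    """
    return stake * (1.0 - (report - outcome) ** 2)


def expected_reward(report, belief, stake):
    """Expected payout if the forecaster's true belief is `belief`."""
    return (belief * reward(report, 1, stake)
            + (1.0 - belief) * reward(report, 0, stake))


def best_report(belief, stake=10.0):
    """Search a 1% grid for the report that maximizes expected payout."""
    grid = [i / 100 for i in range(101)]
    return max(grid, key=lambda r: expected_reward(r, belief, stake))
```

A forecaster who believes 70% does best in expectation by reporting 70%, which is the property that makes the rule "proper". Raising the stake scales the gap between honest and dishonest reports, which matches the point that more money on a question means a stronger incentive.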
I’m concerned about the moral implications of creating intelligent beings with the intent of destroying them after they have served our needs [...]
Personally, I would rather be purposefully brought into existence for some limited time than to never exist at all, especially if my short life was enjoyable.
I evaluate the morality of possible AI experiments in a consequentialist way. If choosing to perform AI experiments significantly increases the likelihood of reaching our goals in this world, it is worth considering. The experiences of one sentient AI would be outweighed by the expected future gains in this world. (But nevertheless, we’d rather create an AI that experiences some sort of enjoyment, or at least does not experience pain.) A more important consideration is social side-effects of the decision—does choosing to experiment in this way set a bad precedent that could make us more likely to de-value artificial life in other situations in the future? And will this affect our long-term goals in other ways?
I have had some similar thoughts.
The AI box experiment argues that a “test AI” will be able to escape even if it has no I/O (input/output) other than a channel of communication with a human. So we conclude that this is not a secure enough restraint. Eliezer seems to argue that it is best not to create an AI testbed at all—instead get it right the first time.
But I can think of other variations on an AI box that are more strict than human-communication, but less strict than no-test-AI-at-all. The strictest such example would be an AI simulation in which the input consisted of only the simulator and initial conditions, and the output consisted only of a single bit of data (you destroy the rest of the simulation after it has finished its run). The single bit could be enough to answer some interesting questions (“Did the AI expand to use more than 50% of the available resources?”, “Did the AI maximize utility function F?”, “Did the AI break simulated deontological rule R?”).
Obviously these are still more dangerous than no-test-AI-at-all, but the information gained from such constructions might outweigh the risks. Perhaps if I/O is restricted to few enough bits, we could guarantee safety in some information-theoretic way.
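The shape of the interface I have in mind is roughly the following. This is a toy sketch only: the names are hypothetical, and ordinary Python gives no real isolation, so it illustrates the one-bit output channel rather than any actual containment.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def run_boxed(simulate: Callable[[], State],
              question: Callable[[State], bool]) -> bool:
    """Run a simulation to completion and return exactly one bit.

    simulate -- takes no run-time input; everything it knows comes from
                its own code and initial conditions
    question -- a yes/no predicate fixed before the run starts
    """
    final_state = simulate()
    answer = bool(question(final_state))
    # Drops only the local reference; a real box would have to destroy
    # the whole simulation, which this sketch does not attempt.
    del final_state
    return answer


def max_distinct_messages(output_bits: int) -> int:
    """Upper bound on the distinct messages an output channel can carry."""
    return 2 ** output_bits


# Toy stand-in for a simulation run (made-up number).
def toy_simulation():
    return {"resources_used": 0.62}


used_majority = run_boxed(toy_simulation,
                          lambda s: s["resources_used"] > 0.5)
```

With k output bits the run can send at most 2^k distinct messages, so a single bit allows only two. That bound is the easy part; whether it amounts to a safety guarantee is the open question.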
What do people think of this? Any similar ideas along the same lines?
Post also mentioned Tolerate Tolerance
If we accept the simulation hypothesis, then there are already gzillions of copies of us, being simulated under a wide variety of torture conditions (and other conditions, but torture seems to be the theme here). An extortionist in our world can only create a relatively small number of simulations of us, small enough that it is not worth taking them into account. The distribution of simulation types in this world bears no relation to the distribution of simulations we could possibly be in.
If we want to gain information about what sort of simulation we are in, evidence needs to come directly from properties of our universe (stars twinkling in a weird way, messages embedded in π), rather than from properties of simulations nested in our universe.
So I’m safe from the AI … for now.
If we are the kind of people who would delete lots of AIs, I don’t see why AIs would not see it as similarly ethical to delete lots of us.
So just in case we are a simulated AI’s simulation of its creators, we should not simulate an AI in a way it might not like? That’s 3 levels of a very specific simulation hypothesis. Is there some property of our universe that suggests to you that this particular scenario is likely? For the purpose of seriously considering the simulation hypothesis and how to respond to it, we should make as few assumptions as possible.
More to the point, I think you are suggesting that the AI will have human-like morality, like taking moral cues from others, or responding to actions in a tit-for-tat manner. This is unlikely, unless we specifically program it to do so, or it thinks that is the best way to leverage our cooperation.
Or the two are fairly independent: you can be good or bad at seeking status, intelligent or not so intelligent, and it is possible to have any combination of those, including that of being unintelligent and yet still good at obtaining status.
For some things (especially concrete things like animals or toothpaste products), it is easy to find a useful reference class, while for other things it is difficult to find which possible reference class, if any, is useful. Some things just do not fit nicely enough into an existing reference class to make the method useful—they are unclassreferencable, and it is unlikely to be worth the effort attempting to use the method, when you could just look at more specific details instead. (“Unclassreferencable” suggests a dichotomy, but it’s more of a spectrum.) ETA: I see this point has already been made here.
Humans naturally use an ad-hoc method that is like reference class forecasting (that may not be perfect or completely rational, but does a reasonable job sometimes). It is useful when we first encounter something and do not yet have enough specific details to evaluate it on its own terms. Once we have those details, the forecasting method is not needed. We use forecasting to get a heuristic on which things are worth us investigating further, so we can make that more detailed evaluation. Often something that is unclassreferencable is more worth investigating—we are curious about things that do not fit nicely into our existing categories.
There are a couple of ways promoters of a product/idea can exploit humans’ natural forecasting habits. Sometimes the phrase “defies categorisation” or “doesn’t fit into the normal genres” is applied to a new piece of music, to suggest that it is unclassreferencable and therefore worth checking out (which is better than a potential listener lumping it into a category that they don’t like). On the other hand, sometimes promoters purposefully put themselves into a reference class, hoping that no one investigates finer details, like a new product claiming to be “environmentally friendly”, or people wearing certain clothes to appear to have higher status.
Let me know if I’m suffering from man-with-hammer syndrome here, but it seems reference class forecasting is a useful way to think about many promotional strategies in a more systematic way.
A man with one watch might have the wrong time; a man with two watches is more aware of his own ignorance.