Newcomb’s problem doesn’t specify how Omega chooses the ‘customers’. It’s a quite realistic possibility that it simply has not offered the choice to anyone that would use a randomizer, and cherrypicked only the people which have at least 99.9% ‘prediction strength’.
PeterisP
It definitely should have quality control.
The whole point of the ‘Scary idea’ is that there should be an effective quality control for GAI, otherwise the risks are too big.
At the moment humanity has no idea on how to make an effective quality control—which would be some way to check if an arbitrary AI-in-a-box is Friendly.
Ergo, if a GAI is launched before Friendly AI problem has some solutions, it means that GAI was launched without a quality control performed. Scary. At least to me.
Assuming that a general, powerful intelligence has a goal ‘do x’, say—win chess games, optimize traffic flow or find cure for cancer, then it has implicit dangerous incentives if we don’t figure out a reasonable Friendly framework to prevent them.
A self-improving intelligence that does changes to it’s code to become better at doing it’s task may easily find out that, for example, a simple subroutine that launches a botnet in the internet (as many human teenagers have done), might get it an x % improvement in processing power that helps it to obtain more wins chess games, better traffic optimizations or faster protein-folding for the cure of cancer.
A self-improving general intelligence that has human-or-better capabilities may easily deduce that a functioning off-button would increase the chances of it being turned off, and that it being turned off would increase the expected time of finding cure for cancer. This puts this off-button in the same class as any other bug that hinders its performance. Unless it understands and desires the off-button to be usable in a friendly way, it would remove it; or if it’s hard-coded as nonremovable, then invent workarounds for this perceived bug—for example, develop a near-copy of itself that the button doesn’t apply to, or spend some time (less than the expected delay due to the turning-off-risk existing, thus rational spending of time) to study human psychology/NLP/whatever to better be able to convince everyone that it shouldn’t be turned off ever, or surround the button with steel walls—these are all natural extensions of it following it’s original goal.
If an self-improving AI has a goal, then it cares. REALLY cares for it in a stronger way than you care for air, life, sex, money, love and everything else combined.
Humans don’t go FOOM because they a)can’t at the moment and b) don’t care about such targeted goals. But for AI, at the moment all we know is how to define such supergoals which work in this unfriendly manner. At the moment we don’t know how to make these ‘humanity friendly’ goals, and we don’t know how to make an AI that’s self-improving in general but ‘limited to certain contraints’. You seem to imply these constraints as trivial—well, they aren’t, the friendliness problem actually may as hard or harder than general AI itself.
OK, well these are the exact points which need some discussion.
1) Your comment “general intelligence is [..] is not equipped with some goal naturally”—I’d say that it’s most likely that any organization investing the expected huge manpower and resources in creating a GAI would create it with some specific goal defined for it.
However, in absence of an intentional goal given by the ‘creators’, it would have some kind of goals, otherwise it wouldn’t do absolutely anything at all, so it wouldn’t be showing any signs of it’s (potential?) intelligence.
2) In response to “If a goal can be defined to be specific enough that it is suitable to self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries”—I’d say that defining specific goals is simple, too simple. From any learning-machine design a stupid goal ‘maximize number of paperclips in universe’ would be very simple to implement, but a goal like ‘maximize welfare of humanity without doing anything “bad” in the process’ is an extremely complex goal, and the boundary setting is the really complicated part, which we aren’t able to even describe properly.
So in my opinion is quite viable to define a specific goal that is suitable to self-improve against, and that includes some scope boundaries—but where the defined scope boundaries has some unintentional loophole which causes disaster.
3) I can agree that working on AGI research is essential, instead of avoiding it. But taking the step from research through prototyping to actually launching/betatesting a planned powerful self-improving system is dangerous if the world hasn’t yet finished an acceptable solution to Friendliness or the boundary-setting problem. If having any bugs in the scope boundaries is ‘unlikely’ (95-95% confidence?) then it’s not safe enough, because 1-5% chance of an extinction event after launching the system is not acceptable, it’s quite a significant chance—not the astronomical chances involved in Pascal’s wager or asteroid hitting the earth tomorrow or LHC ending the universe.
And given the current software history and published research on goal systems, if anyone would show up today and demonstrate that they’ve solved self-improving GAI obstacles and can turn it on right now, then I can’t imagine how they could realistically claim a larger than 95-99% confidence in their goal system working properly. At the moment we can’t check any better, but such a confidence level simply is not enough.
″ How is it that the AGI is yet smart enough to learn this all by itself but fails to notice that there are rules to follow”—because there is no reason for an AGI automagically creating arbitrary restrictions if they aren’t part of the goal or superior to the goal. For example, I’m quite sure that F1 rules prohibit interfering with drivers during the game; but if somehow a silicon-reaction-speed AGI can’t win F1 by default, then it may find it simpler/quicker to harm the opponents in one of the infinity ways that the F1 rules don’t cover—say, getting some funds in financial arbitrage, buying out the other teams, and firing any good drivers or engineering a virus that halves the reaction speed of all homo-sapiens—and then it would be happy as the goal is achieved within the rules.
I’d suggest to take a look at the 1900 variant, it’s quite close to the original but provides much more interaction between the distant powers, giving strong reasons for Britain to talk to Turkey since the early game.
I perceive the intention of the original assertion is that even in this case you would still fail in making 10.000 independent statements of such sort—i.e., in trying to do it, you are quite likely somehow make a mistake at least once, say, by a typo, a slip of the tongue, accidentally ommitting ‘not’ or whatever. All it takes to fail on a statement like “53 to be prime” all it takes is for you to not notice that it actually says ’51 is prime’ or make some mistake when dividing.
Any random statement of yours has a ‘ceiling’ of x-nines accuracy.
Even any random statement of yours where it is known that you aren’t rushed, tired, on medication, sober, not sleepy, had a chance and intent to review it several times still has some accuracy ceiling, a couple orders of magnitude higher, but still definitely not 1.
Spies by definition are agents of foreign powers acting on your soil without proper registration—i.e., like the many representatives in embassies have registered as agents of that country and are allowed to operate on their behalf until/if expelled.
As far as Assange (IIRC) has not been in USA while the communiques were leaked, and it is not even claimed that he is an agent of some other power, then there was no act of espionage. It might be called espionage if and only if Manning was acting on behalf of some power—and even then, Manning would be the ‘spy’, not Assange.
I’m not an expert on relevant US legislative acts, but this is the legal definition in local laws here and I expect that the term of espionage have been defined a few centuries ago and would be mostly matching throughout the world.
A quick look at current US laws (http://www.law.cornell.edu/uscode/18/usc_sec_18_00000793----000-.html) does indicate that there is a penalty for such actions with ‘intent or reason to believe … for the injury of United States or advantage of any foreign nation’ - so simply acting to intentionally harm US would be punishable as well, but it’s not calling it espionage. And the Manning issue would depend on his intention/reason to believe about harming US vs. helping US nation, which may be clarified by evidence in his earlier communications with Adrian Lamo and others.
Then it should be rephrased as ‘We should seek a model of reality that is accurate even at the expense of flattery.’
Ambiguous phrasings facilitate only confusion.
Sorry for intruding on an very old post, but checking ‘peoplerandom’ integers modulo 2 is worse than flipping a coin—when asked for a random number, people tend to choose odd numbers more often than even numbers, and prime numbers more often than non-prime numbers.
hypothesis—that it is really hard to over-ride the immediate discomfort of an unpleasant decision—is to look at whether aversions of comparable or greater magnitude are hard to override. I think the answer in general is ‘no.’ Consider going swimming and having to overcome the pain of entering water colder than surrounding. This pain, less momentary than the one in question and (more or less) equally discounted, doesn’t produce problematic hesitation.
I can’t agree with you—it most definitely does produce a problematic hesitation. If you’re bringing this example, then I’d say that it is evidence that the general answer is ‘yes’, at least for a certain subpopulation of homo sapiens.
Which are the useful areas of AI study?
I’ve worn full-weight chain and plate reconstruction items while running around for a full day, and I’m not physically fit at all—I’d say that a random geeky 12 year old boy would be easily able to wear an armor suit, the main wizard-combat problems being getting winded very, very quickly if running (so they couldn’t rush in the same way as Draco’s troops did), and slightly slowed down arm movement, which might hinder combat spellcasting. It is not said how long the battles are—if they are less than an hour, then there shouldn’t be any serious hindrances; if longer then the boys would probably want to sit down and rest occasionally or use some magic to lighten the load.
To put it in very simple terms—if you’re interested in training AI according to technique X because you think that X is the best way, then you design or adapt the AI structure so that technique X is applicable. Saying ‘some AI’s may not respond to X’ is moot, unless you’re talking about trying to influence (hack?) AI designed and controlled by someone else.
I’m still up in the air regarding Eliezer’s arguments about CEV.
I have all kinds of ugh-factors coming in mind about not-good or at least not-‘PeterisP-good’ issues an aggregate of 6 billion hairless ape opinions would contain.
The ‘Extrapolated’ part is supposed to solve that; but in that sense I’d say that it turns the whole concept of this problem from knowledge extraction to the extrapolation. In my opinion, the difference between the volition of Random Joe and volition of Random Mohammad (forgive me for stereotyping for the sake of a short example) is much smaller than the difference between volition of Random Joe and the extrapolated volition of Random Joe ‘if he knew more, thought faster, was more the person he wishes he was’. Ergo, the idealistic CEV version of ‘asking everyone’ seems a bit futile. I could go into more detail, but in that case that’s probably material for a separate discussion, analyzing the parts of CEV point by point.
In that sense, it’s still futile. The whole reason for the discussion is that AI doesn’t really need permission or consent of anyone; the expected result is that AI—either friendly or unfriendly—will have the ability to enforce the goals of its design. Political reasons will be easily satisfied by a project that claims to try CEV/democracy but skips it in practice, as afterwards the political reasons will cease to have power.
Also, a ‘constitution’ matters only if it is within the goal system of a Friendly AI, otherwise it’s not worth the paper it’s written on.
I haven’t read the books you mention, but it seems that Sterman’s ‘Business Dynamics: Systems thinking and modeling for a complex world’ covers mostly the same topics, and it felt really well written, I’d recommend that one as an option as well.
The saying actually goes ‘jack of all trades and a master of none, though oft better than a master of one’.
There are quite a few insights and improvements that are obvious with cross-domain expertise, and much of the new developments nowadays pretty much are merging of two or more knowledge domains—bioinformatics as a single, but not nearly only example. Computational linguistics, for example—there are quite a few treatises on semantics written by linguists that would be insightful and new for computer science guys handling also non-linguistic knowledge/semantics projects.
Well, I fail to see any need for backward-in-time causation to get the prediction right 100 out of 100 times.
As far as I understand, similar experiments have been performed in practice and homo sapiens are quite split in two groups ‘one-boxers’ and ‘two-boxers’ who generally have strong preferences towards one or other due to whatever differences in their education, logic experience, genetics, reasoning style or whatever factors that are somewhat stable specific to that individual.
Having perfect predictive power (or even the possibility of it existing) is implied and suggested, but it’s not really given, it’s not really necessary, and IMHO it’s not possible and not useful to use this ‘perfect predictive power’ in any reasoning here.
From the given data in the situation (100 out of 100 that you saw), you know that Omega is a super-intelligent sorter who somehow manages to achieve 99.5% or better accuracy in sorting people into one-boxers and two-boxers.
This accuracy seems also higher than the accuracy of most (all?) people in self-evaluation, i.e., as in many other decision scenarios, there is a significant difference in what people believe they would decide in situation X, and what they actually decide if it happens. [citation might be needed, but I don’t have one at the moment, I do recall reading papers about such experiments]. The ‘everybody is a perfect logician/rationalist and behaves as such’ assumption often doesn’t hold up in real life even for self-described perfect rationalists who make strong conscious effort to do so.
In effect, data suggests that probably Omega knows your traits and decision chances (taking into account you taking into account all this) better than you do—it’s simply smarter than homo sapiens. Assuming that this is really so, it’s better for you to choose option B. Assuming that this is not so, and you believe that you can out-analyze Omega’s perception of yourself, then you should choose the opposite of whatever Omega would think of you (gaining 1.000.000 instead of 1.000 or 1.001.000 instead of 1.000.000). If you don’t know what Omega knows about you—then you don’t get this bonus.