JackV

Karma: 301

JackV 18 Dec 2010 14:19 UTC
7 points
on: Cryptographic Boxes for Unfriendly AI
Let me see if I understand. Firstly, is there any reason what you’re trying to do is create a friendly AI? Would, for instance, getting an unknown AI to solve a specific numerical problem with an objectively checkable answer be an equally relevant example, without the distraction of whether we would ever trust the so-called friendly AI?[1]

If that is the case, then as I understand the proposal, we take the code for the unknown AI and the input data, use this encryption scheme on it, and then have a program which performs various operations on them.

As I understand my 30-second reading of wikipedia “various operations” includes only addition and multiplication of encrypted data, which is very useful for lots of things, but not enough to run a program (which typically involves, say, branching). But this doesn’t matter because (a) maybe you could have a program which works like that or (b) even if only some of the content is encrypted, but “when we branch” is not, it’s still useful for the purpose.

At the end of this, we have an encrypted result. We then—still using the encrypted data—test this result against the problem we set. If it matches, we decrypt the answer. If not, we don’t.

The intended purpose is (a) the machine can’t spit out an answer that instead of being an actual answer says “mwahahahaha I trick you into running me on the internet” or “Hi! Actually I’m a friendly AI after all. Free me!” because if it doesn’t match, it doesn’t get read, and (b) it’s harder for the AI to—for instance—hack the hardware of the computer it’s running on, because as all the data is encrypted, it has no way of knowing where of the architecture the data lives, etc.

Is this useful? I’m really not sure. (a) fails because you need the key to decrypt the data, so if you get curious, you may decrypt a “bad” answer just to see. What you need is a different cyptography scheme, one where you can ONLY decrypt the answer if it matches certain predicates. (b) may be relevant, if you’re worried about the AI hacking the computer it runs on, this is obviously much, much, harder. But it does still have some channels—for instance, I think it’s possible to monitor program flow even if not program content, so it could encode a threatening message in the pattern of branches.

[1] The question of whether we ever CAN test a friendly AI is interesting in itself, more for what it says about our assumptions than what it says about AI, but I will return to this.

JackV 18 Dec 2010 14:26 UTC
3 points
in reply to: JackV’s comment on: Cryptographic Boxes for Unfriendly AI
Other thoughts not relevant to the central point:

(a) Do you really want to rely on cryptography to contain an unfriendly super-intelligence? It’s better than NOT doing so. But notice that all the cryptographic schemes we have ever had have one thing in common: they’ve been broken, or we expect them to be broken eventually. it’s an arms race, and if the AI is that great, then even if it can’t factorise large numbers in reasonable polynomial time, it may be able to crack the encryption.

(b) We could have simply used a normal non-encrypted program to “compare answer to problem, test, reject or publish”. It would have been somewhat more vulnerable to being hacked by bad input, but not necessarily much more.

(c) I don’t think we’re able to say much about friendly or non-friendly AIs until we know a lot more about building AI at all, although that doesn’t mean it’s bad to think about. The idea of a test for a friendly AI is, um, worrying. Judging by computing history, all our current thoughts will turn out to be incredibly simplistic.

(d) We’re more certain of the existence of this encryption scheme, provided we’re right about certain mathematical results about lattices. OTOH, if we DID build an unfriendly AI and it DID get loose on the internet and take over the world, is “well, what do you know, we were wrong about lattices” a consolation? :)

JackV 19 Dec 2010 1:16 UTC
3 points
in reply to: orthonormal’s comment on: Cryptographic Boxes for Unfriendly AI
LOL. Good point. Although it’s a two way street: I think people did genuinely want to talk about the AI issues raised here, even though they were presented as hypothetical premises for a different problem, rather than as talking points.

Perhaps the orthonormal law of less wrong should be, “if your post is meaningful without fAI, but may be relevant to fAI, make the point in the least distracting example possible, and then go on to say how, if it holds, it may be relevant to fAI”. Although that’s not as snappy as Godwin’s :)

JackV 19 Dec 2010 1:29 UTC
1 point
in reply to: paulfchristiano’s comment on: Cryptographic Boxes for Unfriendly AI
I definitely have the impression that even if the hard problem a cryptosystem is based on actually is hard (which is yet to be proved, but I agree is almost certainly true), most of the time the algorithm used to actually encrypt stuff is not completely without flaws, which are successively patched and exploited. I thought this was obvious, just how everyone assumed it worked! Obviously an algorithm which (a) uses a long key length and (b) is optimised for simplicity rather than speed is more likely to be secure, but is it really the consensus that some cryptosystems are free from obscure flaws? Haven’t now-broken systems in the past been considered nearly infalliable? Perhaps someone (or you if you do) who knows professional cryoptography systems can clear that up, which ought to be pretty obvious?

JackV 19 Dec 2010 2:02 UTC
4 points
in reply to: paulfchristiano’s comment on: Cryptographic Boxes for Unfriendly AI
“You don’t quite understand homomorphic encryption.”

I definitely DON’T understand it. However, I’m not sure you’re making it clear—is this something you work with, or just have read about? It’s obviously complicated to explain in a blog (though Schnier http://www.schneier.com/blog/archives/2009/07/homomorphic_enc.html does a good start) but it surely ought to be possible to make a start on explaining what’s possible and what’s not!

It’s clearly possible for Alice to have some data, Eve to have a large computer, and Alice to want to run a well-known algorithm on her data. It’s possible to translate the algorithm into a series of addition and multiplication of data, and thus have Eve run the algorithm on the encrypted data using her big computer, and send the result back to Alice. Eve never knows what the content is, and Alice gets the answer back (assuming she has a way of checking it).

However, if Alice has a secret algorithm and secret data, and wants Eve to run one on the other, is it possible? Is this analogous to our case?

Is the answer “in principle, yes, since applying an algorithm to data is itself an algorithm, so can be applied to an encryoted algorithm with encryopted data without ever knowing what the algortihm is”? But is it much less practical to do so, because each step involves evaluating much redundant data, or not?

I’m imagining, say, using the game of life to express an algorithm (or an AI). You can iterate a game of life set-up by combining each set of adjacent cells in a straight-forward way. However, you can’t use any of the normal optimisations on it, because you can’t easily tell what data is unchanged? Is any of this relevant? Is it better if you build dedicated hardware?

If my understanding is correct (that you do run with the program and data encrypted at all times) then I agree the the scheme does in theory work: the AI has no way of affecting its environment in a systematic way if it can’t predict how its data appears non-encrypted.

“I believe that in the future we will be able to resolve this problem in a limited sense by destroying the original private key and leaving a gimped private key which can only be used to decrypt a legitimate output.”

I mentioned this possibility, but had no idea if it would be plausible. Is this an inherent part of the cryptosystem, or something you hope it would be possible to design a cryptosystem to do? Do you have any idea if it IS possible?

JackV 19 Dec 2010 2:04 UTC
1 point
in reply to: Strange7’s comment on: Cryptographic Boxes for Unfriendly AI
Isn’t “encrypt random things with the public key, until it finds something that produces [some specific] ciphertext” exactly what encryption is supposed to prevent?? :)

(Not all encryption, but commonly)

JackV 19 Dec 2010 12:40 UTC
2 points
in reply to: jsteinhardt’s comment on: Cryptographic Boxes for Unfriendly AI
Thank you.

JackV 19 Dec 2010 13:57 UTC
1 point
in reply to: paulfchristiano’s comment on: Cryptographic Boxes for Unfriendly AI
Thank you.

But if you’re running something vaguely related to a normal program, if the program wants to access memory location X, but you’re not supposed to know which memory location is accessed, doesn’t that mean you have to evaluate the memory write instruction in combination with every memory location for every instruction (whether or not that instruction is a memory write)?

So if your memory is—say -- 500 MB, that means the evaluation is at least 500,000,000 times slower? I agree there are probably some optimisations, but I’m leery of being able to run a traditional linear program at all. That’s why I suggested something massively parallel like game of life (although I agree that’s not actually designed for computation) -- if you have to evaluate all locations anyway, they might as well all do something useful. For instance, if your hypothetical AI was a simulation of a neural netowrk, that would be perfect, if you’re allowed to know the connections as part of the source code, you can evaluate each simulation neuron in combination with only the ones its connected with, which is barely less efficient than evaluating them linearly, since you have to evaluate all of them anyway.

I think that’s what my brain choked on when I imagined it the first time—yes, technically, 500,000,000 times slower is “possible in principle, but much slower”, but I didn’t realise that’s what you meant, hence my assumption that parts of the algorithm would not be encrypted. think if you rewrite this post to make the central point more accessible (which would make it a very interesting post), it would not be hard to put in this much explanation about how the encryption would actually be used.

Unless I’m wrong that that is necessary? It wouldn’t be hard, just something like “you can even run a whole program which is encrypted, even if you have to simulate every memory location. That’s obviously insanely impractical, but if we were to do this for real we could hopefully find a way of doing it that doesn’t require that.” I think that would be 10,000 times clearer than just repeatedly insisting “it’s theoretically possible” :)

JackV 19 Dec 2010 21:57 UTC
1 point
in reply to: paulfchristiano’s comment on: Cryptographic Boxes for Unfriendly AI
That sounds reasonable. I agree a complete discussion is probably too complicated, but it certainly seems a few simple examples of the sort I eventually gave would probably help most people understand—it certainly helped me, and I think many other people were puzzled, whereas with the simple examples I have now, I think (although I can’t be sure) I have a simplistic but essentially accurate idea of the possibilities.

I’m sorry if I sounded overly negative before: I definitely had problems with the post, but didn’t mean to negative about it.

If I were breaking down the post into several, I would probably do:

(i) the fact of holomorphic encryption’s (apparent) existence, how that can be used to to run algorithms on unknown data and a few theoretical applications of that, a mention that this is unlikely to be practical atm. That it can in principle be used to execute an unknown algorithm on unknown data, but that is really, really impractical, but might become more practical with some sort of parallel processing design. And at this point, I think most people would accept when you say it can be used to run an unfriendly AI.

(ii) If you like, more mathematical details, although this probably isn’t necessary

(iii) A discussion of friendlyness-testing, which wasn’t in the original premise, but is something people evidently want to think about

(iv) any other discussion of running an unfriendly AI safely

JackV 23 Dec 2010 3:01 UTC
1 point
on: Dutch Books and Decision Theory: An Introduction to a Long Conversation
[Late night paraphrasing deleted as more misunderstanding/derailing than helpful. Edit left for honesty purposes. Hopeful more useful comment later.]

JackV 26 Jan 2012 12:15 UTC
1 point
on: Meetup : Cambridge UK
I would be interested in going along to a meet-up at some point, but am not normally free with less than a week’s notice :)

JackV 16 Feb 2012 10:12 UTC
3 points
on: Hearsay, Double Hearsay, and Bayesian Updates
The explanation of the current system, and how to view it in a rationalist manner was really interesting.

The problem as you state it seems to be that the court (and people in general) have a tendency to evaluate each link in a chain separately. For instance, if there was one link with an 80% chance of being valid, both a court and a bayesian would say “ok, lets accept it provisionally for now”, but if there’s three or four links, a court might say “each individual link seems ok, so the whole chain is ok” but a Bayesian would say “taken together that’s only about 50%, that’s nowhere near good enough.”

It seems likely this is a systematic problem in the legal system—I can imagine many situations where one side just about persuades the jury or judge of a long chain of things, but the null hypothesis of “it’s just coincidence” isn’t given sufficient consideration. However, all the examples you give show the bar to introducing it is already very high, so do you have a good reason to think introducing a hard limit would do more good than harm?

It would be incredible if there could be a general principle in court that chaining together evidence had to include some sort of warning from the judge about how it could be unrealiable and when it was too unreliable to admit (although I agree putting specific probabilities on things is probably never going to happen). It would be good if such a principle could be applied to some specific genre of evidence that was currently a real problem, in order to illustrate the general principle. Hearsay is conceptually appropriate, however, would it really help? And if not, what case does exhibit systematic miscarriages of justice?

JackV 3 Apr 2012 11:47 UTC
0 points
on: Fictional Bias
For that matter, I couldn’t stop my mind throwing up objections like “Frodo buys off-the-rack clothes? From where exactly? Surely he’d have tailor made? Wouldn’t he be translated into British English as saying ‘trousers’? Hobbit feet are big and hairy for Hobbits, but how big are they compared to human feet—are their feet and inches ²⁄₃ the size?”

It didn’t occur to me until I’d read past the first two paragraphs that we were even theoretically supposed to ACTUALLY guess what size Frodo would wear. And I’m still unsure if the badness of the Frodo example was supposed to be part of the joke or not—I mean, it’s fairly funny if it is, but it’s the sort of mistake (a bad example for a good point) I’d expect to see even made by intelligent, competent writers.

And I mean, I’m fairly sure that the fictional bias effect is real :)

JackV 4 Apr 2012 13:42 UTC
5 points
on: SotW: Be Specific
I’ve been reading the answers and trying to put words into what I want to say. Ideally people will experience not just being more specific, but experience that when they’re more specific, they immedaitely communicate more effectively.

For instance, think of three or four topics people probably have an opinion on, starting with innocuous (do you like movie X) and going on to controvertial (what do you think of abortion). Either have a list in advance, or ask people for examples. Perhaps have a shortlist and let people choose, or suggest something else if they really want?

I picked the movie example because it’s something people usually feel happy to talk about, but can be very invested in their opinion of. Ideally it’s something people will immediately disagree about. I don’t think this is difficult—in a group of 10, I’d expect to name only one or two movies before people disagreed, even though social pressure usually means they won’t immediately say so.

Step 1 Establish that people disagree, and find it hard to come to an agreement. This should take about 30s. People will hopefully “agree to disagree” but not actually understand each other’s position. Eg. “Starwars was great, it was so exciting.” “Starwars was boring and sucked and didn’t make any sense.”

Step 2 Ask WHAT people like about it. Encourage people to give specific examples at first (“eg. I loved it when Luke did X”) and then draw generalisations from it (“I really empathised with Luke and I was excited that he won” “I’ve read stories about farmboys who became heroes before, I already know what happens, bring me some intellecutal psychological fare instead”). Emphasise that everyone is on the same side, and they shouldn’t worry about being embarrassed or being “wrong”.

Step 3 Establish that (probably) they interpreted what the other person said in terms of what they were thinking (eg. “How can blowing up a spaceship be boring”) when actually the other person was thinking about something they hadn’t thought of (eg. “OK, I guess if you care about the physics, it would be annoying that they are completely and utterly made up, it just never occurred to me that anyone would worry about that.”)

I may be hoping too much, but this is definitely the sort of process I’ve gone through to rapidly reach an understanding with someone when we previously differed a lot, and for some simple examples, it doesn’t seem too much to hope we can do so that rapidly. Now, go through the process with two-four statements, ending with something fairly controvertial.

Hopefully (this is pure speculation, I’ve not tried it), giving specific examples will lead to people actually reaching understandings, imprinting the experience as a positive and successful one. Then encourage people to say “Can you give me an example of when [bad thing] would be as bad as you feel” as often as possible. Give examples where being specific is more persuasive (eg. “We value quality” vs “We aim for as few bugs as possible” vs “We triage bug reports as they come in. All bugs we decide to fix are fixed before the next version is released” or “we will close loopholes in the tax code” vs “we will remove the tax exempion on X”), and encourage people to shout out more.

JackV 30 Apr 2012 13:14 UTC
0 points
in reply to: Douglas_Reay’s comment on: Meetup Formats
Perhaps the right level of warning is to say “Cambridge UK” in the title and first line, but not take a position on whether other people are likely to be interested or not..?

JackV 4 May 2012 16:18 UTC
3 points
in reply to: TheOtherDave’s comment on: Ask and Guess
That’s an awesome comment. I’m interested which specific cues came up that you realised each other didn’t get :)

JackV 4 May 2012 16:40 UTC
2 points
in reply to: anotherblackhat’s comment on: How can we get more and better LW contrarians?
I think this is directly relevant to the idea of embracing contrarian comments.

The idea of having extra categories of voting is problematic, because it’s always easy to suggest, but only worthwhile if people will often want to distinguish them, and distinguishing them will be useful. So I think normally it’s a well-meaning but doomed suggestion, and better to stick to just one.

However, whether or not it would be a good idea to actually imlpement, I think separating “interested” and “agree” is a good way of expressing what happens to contrarian comments. I don’t have first-hand experience, but based on what I usually see happening at message boards, I suspect a common case is something like:
1. Someone posts a contrarian comment. Because they are not already a community stalwart, they also compose the comment in a way which is low-status within the community (eg. bits of bad reasoning, waffle, embedded in other assumptions which disagree with the community).
2. Thus, people choose between “there’s something interesting here” and “In general, this comment doesn’t support the norms we want this community to represent.” The latter usually wins except when the commenter happens to be popular or very articulate.
The interesting/agree distinction would be relevant in cases like this, for instance:
- I’m pretty sure this is wrong, but I can’t explain why, I’d like to see someone else tackle it and agree/disagree
- I think this comment is mostly sub-par, but the core idea is really, really interesting
- I might click “upvote” for a comment I thought was funny, but want a greater level of agreement for a comment I specifically wanted to endorse.
There’s a possibly similar distinction between stackoverflow and stackoverflow meta, because negative votes affect user rank on overflow but not meta. On stack overflow, voting generally refers to perceived quality. On meta, it normally means agreement.

I’m not sure I’d advocate this as a good idea, but it seemed an interesting possibility given the problem proposed. FWIW, if it were implemented, it’d want a lot of scrutiny and brainstorming, but my first reaction would be to leave the voting as supposedly meaning “interesting”, and usually sort by that, but add a secondary vote meaning “agree” or “disagree” or similar terms that can add a nuance to it.

Edit: Come to think of it, a similar effect is acheived by a social convention of people upvoting the comment, but also upvoting a reply that says “this part good, this part bad”. If that happens, it should fulfil the same niche, but I don’t know if it is happening enough.

JackV 11 May 2012 11:35 UTC
8 points
in reply to: [deleted]’s comment on: [Error communicating with LW2 server]
I would say add [Video]: [Link] would perpetuate the misunderstanding that there may be no immediate content, [Video] correctly warns people who (for whatever reason) can’t easily view arguments in video format.

JackV 11 May 2012 15:32 UTC
49 points
on: [Error communicating with LW2 server]
Let me try to summarise the obvious parts of the situation as I understand it. I contend that:

(A) There are some measureable differences between ethnicities that are most plausibly attributed to biological differences. (There are some famous examples, such as greater susceptibility of some people to skin cancer, or sickle cell anemia. I assume there are smaller differences elsewhere. If anyone seriously disagrees, say so.)

(B) These are massively dwarfed by the correlation of ethnicity with cultural differences in almost all cases.

(C) There is a social taboo against admitting (A)

(D) There is a large correlation between ethnicity and various cultural factors, and between cultural factors.

(E) It is sometimes possible to draw probabalistic inferences based on (D). Eg. With no other information, you may guess that someone on the street in London is more likely to be a British citizen if they are Indian than East Asian (or vice versa, whichever is true).

(F) The human brain deals very badly with probabalistic inferences. If you guess someone’s culture based on their ethnicity or dress, you are likely to maintain that view as long as possible even in the face of new information, until you suddenly flip to the opposite view. Because of this, there is (rightly IMHO) a social taboo against doing (E) even when it might make sense.

(G) People who are and/or think they are good at drawing logicial inferences a la (E) but don’t have as much personal experience fo the pitfalls described in (F) are likely to resent the social taboo described in (F) because it seems fussy and nonsensical to them. I am somewhat prone to this error (not so much with race, but with other things)

(H) The word “racist” is horrendously undefined. It is used both to mean “someone or something which treats people differently based on ‘race’, rightly or wrongly” (including examples where treating people differently is the only possible thing to do, such as preventative advice for medical conditions, or advice on how to avoid bad racism from other people) and to mean “someone or something which is morally wrong to discriminate based on race.” Thus a description of whether something is “racist” is typically counterproductive.

I admit I only skimmed the OP’s transcript, but my impression is that he fairly describes why he is frustrated that it is difficult to talk about these issues, but I am extremely leery of a lot of the examples he uses.

I was going to write more, but am not sure how to push it. How am I doing so far...? :)

JackV 11 May 2012 15:36 UTC
8 points
in reply to: JackV’s comment on: [Error communicating with LW2 server]
From context, it seems “race realism” refers to the idea that there are legitimate differences between races, is that correct? However, I’m not sure if it’s supposed to refer to biological differences specifically, or any cultural differences? And it seems to be heavily loaded with connotations which I’m unaware of, that I would be hesitant to say it was “true” or “not true” even if I knew the answer to the questions in the two first sentences.