+1, you convinced me.
I worry this will distract from risks like “making an AI that is smart enough to learn how to hack computers from scratch”, but I don’t buy the general “don’t distract with true things” argument.
“I don’t think that there is more than 1% that support direct violence against non-terrorists for its own sake”: This seems definitely wrong to me, if you also count Israelis who consider everyone in Gaza to be potential terrorists, or something like that.
If you offer Israelis:
Button 1: Kill all of Hamas
Button 2: Kill all of Gaza
Then definitely more than 1% will choose Button 2
I haven’t heard of anything like that (but I’m not sure I would have).
Note there are also problems with trying to set up a government by force, with setting up a police force there if the population isn’t interested in one, and with building an education system (which is currently, afaik, very anti-Israel and wouldn’t accept Israel’s opinions on changes, I think) (not that I’m excited about Israel’s own education system either).
I do think Israel provides water, electricity, internet, equipment, and medical equipment (subsidized? free? I’m not sure of all this anyway) to Gaza. I don’t know if you count that as something like “building a stockpile of equipment for providing clean drinking water to residents of occupied territory”.
I don’t claim the current solution is good; I’m just pointing out some problems with what I think you’re suggesting (and I’m not judging whether those problems are bigger or smaller).
What do you mean by “building capacity” in this context? (Maybe my English isn’t good enough; I didn’t understand your question.)
I was a software developer in the Israeli military (not a data scientist), and I was part of a course that continually trains software developers for various units.
The big picture is that the military is a huge organization, and there is a ton of room for software to improve everything. I can’t talk about specific uses (just like I can’t describe our tanks or whatever, sorry if that’s what you’re asking, and sorry I’m not giving the full picture), but even things like logistics or servers or healthcare have big teams working on them.
Also remember that the military started a long time ago, when there weren’t good off-the-shelf solutions for everything, and imagine the size of the companies that make many of the products that you (or organizations) use.
There are also many Israelis who don’t consider Palestinians to be humans worth protecting, but rather see them as evil beings / an outgroup / whatever you’d call that.
Also (with much less confidence), I do think many Palestinians want to kill Israelis because of things that I’d consider brainwashing.
Hard question: what to do about a huge population that’s been brainwashed like that (if my estimate here is correct), and what might a peaceful resolution look like?
Not a question, but seems relevant for people who read this post:
Meni Rosenfeld, one of the early LessWrong Israel members, has enlisted:
Source: https://www.facebook.com/meni.rosenfeld/posts/pfbid0bkvfrb3qFTF7U82eMgkZzgMjMT4s3pbGUx7ahgKX1B8hr2n1viYqg9Msz6t3dBUPl (a public post by him)
Any ideas on how much to read this as “Sam’s actual opinions” vs “Sam trying to say things that will satisfy the maximum number of people”?
(do we have priors on his writings? do we have information about him absolutely not meaning one or more of the things here?)
Hey Kaj :)
The complexity-hiding part here seems to me to be “how exactly do you take a simulation/prediction of a person and extract from it the preferences of that person”.
For example, would you simulate a negotiation with the human and see how the negotiation turns out? Would you simulate asking the human and then do whatever the human answers? (There were a few suggestions in the post; I don’t know if you endorse a specific one or whether you even think this question is important.)
Because (I assume) once OpenAI[1] says “trust our models”, that’s the point at which it would be useful to publish our breaks.
Breaks that haven’t been published yet, so that OpenAI couldn’t have patched them yet.
[unconfident; I can see counterarguments too]
Or maybe when regulators, experts, or public opinion say “this model is trustworthy, don’t worry”.
I’m confused: Wouldn’t we prefer to keep such findings private? (At least until OpenAI says something like “this model is reliable/safe”?)
My guess: You’d reply that finding good talent is worth it?
This seems like great advice, thanks!
I’d be interested in an example of what “a believable story in which this project reduces AI x-risk” looks like, if Dane (or someone else) would like to share.
A link directly to the corrigibility part (skipping unrelated things on the same page):
https://www.projectlawful.com/replies/1824457#reply-1824457
This post got me to do something like exposure therapy on myself in 10+ situations, where it felt like the “obvious” thing to do. This is a huge amount of life-change-per-post.
My thoughts:
[Epistemic status + impostor syndrome: Just learning; posting my ideas to hear how they are wrong and in the hope of interacting with others in the community. Don’t learn from my ideas.]
A)
Victoria: “I don’t think that the internet has a lot of particularly effective plans to disempower humanity.”
I think:
Having ready plans on the internet and using them is not part of the normal threat model for an AGI. If that were the problem, we could just filter those plans out of the training set (see the toy sketch below).
(The internet does have such ideas. I will briefly mention biosecurity, but I prefer not to spread ideas on how to disempower humanity.)
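To make the filtering point concrete, here is a minimal sketch of what dropping such documents from a training corpus could look like. This is my own illustration, not something proposed in the thread: the blocklist phrases and corpus below are hypothetical placeholders, and real data-filtering pipelines typically use trained classifiers rather than naive substring matching.

```python
# Toy sketch: filtering "ready plans" out of a training corpus.
# The blocklist phrases and documents are hypothetical placeholders;
# real pipelines would use classifiers, not substring checks.

BLOCKLIST = {"plan to disempower humanity"}  # hypothetical phrases

def is_allowed(document: str) -> bool:
    """Keep a document only if it contains no blocklisted phrase."""
    lowered = document.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

corpus = [
    "A history of bridge engineering.",
    "A detailed plan to disempower humanity, step by step.",
]
filtered_corpus = [doc for doc in corpus if is_allowed(doc)]
print(filtered_corpus)  # only the bridge-engineering document remains
```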
B)
[Victoria:] I think coming up with a plan that gets past the defenses of human society requires thinking differently from humans.
TL;DR: I think some ways to disempower humanity don’t require thinking differently from humans.
I’ll split an AI’s attack vectors into three buckets:
Attacks that humans didn’t even think of (such as what we can do to apes)
Attacks that humans did think of but are not defending against (for example, we thought about pandemic risks but didn’t defend against them very well). Note this does not require thinking of things that humans didn’t think of.
Attacks that humans are actively defending against, such as using robots with guns, trading in the stock market, or playing Go (Go probably won’t help with taking over the world, but humans are actively working on winning Go games, so I put the example here). Having an AI beat us in one of these does require it to be, in some important (to me) sense, smarter than us, but not all attacks are in this bucket.
C)
[...] requires thinking differently from humans
I think AIs already today think differently from humans in any reasonable way we could mean that. In fact, if we could make them NOT think differently from humans, my [untrustworthy] opinion is that this would be non-negligible progress towards solving alignment. No?
D)
The intelligence threshold for planning to take over the world isn’t low
First, disclaimers:
(1) I’m not an expert and this isn’t widely reviewed; (2) I’m intentionally not being detailed, in order not to spread ideas on how to take over the world. I’m aware this is epistemically bad and I’m sorry for it; it’s the tradeoff I’m picking.
So, mainly based on A, I think a person who is 90% as intelligent as Elon Musk in all dimensions would probably be able to destroy humanity, and so (if I’m right), the intelligence threshold is lower than “the world’s smartest human”. Again sorry for the lack of detail. [mods, if this was already too much, feel free to edit/delete my comment]
“Doing a Turing test” is a solution to something. What’s the problem you’re trying to solve?
As a judge, I’d ask the test subject to write me a rap song about Turing tests. If it succeeds, I guess it’s ChatGPT ;P
More seriously: it would be nice to find a judge who doesn’t know the capabilities and limitations of GPT models. Knowing those is very, very useful.
@habryka, would you reply to this comment if there’s an opportunity to donate to either? Another person and I are interested, and others could follow this comment too if they wanted to.
(Only if it’s easy for you; I don’t want to add an annoying task to your plate.)