Rob Bensinger
Communications lead at MIRI. Unless otherwise indicated, my posts and comments here reflect my own views, and not necessarily my employer’s.
This is formally correct.
(Though one of those updates might be a lot smaller than the other, if you’ve e.g. already thought about one of those topics a lot and reached a confident conclusion.)
Can I cross-post this to the EA Forum? (Or you can do it, if you prefer; but I think this is a really useful comment.)
(But insofar as you continue to be unsure about Ben, yes, you should be open to the possibility that Emerson has hidden information that justifies Emerson thinking Ben is being super dishonest. My confidence re “no hidden information like that” is downstream of my beliefs about Ben’s character.)
Why do you think that’s obvious?
I know Ben, I’ve conversed with him a number of times in the past and seen lots of his LW comments, and I have a very strong and confident sense of his priorities and values. I also read the post, which “shows its work” to such a degree that Ben would need to be unusually evil and deceptive in order for this post to be an act of deception.
I don’t have any private knowledge about Nonlinear or about Ben’s investigation, but I’m happy to vouch for Ben, such that if he turns out to have been lying, I ought to take a credibility hit too.
He’s just a guy who hasn’t been trained as an investigative journalist
If he were a random non-LW investigative journalist, I’d be a lot less confident in the post’s honesty.
Number of hours invested in research does not necessarily correlate with objectivity of research
“Number of hours invested” doesn’t prove Ben isn’t a lying sociopath (heck, if you think that, you can just posit that he’s lying about the hours spent), but if he isn’t a lying sociopath, it’s strong evidence against negligence.
So, until we know a lot more about this case, I’ll withhold judgment about who might or might not be deliberately asserting falsehoods.
That’s totally fine, since as you say, you’d never heard of Ben until yesterday. (FWIW, I think he’s one of the best rationalists out there, and he’s a well-established Berkeley-rat community member who co-runs LessWrong and who tons of other veteran LWers can vouch for.)
My claim isn’t “Geoffrey should be confident that Ben is being honest” (that maybe depends on how much stock you put in my vouching and meta-vouching here), but rather:
I’m pretty sure Emerson doesn’t have strong reason to think Ben isn’t being honest here.
If Emerson lacks strong reason to think Ben is being dishonest, then he definitely shouldn’t have threatened to sue Ben.
E.g., I’m claiming here that you shouldn’t sue someone for libel if you feel highly uncertain about whether they’re being honest or dishonest. It’s ethically necessary (though IMO not sufficient) that you feel pretty sure the other person is being super dishonest. And I’d be very surprised if Emerson has rationally reached that epistemic state (because I know Ben, and I expect he conducted himself in his interactions with Nonlinear the same way he normally conducts himself).
Actually, I do know of an example of y’all offering money to someone for defending an org you disliked and were suspicious of. @habryka, did that money get accepted?
(The incentive effects are basically the same whether it was accepted or not, as long as it’s public knowledge that the money was offered; so it seems good to make this public if possible.)
Yeah, this post makes me wonder if there are non-abusive employers in EA who are nevertheless enabling abusers by normalizing behavior that makes abuse popular. Employers who pay their employees months late without clarity on why and what the plan is to get people paid eventually. Employers who employ people without writing things down, like how much people will get paid and when. Employers who try to enforce non-disclosure of work culture and pay.
Do any of those things happen much in EA? (I don’t think I’ve ever heard of an example of one of those things outside of Nonlinear, but maybe I’m out of the loop.)
As I think of it, the heart of the “bad argument gets counterargument” notion is “respond to arguments using reasoning, not coercion”, rather than “literal physical violence is a unique category of thing that is never OK”. Both strike me as good norms, but the former seems deeper and more novel to me, closer to the heart of things. I’m a fan of Scott’s gloss (and am happy to cite it instead, if we want to construe Eliezer’s version of the thing as something narrower):
[...] What is the “spirit of the First Amendment”? Eliezer Yudkowsky writes:
“There are a very few injunctions in the human art of rationality that have no ifs, ands, buts, or escape clauses. This is one of them. Bad argument gets counterargument. Does not get bullet. Never. Never ever never for ever.”
Why is this a rationality injunction instead of a legal injunction? Because the point is protecting “the marketplace of ideas” where arguments succeed based on the evidence supporting or opposing them and not based on the relative firepower of their proponents and detractors. [...]
What does “bullet” mean in the quote above? Are other projectiles covered? Arrows? Boulders launched from catapults? What about melee weapons like swords or maces? Where exactly do we draw the line for “inappropriate responses to an argument”?
A good response to an argument is one that addresses an idea; a bad response is one that silences it. If you try to address an idea, your success depends on how good the idea is; if you try to silence it, your success depends on how powerful you are and how many pitchforks and torches you can provide on short notice.
Shooting bullets is a good way to silence an idea without addressing it. So is firing stones from catapults, or slicing people open with swords, or gathering a pitchfork-wielding mob.
But trying to get someone fired for holding an idea is also a way of silencing an idea without addressing it. I’m sick of talking about Phil Robertson, so let’s talk about the Alabama woman who was fired for having a Kerry-Edwards bumper sticker on her car (her boss supported Bush). Could be an easy way to quiet support for a candidate you don’t like. Oh, there are more Bush voters than Kerry voters in this county? Let’s bombard her workplace with letters until they fire her! Now she’s broke and has to sit at home trying to scrape money together to afford food and ruing the day she ever dared to challenge our prejudices! And the next person to disagree with the rest of us will think twice before opening their mouth!
The e-version of this practice is “doxxing”, where you hunt down an online commenter’s personally identifiable information including address. Then you either harass people they know personally, spam their place of employment with angry comments, or post it on the Internet for everyone to see, probably with a message like “I would never threaten this person at their home address myself, but if one of my followers wants to, I guess I can’t stop them.” This was the Jezebel strategy that Michael was most complaining about. Freethought Blogs is also particularly famous for this tactic and often devolves into sagas that would make MsScribe herself proud.
A lot of people would argue that doxxing holds people “accountable” for what they say online. But like most methods of silencing speech, its ability to punish people for saying the wrong things is entirely uncorrelated with whether the thing they said is actually wrong. It distributes power based on who controls the largest mob (hint: popular people) and who has the resources, job security, and physical security necessary to outlast a personal attack (hint: rich people). If you try to hold the Koch Brothers “accountable” for muddying the climate change waters, they will laugh in your face. If you try to hold closeted gay people “accountable” for promoting gay rights, it will be very easy and you will successfully ruin their lives. Do you really want to promote a policy that works this way?
There are even more subtle ways of silencing an idea than trying to get its proponents fired or real-life harassed. For example, you can always just harass them online. The stronger forms of this, like death threats and rape threats, are of course illegal. But that still leaves many opportunities for constant verbal abuse, crude sexual jokes, insults aimed at family members, and dozens of emails written in all capital letters about what sorts of colorful punishments you and the people close to you deserve. [...]
My answer to the “Doctrine Of The Preferred First Speaker” ought to be clear by now. The conflict isn’t always just between first speaker and second speaker, it can also be between someone who’s trying to debate versus someone who’s trying to silence. Telling a bounty hunter on the phone “I’ll pay you $10 million to kill Bob” is a form of speech, but its goal is to silence rather than to counterargue. So is commenting “YOU ARE A SLUT AND I HOPE YOUR FAMILY DIES” on a blog. And so is orchestrating a letter-writing campaign demanding a business fire someone who vocally supports John Kerry.
Bad argument gets counterargument. Does not get bullet. Does not get doxxing. Does not get harassment. Does not get fired from job. Gets counterargument. Should not be hard.
But NDAs are not a red-line for me personally
An NDA to keep the organization’s IP private seems fine to me; an NDA to prevent people from publicly criticizing their former workplace seems line-crossing to me.
Notably, one way to offset the reputational issue is to sometimes give people money for saying novel positive things about an org. The issue is less “people receive money for updating us” and more “people receive money only if they updated us in a certain direction”, or even worse “people receive money only if they updated us in a way that fits a specific narrative (e.g., This Org Is Culty And Abusive)”.
I dunno why cata posted it, but I almost quoted this myself to explain why I dislike the proposed “bad argument gets lawsuit” norm.
This also updates me about Kat’s take (as summarized by Ben Pace in the OP):
Kat doesn’t trust Alice to tell the truth, and Alice has a history of “catastrophic misunderstandings”.
When I read the post, I didn’t see any particular reason for Kat to think this, and I worried it might just be an attempt to dismiss a critic, given the aggressive way Nonlinear otherwise seems to have responded to criticisms.
With this new info, it now seems plausible to me that Kat was correct (even though I don’t think this justifies threatening Alice or Ben in the way Kat and Emerson did). And if Kat’s not correct, I still update that Kat was probably accurately stating her epistemic state, and that a lot of reasonable people might have reached the same epistemic state.
I think that there’s a big difference between telling everyone “I didn’t get the food I wanted, but they did get/offer to cook me vegan food, and I told them it was ok!” and “they refused to get me vegan food and I barely ate for 2 days”.
Agreed.
(Crossposted)
It also seems totally reasonable that no one at Nonlinear understood there was a problem. Alice’s language throughout emphasizes how she’ll be fine, it’s no big deal [...] I do not think that these exchanges depict the people at Nonlinear as being cruel, insane, or unusual as people.
100% agreed with this. The chat log paints a wildly different picture than what was included in Ben’s original post.
Given my experience with talking with people about strongly emotional events, I am inclined towards the interpretation where Alice remembers the 15th with acute distress and remembers it as ‘not getting her needs met despite trying quite hard to do so’, and the Nonlinear team remembers that they went out of their way that week to get Alice food—which is, based on the logs from the 16th, clearly true! But I don’t think I’d call Alice a liar based on reading this
Agreed. I did update toward “there’s likely a nontrivial amount of distortion in Alice’s retelling of other things”, and toward “normal human error and miscommunication played a larger role in some of the Bad Stuff that happened than I previously expected”. (Ben’s post was still a giant negative update for me about Nonlinear, but Kat’s comment is a smaller update in the opposite direction.)
Jim’s point here is compatible with “US libel laws are a force for good epistemics”, since a law can be aimed at lying+bullshitting and still disincentivize bad reasoning (to some degree) as a side-effect.
But I do think Jim’s point strongly suggests that we should have a norm against suing someone merely for reasoning poorly or getting the wrong answer. That would be moving from “lawsuits are good for norm enforcement” to “frivolous lawsuits are good for norm enforcement”, which is way less plausible.
Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should be shocked that an organization (e.g. Nonlinear) that is being libeled (in its view) would threaten a libel suit to deter the false accusations (as they see them), and to nudge the author (e.g. Ben Pace) towards making sure that their negative claims are factually correct and contextually fair.
Wikipedia claims: “The 1964 case New York Times Co. v. Sullivan, however, radically changed the nature of libel law in the United States by establishing that public officials could win a suit for libel only when they could prove the media outlet in question knew either that the information was wholly and patently false or that it was published ‘with reckless disregard of whether it was false or not’.”
Spartz isn’t a “public official”, so maybe the standard is laxer here?
If not, then it seems clear to me that Spartz wouldn’t win in a fair trial, because whether or not Ben got tricked by Alice/Chloe and accidentally signal-boosted others’ lies, it’s very obvious that Ben is neither deliberately asserting falsehoods, nor publishing “with reckless disregard”.
(Ben says he spent “100-200 hours” researching this post, which is way beyond the level of thoroughness we should require for criticizing an organization on LessWrong or the EA Forum!)
I think there should be a strong norm against threatening people with libel suits merely for saying a falsehood; the standard should at minimum be that you have good reason to think the person is deliberately lying or bullshitting.
(I think the standard should be way higher than that, too, given the chilling effect of litigiousness; but I won’t argue that here.)
My own suggestion would be to use a variety of different phrasings here, including both “capabilities” and “intelligence”, and also “cognitive ability”, “general problem-solving ability”, “ability to reason about the world”, “planning and inference abilities”, etc. Using different phrases encourages people to think about the substance behind the terminology—e.g., they’re more likely to notice their confusion if the stuff you’re saying makes sense to them under one of the phrasings you’re using, but doesn’t make sense to them under another of the phrasings.
Phrases like “cognitive ability” are pretty important, I think, because they make it clearer why these different “capabilities” often go hand-in-hand. They also clarify that the central problems are related to minds / intelligence / cognition / etc., not (for example) the strength of a robotic arm, even though that too is a “capability”.
Does “par-human reasoning” mean at the level of an individual human or at the level of all of humanity combined?
If it’s the former, what human should we compare it against? 50th percentile? 99.999th percentile?
I partly answered that here, and I’ll edit some of this into the post:
By ‘matching smart human performance… across all the scientific work humans do in that field’ I don’t mean to require that there literally be nothing humans can do that the AI can’t match. I do expect this kind of AI to quickly (or immediately) blow humans out of the water, but the threshold I have in mind is more like:
STEM-level AGI is AI that’s at least as scientifically productive as a human scientist who makes a variety of novel, original contributions to a hard-science field that requires understanding the physical world well. E.g., it can go toe-to-toe with highly productive human scientists on applying its abstract theories to real-world phenomena, using scientific ideas to design new tech, designing physical experiments, operating equipment, and generating new ideas that turn out to be true and that importantly advance the frontiers of our knowledge.
The way I’m thinking about the threshold, AI doesn’t have to be Nobel-prize-level, but it has to be “fully doing science”. I’d also be happy with a definition like ‘AI that can reason about the physical world in general’, but I think that emphasizing hard-science tasks makes it clearer why I’m not thinking of GPT-4 as ‘reasoning about the physical world in general’ in the relevant sense.
I’m not sure what the right percentile to target here is—maybe we should be looking at the top 5% of Americans with STEM PhDs, where Americans with STEM PhDs are maybe in the top 1% of STEM ability among Americans?
What is the “basic mental machinery” required to do par-human reasoning? What if a system has the basic mental machinery but not the more advanced mental machinery?
Do you want this to include the robotic capabilities to run experiments and use physical tools? If not, why not (that seems important to me, but maybe you disagree)?
I want it to include the ability to run experiments and use physical tools.
I don’t know what the “basic mental machinery” required is—I think GPT-4 is missing some of the basic cognitive machinery top human scientists use to advance the frontiers of knowledge (as opposed to GPT-4 doing all the same mental operations as a top scientist but slower, or something), but this is based on a gestalt impression from looking at how different their outputs are in many domains, not based on a detailed or precise model of how general intelligence works.
One way of thinking about the relevant threshold is: if you gave a million chimpanzees billions of years to try to build a superintelligence, I think they’d fail, unless maybe you let them reproduce and applied selection pressure to them to change their minds. (But the latter isn’t something the chimps themselves realize is a good idea.)
In contrast, top human scientists pass the threshold ‘give us enough time, and we’ll be able to build a superintelligence’.
If an AI system, given enough time and empirical data and infrastructure, would eventually build a superintelligence, then I’m mostly happy to treat that as “STEM-level AGI”. This isn’t a necessary condition, and it’s presumably not strictly sufficient (since in principle it should be possible to build a very narrow and dumb meta-learning system that also bootstraps in this way eventually), but it maybe does a better job of gesturing at where I’m drawing a line between “GPT-4” and “systems in a truly dangerous capability range”.
(Though my reason for thinking systems in that capability range are dangerous isn’t centered on “they can deliberately bootstrap to superintelligence eventually”. It’s far broader points like “if they can do that, they can probably do an enormous variety of other STEM tasks” and “falling exactly in the human capability range, and staying there, seems unlikely”.)
Does a human count as a STEM-level NGI (natural general intelligence)?
I tend to think of us that way, since top human scientists aren’t a separate species from average humans, so it would be hard for them to be born with complicated “basic mental machinery” that isn’t widespread among humans. (Though local mutations can subtract complex machinery from a subset of humans in one generation, even if they can’t add complex machinery to a subset of humans in one generation.)
Regardless, given how I defined the term, at least some humans are STEM-level.
If so, doesn’t that imply that we should already be able to perform pivotal acts? You said: “If it makes sense to try to build STEM-level AGI at all in that situation, then the obvious thing to do with your STEM-level AGI is to try to leverage its capabilities to prevent other AGIs from destroying the world (a “pivotal act”).”
The weakest STEM-level AGIs couldn’t do a pivotal act; the reason I think you can do a pivotal act within a few years of inventing STEM-level AGI is that I think you can quickly get to far more powerful systems than “the weakest possible STEM-level AGIs”.
The kinds of pivotal act I’m thinking about often involve Drexler-style feats, so one way of answering “why can’t humans already do pivotal acts?” might be to answer “why can’t humans just build nanotechnology without AGI?”. I’d say we can, and I think we should divert a lot of resources into trying to do so; but my guess is that we’ll destroy ourselves with misaligned AGI before we have time to reach nanotechnology “the hard way”, so I currently have at least somewhat more hope in leveraging powerful future AI to achieve nanotech.
(The OP doesn’t really talk about this, because the focus is ‘is p(doom) high?’ rather than ‘what are the most plausible paths to us saving ourselves?’.)
In an unpublished 2017 draft, a MIRI researcher and I put together some ass numbers regarding how hard (wet, par-biology) nanotech looked to us:
We believe that the bottlenecks on current progress toward par-biology nanotechnology are (a) figuring out how to put all of the puzzle pieces together correctly, (b) executing certain difficult computations required for determining how to build materials, and (c) engineering certain basic tools that will allow us to engineer better tools, where there are likely to be mutual dependencies between progress on these fronts. If the world’s top scientific and engineering talent were actively focusing on this application and were inspired to solve the key technical problems, we would expect it to be possible to push past these bottlenecks with no more than 10x the compute that Google spent on research projects in 2016.
Assuming no advances in AI algorithms over the state of the art in 2017, we would assign a 50% probability to fifty copies of John von Neumann, divided into five teams and supplied with a large number of lab technicians and other support staff, being able to achieve nanotechnology within 25 calendar years at a level that would be sufficient for a decisive advantage if the technology were available to a group in 2017.
(footnote: We stipulate “in 2017” because we would not necessarily expect par-biology nanotechnology to confer a decisive advantage in a world where nanotechnology had been gradually advanced to that level by human engineers over multiple decades; in that scenario, factors such as leaks, regulations, and competition from other developers would make it harder for one group to strongly pull ahead. We would expect it to be much easier for one group to strongly pull ahead if nanotechnology advances too quickly for leaks, regulations, and competition to be significant factors on the relevant timescale, as we believe is possible using AGI.)
Translating this into a more realistic scenario: we would assign a 40% probability to an organization with a $10 billion budget and the involvement of someone who can attract top researchers and leadership (e.g., Elon Musk) being able to reach this level of technological capability within 25 years, absent AI advances. Our probability would lower to 15% if there were only 10 calendar years available to the hypothetical Musk project instead of 25, and would rise to 85% if there were 50 calendar years and $20 billion available instead of 25 calendar years and $10 billion, holding these conditions stable and assuming no other large global disruptions.
As in §1.3, the predictions here are rough and intuitive, and were not generated by a formal model. It would be difficult for our probability to rise much higher than 85% given additional time or other resources. Our inside-view evaluation of the arguments assigns high probability to par-biology nanotechnology being achievable in fifty years under these idealized conditions, such that the remaining uncertainty in our informal aggregate models largely stems from model uncertainty and deference to experts who disagree with our view and consider par-biology nanotechnology much more difficult. We would be very surprised to learn that par-biology nanotechnology were much more difficult (say, requiring more than 500 VNG research years), and this would have a fairly large impact on our overall expectations about early AGI systems’ potential uses and impact.
(500 VNG research years = 500 von-Neumann-group research years, defined as ‘how much progress ten copies of John von Neumann would make if they worked together on the problem, hard, for 500 serial years’.)
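For a rough sense of scale (my own back-of-the-envelope arithmetic, not a figure from the draft): the 50-von-Neumann scenario above supplies about

$$\frac{50\ \text{von Neumanns}}{10\ \text{von Neumanns per VNG}} \times 25\ \text{serial years} = 125\ \text{VNG research years},$$

so the 50% estimate corresponds to roughly 125 VNG research years of effort (setting aside the lab technicians and support staff), while the ‘more than 500 VNG research years’ difficulty level we’d be very surprised by is about 4x that.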
This is also why I think humanity should probably put lots of resources into whole-brain emulation: I don’t think you need qualitatively superhuman cognition in order to get to nanotech, I think we’re just short on time given how slowly whole-brain emulation has advanced thus far.
With STEM-level AGI I think we’ll have more than enough cognition to do basically whatever we can align; but given how tenuous humanity’s grasp on alignment is today, it would be prudent to at least take a stab at a “straight to whole-brain emulation” Manhattan Project. I don’t think humanity as it exists today has the tech capabilities to hit the pause button on ML progress indefinitely, but I think we could readily do that with “run a thousand copies of your top researchers at 1000x speed” tech.
(Note that having dramatically improved hardware to run a lot of ems very fast is crucial here. This is another reason the straight-to-WBE path doesn’t look hopeful at a glance, and seems more like a desperation move to me; but maybe there’s a way to do it.)
Steering towards world states, taken literally, for a realistic agent is impossible, because an embedded agent cannot even contain a representation of a detailed world-state.
I’m not imagining AI steering toward a full specification of a physical universe; I’m imagining it steering toward a set of possible worlds. Sets of possible worlds can often be fully understood by reasoners, because you don’t need to model every world in the set in perfect detail in order to understand the set; you just need to understand at least one high-level criterion (or set of criteria) that determines which worlds go in the set vs. not in the set.
E.g., consider the preference ordering “the universe is optimal if there’s an odd number of promethium atoms within 100 light years of the Milky Way Galaxy’s center of gravity, pessimal otherwise”. Understanding this preference just requires understanding terms like “odd” and “promethium” and “light year”; it doesn’t require modeling full universes or galaxies in perfect detail.
Similarly, “maximize the amount of diamond that exists in my future light cone” just requires you to understand what “diamond” is and what “the more X you have, the better” means. It doesn’t require you to fully represent every universe in your head in advance.
(Note that selecting the maximizing action is computationally intractable; but you can have a maximizing goal even if you aren’t perfectly succeeding in the goal.)
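As a minimal illustrative sketch (mine, not anything from the comment I’m replying to; `World` and its fields are made-up stand-ins, not a real agent’s world model), the point is that a preference can be a compact criterion over worlds, so it never has to enumerate or fully represent any particular world:

```python
from dataclasses import dataclass

# Hypothetical toy world-description. A real agent's world model would be far
# richer, but the preferences below never need to enumerate possible worlds.
@dataclass
class World:
    promethium_atoms_near_center: int        # within 100 ly of the galactic center
    kg_of_diamond_in_future_lightcone: float

def odd_promethium_preference(w: World) -> float:
    """Optimal iff the promethium count is odd, pessimal otherwise.
    One criterion implicitly partitions the whole space of possible worlds
    into two sets without modeling any world in detail."""
    return 1.0 if w.promethium_atoms_near_center % 2 == 1 else 0.0

def diamond_maximizer_preference(w: World) -> float:
    """'More diamond is better': a ranking over all possible worlds that only
    requires understanding what 'diamond' and 'more' mean."""
    return w.kg_of_diamond_in_future_lightcone

# Example: comparing two toy candidate worlds under the diamond preference.
w1 = World(promethium_atoms_near_center=7, kg_of_diamond_in_future_lightcone=1e20)
w2 = World(promethium_atoms_near_center=8, kg_of_diamond_in_future_lightcone=3e20)
assert diamond_maximizer_preference(w2) > diamond_maximizer_preference(w1)
```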
The definition I give in the post is “AI that has the basic mental machinery required to do par-human reasoning about all the hard sciences”. In footnote 3, I suggest the alternative definition “AI that can match smart human performance in a specific hard science field, across all the scientific work humans do in that field”.
By ‘matching smart human performance… across all the scientific work humans do in that field’ I don’t mean to require that there literally be nothing humans can do that the AI can’t match. I do expect this kind of AI to quickly (or immediately) blow humans out of the water, but the threshold I have in mind is more like:
STEM-level AGI is AI that’s at least as scientifically productive as a human scientist who makes a variety of novel, original contributions to a hard-science field that requires understanding the physical world well. E.g., it can go toe-to-toe with highly productive human scientists on applying its abstract theories to real-world phenomena, using scientific ideas to design new tech, designing physical experiments, operating equipment, and generating new ideas that turn out to be true and that importantly advance the frontiers of our knowledge.
The way I’m thinking about the threshold, AI doesn’t have to be Nobel-prize-level, but it has to be “fully doing science”. I’d also be happy with a definition like ‘AI that can reason about the physical world in general’, but I think that emphasizing hard-science tasks makes it clearer why I’m not thinking of GPT-4 as ‘reasoning about the physical world in general’ in the relevant sense.
I was going to ask if I could!
I understand if people don’t want to talk about it, but I do feel sad that there isn’t some kind of public accounting of what happened there.
(Well, I don’t concretely understand why people don’t want to talk about it, but I can think of possibilities!)