I would give my dog many treats to stop eating deer poop, since this behavior can lead to expensive veterinary visits. But I can’t communicate with my dog well enough to set up this trade.
Why isn’t this an example of “we would trade with animals if we could communicate better”?
To be clear, I haven’t seen many designs that people I respect believed to have a chance of actually working. If you work on the alignment problem or at an AI lab and haven’t read Nate Soares’ On how various plans miss the hard bits of the alignment challenge, I’d suggest reading it.
Can you explain your definition of the sharp left turn and why it will cause many plans to fail?
I asked GPT-4 to summarize the article and then come up with some alternative terms; here are a few I like:
One-way summary
Insider mnemonic
Contextual shorthand
Familiarity trigger
Conceptual hint
Clue for the familiar
Knowledge spark
Abbreviated insight
Expert’s echo
Breadcrumb for the well-versed
Whisper of the well-acquainted
Insider’s underexplained aphorism
I also asked for some idioms. “Seeing the forest but not the trees” seems apt.
“My guess is that people who are concluding P(Doom) is high will each need to figure out how to live with it for themselves.”
The following perspective helps me feel better.
First, it’s not news that AGI poses a significant threat to humanity. I felt seriously worried when I first encountered this idea in 2018 listening to Eliezer on the Sam Harris podcast. The “Death With Dignity” post revived these old fears, but it didn’t reveal new dangers that were previously unknown to me.
Second, many humans throughout history have dealt with believing “P(impending Doom) = high”: COVID, Ukraine, famine in Yemen, WWI, WWII, the Holocaust, 9/11, incarceration in the US, Mongol conquests, the Congo Free State’s atrocities, the Great Purge, the Cambodian genocide, the Great Depression, the Black Death, the Putumayo genocide, the Holodomor, the Trail of Tears, the Syrian civil war, the Irish Potato Famine, the Vietnam War, slavery, colonialism, more wars, more terrorist attacks, more plagues, more famines, more genocides, etc.
It’s easy to read these events without simulating how the people involved felt. In most of these cases, “Doom” didn’t mean “everyone on the planet will die” but rather “I will lose everything” or “everything that matters will be destroyed” or “everyone I know will die” or “my culture will die” or “my family will die” or simply “I will die.” The thoughts these people had probably felt considerably more devastating than the thoughts I’m having these days. Heck, some people get terrified just reading the news. I’m not alone in worrying about the future.
Again: “I’m not alone in worrying about the future.” I find this immensely comforting. People are not at all oblivious to the world having problems, even if they disagree with me on which problems are the most important. Everyone has fears.
I didn’t read it as an argument so much as an emotionally compelling anecdote that excellently conveys this realization:
I had had the upper hand for so long that it became second nature, and then suddenly, I went to losing every game.
Relevant tweet/quote from Mustafa Suleyman, the co-founder and CEO:
Powerful AI systems are inevitable. Strict licensing and regulation is also inevitable. The key thing from here is getting the safest and most widely beneficial versions of both.
An AI Policy Tool for Today: Ambitiously Invest in NIST (Anthropic 2023)
National Security Addition to the NIST AI RMF (Special Competitive Studies Project 2023)
Existential risk and rapid technological change—a thematic study for UNDRR (Stauffer et al. 2023), especially section 4.3 (“30 actions to reduce existential risk”)
Crafting Legislation to Prevent AI-Based Extinction: Submission of Evidence to the Science and Technology Select Committee’s Inquiry on the Governance of AI (Cohen and Osborne 2023)
Why we need a new agency to regulate advanced artificial intelligence: Lessons on AI control from the Facebook Files (Korinek 2021)
Is this still happening? The website has stopped working for me.
I greatly appreciate this post. I feel like “argh yeah it’s really hard to guarantee that actions won’t have huge negative consequences, and plenty of popular actions might actually be really bad, and the road to hell is paved with good intentions.” With that being said, I have some comments to consider.
The offices cost $70k/month on rent [1], and around $35k/month on food and drink, and ~$5k/month on contractor time for the office. It also costs core Lightcone staff time which I’d guess at around $75k/year.
That is ~$185k/month and ~$2.22m/year. I wonder if the cost has anything to do with the decision? There may be a tendency to say “an action is either extremely good or extremely bad because it either reduces x-risk or increases x-risk, so if I think it’s net positive I should be willing to spend huge amounts of money.” I think this framing neglects a middle ground of “an action could be somewhere in between extremely good and extremely bad.” Perhaps the net effects of the offices were “somewhat good, but not enough to justify the monetary cost.” I guess Ben sort of covers this point later (“Having two locations comes with a large cost”).
its value was substantially dependent on the existing EA/AI Alignment/Rationality ecosystem being roughly on track to solve the world’s most important problems, and that while there are issues, pouring gas into this existing engine, and ironing out its bugs and problems, is one of the most valuable things to do in the world.
Huh, it might be misleading to view the offices as “pouring gas into the engine of the entire EA/AI Alignment/Rationality ecosystem.” They contribute to some areas much more than others. Even if one thinks that the overall ecosystem is net harmful, there could still be ecosystem-building projects that are net helpful. It seems highly unlikely to me that all ecosystem-building projects are bad.
The Lighthouse system is going away when the leases end. Lighthouse 1 has closed, and Lighthouse 2 will continue to be open for a few more months.
These are group houses for members of the EA/AI Alignment/Rationality ecosystem, correct? Relating to the last point, I expect the effects of these to be quite different from the effects of the offices.
FTX is the obvious way in which current community-building can be bad, though in my model of the world FTX, while somewhat of an outlier in scope, doesn’t feel like a particularly huge outlier in terms of the underlying generators.
I’m very unsure about this, because it seems plausible that SBF would have done something terrible without EA encouragement. Also, I’m confused about the detailed cause-and-effect analysis of how the offices will contribute to SBF-style catastrophes—is the idea that “people will talk in the offices and then get stupid ideas, and they won’t get equally stupid ideas without the offices?”
My guess is RLHF research has been pushing on a commercialization bottleneck and had a pretty large counterfactual effect on AI investment, causing a huge uptick in investment into AI and potentially an arms race between Microsoft and Google towards AGI: https://www.lesswrong.com/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research?commentId=HHBFYow2gCB3qjk2i
Worth noting that there is plenty of room for debate on the impacts of RLHF, including the discussion in the linked post.
Tendencies towards pretty mindkilly PR-stuff in the EA community: https://forum.effectivealtruism.org/posts/ALzE9JixLLEexTKSq/cea-statement-on-nick-bostrom-s-email?commentId=vYbburTEchHZv7mn4
Overall I’m getting a sense of “look, there are bad things happening so the whole system must be bad.” Additionally, I think the negative impact of “mindkilly PR-stuff” is pretty insubstantial. On a related note, I somewhat agree with the idea that “most successful human ventures look—from up close—like dumpster fires.” It’s worth being wary of inferences resembling “X evokes a sense of disgust, so X is probably really harmful.”
I genuinely only have marginally better ability to distinguish the moral character of Anthropic’s leadership from the moral character of FTX’s leadership
Yeah this makes sense. I would really love to gain a clear understanding of who has power at the top AGI labs and what their views are on AGI risk. AFAIK nobody has done a detailed analysis of this?
Also, as in the case of RLHF, it’s worth noting that there are some reasonable arguments for Anthropic being helpful.
I think AI Alignment ideas/the EA community/the rationality community played a pretty substantial role in the founding of the three leading AGI labs (Deepmind, OpenAI, Anthropic)
Definitely true for Anthropic. For OpenAI I’m less sure; IIRC the argument is that there were lots of EA-related conferences that contributed to the formation of OpenAI, and I’d like to see more details than this; “there were EA events where key players talked” feels quite different from “without EA, OpenAI would not exist.” I feel similarly about DeepMind; IIRC Eliezer accidentally convinced one of the founders to work on AGI—are there other arguments?
And again, how do the Lightcone offices specifically contribute to the founding of more leading AGI labs? My impression is that the offices’ vibe conveyed a strong sense of “it’s bad to shorten timelines.”
It’s a bad idea to train models directly on the internet
I’m confused how the offices contribute to this.
The EA and AI Alignment community should probably try to delay AI development somehow, and this will likely include getting into conflict with a bunch of AI capabilities organizations, but it’s worth the cost
Again, I’m confused how the offices have a negative impact from this perspective. I feel this way about quite a few of the points in the list.
I do sure feel like a lot of AI alignment research is very suspiciously indistinguishable from capabilities research
...
It also appears that people who are concerned about AGI risk have been responsible for a very substantial fraction of progress towards AGI
...
A lot of people in AI Alignment I’ve talked to have found it pretty hard to have clear thoughts in the current social environment
To me these seem like some of the best reasons (among those in the list; I think Ben provides some more) to shut down the offices. The disadvantage of the list format is that it makes all the points seem equally important; it might be good to bold the points you see as most important or provide a numerical estimate of what percentage of the negative expected impact comes from each point.
The moral maze nature of the EA/longtermist ecosystem has increased substantially over the last two years, and the simulacra level of its discourse has notably risen too.
I feel similarly to how I felt about the “mindkilly PR-stuff”; I don’t think the negative impact is very high in magnitude.
the primary person taking orders of magnitude more funding and staff talent (Dario Amodei) has barely explicated his views on the topic and appears (from a distance) to have disastrously optimistic views about how easy alignment will be and how important it is to stay competitive with state of the art models
Agreed. I’m confused about Dario’s views.
I recall at EAG in Oxford a year or two ago, people were encouraged to “list their areas of expertise” on their profile, and one person who works in this ecosystem listed (amongst many things) “Biorisk” even though I knew the person had only been part of this ecosystem for <1 year and their background was in a different field.
This seems very trivial to me. IIRC the Swapcard app just says “list your areas of expertise” or something, with very little detail about what qualifies as expertise. Some people might interpret this as “list the things you’re currently working on.”
It also seems to me like people who show any intelligent thought or get any respect in the alignment field quickly get elevated to “great researchers that new people should learn from” even though I think that there’s less than a dozen people who’ve produced really great work
Could you please list the people who’ve produced really great work?
I similarly feel pretty worried by how (quite earnest) EAs describe people or projects as “high impact” when I’m pretty sure that if they reflected on their beliefs, they honestly wouldn’t know the sign of the person or project they were talking about, or estimate it as close-to-zero.
Strongly agree. Relatedly, I’m concerned that people might be exhibiting a lot of action bias.
Last point, unrelated to the quote: it feels like this post is entirely focused on the possible negative impacts of the offices, and that kind of analysis seems very likely to arrive at incorrect conclusions since it fails to consider the possible positive impacts. Granted, this post was a scattered collection of Slack messages, so I assume the Lightcone team has done more formal analyses internally.
Agreed. Stuart was more open to the possibility that current techniques are enough.
I worry that people will skip the post, read this comment, and misunderstand the post, so I want to point out how this comment might be misleading, even though it’s a great comment.
None of the interventions in the post are “go work at OpenAI to change things from the inside.” And only the outreach ones sound anything like “going around and convincing others.” And there’s a disclaimer that these interventions have serious downside risks and require extremely competent execution.
EDIT: one idea in the 2nd post is to join safety and governance teams at top labs like OpenAI. This seems reasonable to me? (“Go work on capabilities at OpenAI to change things” would sound unreasonable.)
On the other hand, in your view all deep learning progress has been empirical, often via dumb hacks and intuitions (this isn’t true imo).
Can you elaborate on why you think this is false? I’m curious.
On a related note, this part might be misleading:
I’m just really, really skeptical that a bunch of abstract work on decision theory and similar [from MIRI and similar independent researchers] will get us there. My expectation is that alignment is an ML problem, and you can’t solve alignment utterly disconnected from actual ML systems.
I think earlier forms of this research focused on developing new, alignable algorithms, rather than aligning existing deep learning algorithms. However, a reader of the first quote might think “wow, those people actually thought galaxy-brained decision theory stuff was going to work on deep learning systems!”
For more details, see Paul Christiano’s 2019 talk on “Current work in AI alignment”:
So for example, I might have a view like: we could either build AI by having systems which perform inference and models that we understand that have like interpretable beliefs about the world and then act on those beliefs, or I could build systems by having opaque black boxes and doing optimization over those black boxes. I might believe that the first kind of AI is easier to align, so one way that I could make the alignment tax smaller is just by advancing that kind of AI, which I expect to be easier to align.
This is not a super uncommon view amongst academics. It also may be familiar here because I would say it describes MIRI’s view; they sort of take the outlook that some kinds of AI just look hard to align. We want to build an understanding such that we can build the kind of AI that is easier to align.
What is it to write well?
Have a thought.
Get that thought from your head to someone else’s.
I came to the same conclusion before reading this post and even phrased it similarly in my head. I strongly agree.
I’ve heard other things are good, e.g. Pinker’s stuff
Can you be more specific? I doubt you’re talking about Enlightenment Now, for instance.
There is this thing where, like, we like to use little words because it seems like it helps us think clearly. This is not always, like, awful, but I’ve started to feel like it’s a bit overused.
I don’t understand what mistake you’re pointing to here.
read those authors you’d like to write like
Do you have any recommendations?
I don’t have a specific decision that I want OpenAI to make right now (well I do, but I don’t think they’d become closedAI).
Does “closedAI” mean “OpenAI shuts down” or “OpenAI stops making their models available to the public” (I’m not sure how they could do this one while earning money) or “OpenAI stops publishing papers describing their model architectures, training datasets, hyperparameters, etc” or something else?
For the audience: one of the first “successes” of convincing a high-profile person of the importance of AI X-risk was Elon Musk.
Seems cherrypicked. Does Dustin Moskovitz provide a point in the other direction? I don’t know the stories of how these people started taking AI risk seriously and would like more details. Also, Elon became “convinced” (although it doesn’t seem like he’s convinced in the same way Alignment Forum users are) as early as 2014, and the evidence for AI x-risk looks a lot different today in 2023.
I am more excited about interventions that focus on AGI labs. They are already barreling towards AGI, and it seems like them slowing down or coordinating with each other could be really useful.
Seems plausible to me that within the next few years some other AI companies could overtake OpenAI + DeepMind in the race to AGI. What reasons are there to expect current leaders to maintain their lead?
The most useful / potentially-behavior-changing part of the post for me is the section describing how certain groups shouldn’t develop detailed models of AI risk (pasted below). But the arguments are light on details. I’d like to see a second post building a more detailed model of why you think these outcomes are net negative.
The specific outcomes we want to avoid are:
The higher echelons of some government or military develop an accurate model of AI Risk.
They’d want to enforce their government’s superiority, or national superiority, or ideological superiority, and they’d trample over the rest of humanity.
There are no eudaimonia-interested governments on Earth.
The accurate model of AI Risk makes its way into the public consciousness.
The “general public”, as I’ve outlined, is not safe either. And in particular, what we don’t want is some “transparency policy” where the AGI-deploying group is catering to the public’s whims regarding the AGI’s preferences.
Just look at modern laws, and the preferences they imply! Humanity-in-aggregate is not eudaimonia-aligned either.
A large subset of wealthy or influential people not pre-selected by their interest in EA/LW ideas form an accurate model of AI Risk.
We’d either get some revenue-maximizer for a given corporation, or a dystopian dictatorship, or some such outcome.
And even if the particular influential person is conventionally nice, we get all the problems with sampling a random nice individual from the general population (the off-distribution problem).
Some comments:
A large amount of the public thinks AGI is near.
This links to a poll of Lex Fridman’s Twitter followers, which doesn’t seem like a representative sample of the US population.
they jointly support a greater than 10% likelihood that we will develop broadly human-level AI systems within the next decade.
Is this what you’re arguing for when you say “short AI timelines”? I think that’s a fairly common view among people who think about AI timelines.
AI is starting to be used to accelerate AI research.
My sense is that Copilot is by far the most important example here.
I imagine visiting alien civilizations much like earth, and I try to reason from just one piece of evidence at a time about how long that planet has.
I find this part really confusing. Is “much like earth” supposed to mean “basically the same as earth”? In that case, why not just present each piece of evidence normally, without setting up an “alien civilization” hypothetical? For example, the “sparks of AGI” paper provides very little evidence for short timelines on its own, because all we know is the capabilities of a particular system, not how long it took to get to that point and whether that progress might continue.
The first two graphs show the overall number of college degrees and the number of STEM degrees conferred from 2011 to 2021
Per year, or cumulative? Seems like it’s per year.
If you think one should put less than 20% of their timeline thinking weight on recent progress
Can you clarify what you mean by this?
Overall, I think this post provides evidence that short AI timelines are possible, but doesn’t provide strong evidence that short AI timelines are probable. Here are some posts that provide more arguments for the latter point:
I pasted the YouTube video link into AssemblyAI’s Playground (which I think uses Conformer-1 for speech to text) and generated a transcript, available at this link. However, the transcript lacks labels for who is speaking.
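For anyone who wants speaker labels, here is a minimal sketch using AssemblyAI’s Python SDK with speaker diarization turned on. This assumes you have an API key and a direct audio file or URL (e.g. audio pulled from the YouTube video with a tool like yt-dlp); the file name below is a placeholder:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # assumption: you have an AssemblyAI key

# Enable speaker diarization so each utterance comes back tagged with a speaker
config = aai.TranscriptionConfig(speaker_labels=True)

# Placeholder: a local audio file or direct audio URL (the API doesn't take a
# YouTube page link directly, so you'd extract the audio first)
transcript = aai.Transcriber(config=config).transcribe("podcast_audio.mp3")

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```

The labels come back as Speaker A, Speaker B, etc., so you would still need to map them to the actual speakers by ear.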
If the ants accept the trade “leave and I’ll spare you,” I don’t have to spend any money on actually poisoning the ants. But I would consider the counteroffer “if you kill us, it will cost $20, and we’re willing to leave for $1.”
At the time of writing, this comment is still the most recommended comment with 910 recommendations. 2nd place has 877 recommendations:
3rd place has 790 recommendations:
4th place has 682 recommendations:
After that, 5th place has 529, 6th place has 390, and the rest have 350 or fewer.
My thoughts:
2nd place reminds me of Let’s think about slowing down AI. But I somewhat disagree with the comment, because I do sense that many people have a desire for cool new AI tech.
3rd place sounds silly since advanced AI could help with reducing climate change, poverty, and genetic disorders. I also wonder if this commenter knows about AlphaFold.
4th place seems important. But I think that even if AGI jobs offered lower compensation, there would still be a considerable number of workers interested in pursuing them.