Previously “Lanrian” on here. Research analyst at Redwood Research. Views are my own.
Feel free to DM me, email me at [my last name].[my first name]@gmail.com or send something anonymously to https://www.admonymous.co/lukas-finnveden
I thought a potential issue with wild-caught fish was that other consumers would simply substitute away from wild-caught to farmed fish, since most people don’t care much and the supply of wild-caught fish isn’t very elastic.
But anchovies and sardines (as suggested in the post) seem to avoid that issue, since apparently there’s basically no farming of them.
I also think it’s just super reasonable to eat animal products and offset with donations, which can easily net reduce animal suffering given how good the available donation opportunities are.
IMO, a big appeal of controlled takeoff is that, if successful, it slows down all of takeoff.
Whereas a global shutdown (which might happen before we have great automated alignment research, and which might incidentally ban a lot of safety research as well) might just end some number of years later, whereupon we might quickly go through the remainder of takeoff and incur roughly as much risk as we would have without the shutdown.
(Things that can cause a shutdown to end: elections or deaths swapping out who rules countries; geopolitical power shifts; verification becoming harder as it becomes more plausible that people could invest a lot to develop and hide compute and data centers where they can’t be seen; and maybe AI software efficiency advancing via smaller-scale experiments that are hard to ban.)
Successful controlled takeoff definitely seems more likely to me than “shutdown so long that intelligence-augmented humans have time to grow up”, and also more likely than “shutdown so long that we can solve superintelligence alignment up front without having very smart models to help us or to experiment with”.
Short shutdown to do some prep before controlled takeoff seems reasonable.
Edit: I guess technically, some very mildly intelligence-augmented humans (via embryo selection) are already being born, and they have a decent chance of growing up before superintelligence even without a shutdown. I was thinking of intelligence augmentation good enough to significantly reduce x-risk. (Though I’m not sure how long people expect that to take.)
Lots of plausible mechanisms by which something could be “a little off” are suggested in this Rohin comment.
This is the most compelling version of “trapped priors” I’ve seen. I agreed with Anna’s comment on the original post, but the mechanisms here make sense to me as something that would mess a lot with updating. (Though it seems different enough from the very bayes-focused analysis in the original post that I’m not sure it’s referring to the same thing.)
I think that’s true in how they refer to it.
But it’s also a bit confusing, because I don’t think they have a definition of superintelligence in the book other than “exceeds every human at almost every mental task”, so AIs that are broadly but only moderately superhuman ought to count.
Edit: No wait, correction:
A few pages later they say:
> We will describe it using the term “superintelligence,” meaning a mind much more capable than any human at almost every sort of steering and prediction problem — at least, those problems where there is room to substantially improve over human performance.*
Hm, you seem more pessimistic than I feel about the situation. E.g. I would’ve bet that “Where I agree and disagree with Eliezer” added significant value and changed some minds. Maybe you disagree, maybe you just have a higher bar for “meaningful change”.
(Where, to be clear, I think your opportunity cost is very high, so you should have a high bar for spending significant time writing LessWrong content — but I’m interpreting your comments as being more pessimistic than just “not worth the opportunity cost”.)
> This is roughly what seems to have happened in DC, where the internal influence approach was swept away by a big Overton window shift after ChatGPT.
In what sense was the internal influence approach “swept away”?
Also, it feels pretty salient to me that the ChatGPT shift was triggered by public, accessible empirical demonstrations of capabilities being high (and by the social impacts of that). So in my mind it provides evidence for “groups change their mind in response to certain kinds of empirical evidence” and doesn’t really provide evidence for “groups change their mind in response to a few brave people saying what they believe and changing the Overton window”.
If the conversation changed a lot causally downstream of the CAIS extinction letter or the FLI pause letter, that would be better evidence for your position (though it would also be consistent with a model that put less weight on preference cascades and modeled the impact more like “policymakers weren’t aware that lots of experts were concerned, and this letter communicated that they were”). I don’t know to what extent this was true. (Though I liked the CAIS extinction letter a lot and certainly believe it had a good amount of impact — I just don’t know how much.)
> As such, I disagree with the various actions you recommend lab employees to take, and do not intend to take them myself.
It’s not clear that you disagree that much? You say you agree with leo’s statement, which seems to be getting lots of upvotes and “thanks” emojis suggesting that people are going “yes, this is great and what we asked for”.
I’m not sure what other actions there are to disagree with. There’s “advocate internally to ensure that the lab lets its employees speak out publicly, as mentioned above, without any official retaliation” — but I don’t really expect any official retaliation for statements like these so I don’t expect this to be a big fight where it’s costly to take a position.
I think the discussion wouldn’t have to be like “here’s a crazy plan”.
I think there could have been something more like: “Important fact to understand about the situation: even if superintelligence comes within the next 10 years, it’s pretty likely that sub-ASI systems will have had a huge impact on the world by then, changing the world more over a few-year period than any technology ever has before. It’s hard to predict what this would look like [easy calls, hard calls, etc]. Some possible implications could be: [long list: …, automated alignment research, AI-enabled coordination, people being a lot more awake to the risks of ASI, lots of people being in relationships with AIs and being supportive of AI rights, not-egregiously-misaligned AIs that are almost as good at bio/cyber/etc as the superintelligences...]. Some of these things could be helpful, some could be harmful. By making us more uncertain about the situation, this lowers our confidence that everyone will die. In particular, there’s some chance that X, Y, or Z turns out really helpful. But obviously, if we see humanity as an agent, it would be a dumb plan for humanity to just assume that this crazy, hard-to-predict mess will save the whole situation.”
I.e. it could be presented as an important thing to understand about the strategic situation rather than as a proposed plan.
> If I understand correctly, the paper suggests this limit—the maximum improvement in learning efficiency a recursively self-improving superintelligence could gain, beyond the efficiency of human brains—is “4-10 OOMs,” which it describes as equivalent to 4-10 “years of AI progress, at the rate of progress seen in recent years.”
>
> Perhaps I’m missing something, and again I’m sorry if so, but after reading the paper carefully twice I don’t see any arguments that justify this choice of range. Why do you expect the limit of learning efficiency for a recursively self-improving superintelligence is 4-10 recent-progress-years above humans?
Oh, there are lots of arguments feeding into that range. Look at this part of the paper: there’s a long list of bullet points on different ways that superintelligences could be more efficient than humans. Each of the estimates has a range of X-Y OOMs. Then:
> Overall, the additional learning efficiency gains from these sources suggest that effective limits are 4-12 OOMs above the human brain. The high end seems extremely high, and we think there’s some risk of double counting some of the gains here in the different buckets, so we will bring down our high end to 10 OOMs.
Here, 4 is supposed to come from combining all the lower estimates guessed at above (“X” in “X-Y”), and 12 from combining all the upper estimates (“Y” in “X-Y”): the per-bucket gains multiply together, so their OOMs add up.
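To make the aggregation concrete, here’s a minimal sketch in Python with made-up per-bucket ranges (not the paper’s actual numbers): each bucket is an independent multiplicative gain, so the factors multiply, which means the OOMs add.

```python
# Hypothetical per-bucket efficiency-gain estimates, in OOMs
# (illustrative only -- NOT the paper's actual numbers).
buckets = {
    "bucket_a": (1, 3),
    "bucket_b": (1, 4),
    "bucket_c": (2, 5),
}

# Independent multiplicative gains combine by multiplying the factors,
# i.e. by summing their OOMs (exponents of 10).
low_total = sum(low for low, _ in buckets.values())
high_total = sum(high for _, high in buckets.values())

print(f"Combined range: {low_total}-{high_total} OOMs "
      f"(i.e. {10**low_total:.0e}x to {10**high_total:.0e}x)")
```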
Interesting, thanks. I think I had heard the rumor before and believed it.
In the linked study, it looks like they asked people about regret very shortly after the suicide attempt. This could bias the results either towards less regret about having survived (little time to change their mind) or towards more regret about having survived (people might be scared to signal intent to retry suicide, for fear of being committed, which I think sometimes happens soon after failed attempts).
I read the claim as saying that “some people and institutions concerned with AI safety” could have had more than an order of magnitude more resources than they actually have, by investing. Not necessarily a claim about the aggregate where you combine their wealth with that of AI-safety-sympathetic AI company founders and early employees. (Though maybe Carl believes the claim about the aggregate as well.)
Over the last 10 years, NVDA has had ~100x returns after dividing out the average S&P 500 gains. So more than an OOM of excess gains was possible just by investing in public companies.
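As a minimal sketch of that arithmetic (with made-up round numbers rather than actual market data): if a stock returned ~300x over a decade while the index returned ~3x, the market-relative gain is ~100x, i.e. about 2 OOMs.

```python
import math

# Hypothetical 10-year total-return multiples (illustrative, not real data).
stock_multiple = 300   # an NVDA-like outlier
index_multiple = 3     # a broad index like the S&P 500

# Dividing out the index's gains gives the market-relative ("excess") multiple.
excess_multiple = stock_multiple / index_multiple
print(f"Excess multiple: ~{excess_multiple:.0f}x "
      f"(~{math.log10(excess_multiple):.1f} OOMs)")
```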
(Also, I think it’s slightly confusing to point to the fact that the current portfolio is so heavily weighted towards AI capability companies. That’s because the AI investments grew so much faster than everything else. It’s consistent with only a small fraction of capital having been in AI-related stuff 4-10 years ago, which is the relevant quantity for determining how much larger gains were possible.)
My argument wouldn’t start from “it’s fully negligible”. (Though I do think it’s pretty negligible insofar as they’re investing in big hardware & energy companies, which is most of what’s visible from their public filings. Investments in private companies wouldn’t show up there, though.) Rather, it would be a quantitative argument that the value from the donation opportunities is substantially larger than the harms from investing.
One intuition pump that I find helpful here: Would I think it’d be a highly cost-effective donation opportunity to donate [however much $ Carl Shulman is making] to reduce investment in AI by [however much $ Carl Shulman is counterfactually causing]? Intuitively, that seems way less cost-effective than normal, marginal donation opportunities in AI safety.
You say “I think that accelerating capabilities buildouts to use your cut of the profits to fund safety research is a bit like an arsonist donating to the fire station”. I could say it’s more analogous to “someone who wants to increase fire safety invests in the fireworks industry to get excess returns that they can donate to the fire station, which they estimate will prevent far more fires than their fireworks investments cause”, which seems very reasonable to me. (I think the main difference is that a very small fraction of fires are caused by fireworks. An even better comparison might be a climate change advocate investing in fossil fuels when that appears to be extremely profitable.)
Insofar as your objection isn’t swayed by the straightforward quantitative consequentialist case, but is more deontological-ish in nature, I’d be curious whether it ultimately backs out to something consequentialist-ish (maybe something about signaling to enable coordination around opposing AI?). Or whether it’s more of a direct intuition.
> My read is that the cooperation he is against is with the narrative that AI-risk is not that important (because it’s too far away or whatever). This indeed influences which sorts of agencies get funded, which is a key thing he is upset about here.
Hm, I still don’t really understand what it means to be [against cooperation with the narrative that AI risk is not that important]. Beyond just believing that AI risk is important and acting accordingly. (A position that seems easy to state explicitly.)
Also: The people whose work is being derided definitely don’t agree with the narrative that “AI risk is not that important”. (They are and were working full-time to reduce AI risk because they think it’s extremely important.) If the derisiveness is being read as a signal that “AI risk is important” is a point of contention, then the derisiveness is misinforming people. Or if the derisiveness was supposed to communicate especially strong disapproval of any (mistaken) views that would directionally suggest that AI risk is less important than the author thinks: then that would just seem like soldier mindset (more harshly criticizing views that push in directions you don’t like, holding goodness-of-the-argument constant), which seems much more likely to muddy the epistemic waters than to send important signals.
Ok. If you think it’s correct for Eliezer to be derisive, because he’s communicating the valuable information that something shouldn’t be “cooperated with”, can you say more specifically what that means? “Not engage” was speculation on my part, because that seemed like a salient way to not be cooperative in an epistemic conflict.
Adele: “Long story short, derision is an important negative signal that something should not be cooperated with”
Lukas: “If that’s a claim that Eliezer wants to make (I’m not sure if it is!) I think he should make it explicitly and ideally argue for it.”
Habryka: “He has explicitly argued for it”
What version of the claim “something should not be cooperated with” is present + argued-for in that post? I thought that post was about the object level. (Which IMO seems like a better thing to argue about. I was just responding to Adele’s comment.)
If that’s a claim that Eliezer wants to make (I’m not sure if it is!) I think he should make it explicitly and ideally argue for it. Even just making it more explicit what the claim is would allow others to counter-argue the claim, rather than leaving it implicit and unargued.[1] I think it’s dangerous for people to defer to Eliezer about whether or not it’s worth engaging with people who disagree with him, which limits the usefulness of claims without arguments.
Also, an aside on the general dynamics here. (Not commenting on Eliezer in particular.) You say “derision is an important negative signal that something should not be cooperated with”. That’s in the passive voice; more accurate would be “derision is an important negative signal whereby the speaker warns the listener not to cooperate with the target of derision”. That’s consistent with “the speaker cares about the listener and warns the listener that the target isn’t useful for the listener to cooperate with”. But it’s also consistent with e.g. “it would be in the speaker’s interest for the listener to not cooperate with the target, and the speaker is warning the listener that the speaker might deride/punish/exclude the listener if they cooperate with the target”. General derision mixes together all of these signals, and some of them are decidedly anti-epistemic.
For example, if the claim is “these people aren’t worth engaging with”, I think there are pretty good counter-arguments even before you start digging into the object level: the people in question have a track record of being willing to publicly engage on the topics of debate, of being willing to publicly change their mind, of being open enough to differing views to give MIRI millions of dollars back when MIRI was more cash-constrained than they are now, and of understanding points that Eliezer thinks are important better than most people Eliezer actually spends time arguing with.
To be clear, I don’t particularly think that Eliezer does want to make this claim. It’s just one possible way that “don’t cooperate with” could cash out here, if your hypothesis is correct.
Thanks!
Thanks, I think I’m sympathetic to a good chunk of this (though I think I still put somewhat greater value on subjective credences than you do). In particular, I agree that there are lots of ways people can mess up when putting subjective credences on things, including “assuming they agree more than they do”.
I think the best solution to this is mostly to teach people about the ways that numbers can mislead, and how to avoid that, so that they can get the benefits of assigning numbers without getting the downside. (E.g.: complementing numerical forecasts with scenario forecasts. I’m a big fan of scenario forecasts.)
My impression is that Eliezer holds a much stronger position than yours. In the bit I quoted above, I think Eliezer isn’t only objecting to putting too much emphasis on subjective credences, but is objecting to putting subjective credences on things at all.
I’m somewhat sympathetic to this reasoning. But I think it proves too much.
For example: If you’re very hungry and walk past someone’s fruit tree, I think there’s a reasonable ethical case that it’s ok to take some fruit if you leave them some payment, if you’re justified in believing that they’d strongly prefer the payment to having the fruit. Even in cases where you shouldn’t have taken the fruit absent being able to repay them, and where you shouldn’t have paid them absent being able to take the fruit.
I think the reason for this is related to how it’s nice to have norms along the lines of “don’t leave people on-net worse-off” (and that such norms are way easier to enforce than e.g. “behave like an optimal utilitarian, harming people when optimal and benefitting people when optimal”). And then lots of people also have some internalized ethical intuitions or ethics-adjacent desires that work along similar lines.
And in the animal welfare case, instead of trying to avoid leaving a specific person worse-off, it’s about making a class of beings on-net better-off, or making a “cause area” on-net better-off. I have some ethical intuitions (or at least ethics-adjacent desires) along these lines and think it’s reasonable to indulge them.