I think a crisp summary here is: CFAR is in the business of helping create scientists, more than the business of doing science. Some of the things it makes sense to do to help create scientists look vaguely science-ish, but others don’t. And this sometimes causes people to worry (understandably, I think) that CFAR isn’t enthused about science, or doesn’t understand its value.
But if you’re looking to improve a given culture, one natural move is to explore that culture’s blind spots. And I think exploring those blind spots is often not going to look like an activity typical of that culture.
An example: there’s a particular bug I encounter extremely often at AIRCS workshops, but rarely at other workshops. I don’t yet feel like I have a great model of it, but it has something to do with not fully understanding how words have referents at different levels of abstraction. It’s the sort of confusion that I think reading A Human’s Guide to Words often resolves in people, and which results in people asking questions like:
“Should I replace [my core goal x] with [this list of “ethical” goals I recently heard about]?”
“Why is the fact that I have a goal a good reason to optimize for it?”
“Are propositions like ‘x is good’ or ‘y is beautiful’ even meaningful claims?”
When I encounter this bug I often point to a nearby tree, and start describing it at different levels of abstraction. The word “tree” refers to a bunch of different related things: to a member of an evolutionarily-related category of organisms, to the general sort of object humans tend to emit the phonemes “tree” to describe, to this particular mid-sized physical object here in front of us, to the particular arrangement of particles that composes the object, etc. And it’s sensible to use the term “tree” anyway, as long as you’re careful to track which level of abstraction you’re referring to with a given proposition—i.e., as long as you’re careful to be precise about exactly which map/territory correspondence you’re asserting.
This is obvious to most science-minded people. But it’s often less obvious that the same procedure, with the same carefulness, is needed to sensibly discuss concepts like “goal” and “good.” Just as it doesn’t make sense to discuss whether a given tree is “strong” without distinguishing between “in terms of its likelihood of falling over” or “in terms of its molecular bonds,” it doesn’t make sense to discuss whether a goal is “good” without distinguishing between e.g. “relative to societal consensus” or “relative to your current preferences” or “relative to the preferences you might come to have given more time to think.”
This conversation often seems to help resolve the confusion. At some point, I may design a class about this, so that more such confusions can be resolved. But I expect that if I do, some of the engineers in the audience will get nervous, since it will look an awful lot like a philosophy class! (I already get this objection regularly one-on-one.) That is, I expect some may wonder whether the AIRCS staff, who claim to be running workshops for engineers, are actually more enthusiastic about philosophy than engineering.
We’re not. Academic philosophy, at least, strikes me as an unusually unproductive field with generally poor epistemics. I don’t want to turn the engineers into philosophers—I just want to use a particular helpful insight from philosophy to patch a bug which, for whatever reason, seems to commonly afflict AIRCS participants.
CFAR faces this dilemma a lot. For example, we spent a bunch of time circling for a while, and this made many rationalists nervous—was CFAR as an institution, which claimed to be running workshops for science-minded, sequences-reading, law-based-reasoning-enthused rationalists, actually more enthusiastic about woo-laden authentic relating games?
We weren’t. But we looked around and noticed that lots of the promising people around us seemed particularly bad at extrospection—i.e., at simulating the felt senses of their conversational partners in their own minds. This seemed worrying, among other reasons because early-stage research intuitions (e.g. about which lines of inquiry feel exciting to pursue) often seem to be stored sub-verbally. So we looked to specialists in extrospection for a patch.
One comment in this thread compares the OP to Philip Morris’ claims to be working toward a “smoke-free future.” I think this analogy is overstated, in that I expect Philip Morris is being more intentionally deceptive than Jacob Hilton here. But I quite liked the comment anyway, because I share the sense that (regardless of Jacob’s intention) the OP has an effect much like safetywashing, and I think the exaggerated satire helps make that easier to see.
The OP is framed as addressing common misconceptions about OpenAI, of which it lists five:
1. OpenAI is not working on scalable alignment.
2. Most people who were working on alignment at OpenAI left for Anthropic.
3. OpenAI is a purely for-profit organization.
4. OpenAI is not aware of the risks of race dynamics.
5. OpenAI leadership is dismissive of existential risk from AI.
Of these, I think 1, 3, and 4 address positions that are held by basically no one. So by “debunking” much dumber versions of the claims people actually make, the post gives the impression of engaging with criticism without actually meaningfully doing so. 2 at least addresses a real argument, but as I understand it the response is quite misleading: while technically true, it seriously underplays the degree to which there was an exodus of key safety-conscious staff, who left because they felt OpenAI leadership was too reckless. That leaves 5 as the only item that strikes me as a non-misleading response to a real criticism people regularly make.
In response to the Philip Morris analogy, Jacob advised caution.
For many years, the criticism I heard of OpenAI in private was dramatically more vociferous than what I heard in public. I think much of this was because many people shared Jacob’s concern—if we say what we actually think about their strategy, maybe they’ll write us off as enemies, and not listen later when it really counts?
But I think this is starting to change. I’ve seen a lot more public criticism lately, which I think is probably at least in part because it’s become so obvious that the strategy of mincing our words hasn’t worked. If they mostly ignore all but the very most optimistic alignment researchers now, why should we expect that will change later, as long as we keep being careful to avoid stating any of our offensive-sounding beliefs?
From talking with early employees and others, my impression is that OpenAI’s founding was incredibly reckless, in the sense that they rushed to deploy their org before first taking much time to figure out how to ensure that went well. The founders’ early comments about accident risk mostly strike me as so naive and unwise that I find it hard to imagine they thought much at all about the existing alignment literature before deciding to charge ahead and create a new lab. Their initial plan—the one still baked into their name—would have been terribly dangerous if implemented, for reasons I’d think should have been immediately obvious to them had they stopped to think hard about accident risk at all.
And I think their actions since then have mostly been similarly reckless. When they got the scaling laws result, they published a paper about it, thereby popularizing the notion that “just making the black box bigger” might be a viable path to AGI. When they demoed this strategy with products like GPT-3, DALL-E, and CLIP, they described much of the architecture publicly, inspiring others to pursue similar research directions.
So in effect, as far as I can tell, they created a very productive “creating the x-risk” department, alongside a smaller “mitigating that risk” department—the presence of which I take the OP to describe as reassuring—staffed by a few of the most notably optimistic alignment researchers, many of whom left because even they felt too worried about OpenAI’s recklessness.
After all of that, why would we expect they’ll suddenly start being prudent and cautious when it comes time to deploy transformative tech? I don’t think we should.
My strong bet is that OpenAI leadership are good people, in the standard deontological sense, and I think that’s overwhelmingly the sense that should govern interpersonal interactions. I think they’re very likely trying hard, from their perspective, to make this go well, and I urge you, dear reader, not to be an asshole to them. Figuring out what makes sense is hard; doing things is hard; attempts to achieve goals often somehow accidentally end up causing the opposite thing to happen; nobody will want to work with you if small strategic updates might cause you to suddenly treat them totally differently.
But I think we are well past the point where it plausibly makes sense for pessimistic folks to refrain from stating their true views about OpenAI (or any other lab) just to be polite. They didn’t listen the first times alignment researchers screamed in horror, and they probably won’t listen the next times either. So you might as well just say what you actually think—at least that way, anyone who does listen will find a message worth hearing.