SERI MATS ’21, Cognitive science @ Yale ’22, Meta AI Resident ’23, LTFF grantee. Currently doing prosocial alignment research @ AE Studio. Very interested in work at the intersection of AI x cognitive science x alignment x philosophy.
Cameron Berg
There’s a lot of overlap between alignment researchers and the EA community, so I’m wondering how that was handled.
Agree that there is inherent/unavoidable overlap. As noted in the post, we were generally cautious about excluding participants from either sample for reasons you mention and also found that the key results we present here are robust to these kinds of changes in the filtration of either dataset (you can see and explore this for yourself here).
With this being said, we did ask respondents in both the EA and the alignment surveys to indicate the extent to which they are involved in alignment. Note the significance of the difference here:
From alignment survey:
From EA survey:
This question/result both serves as a good filtering criterion for cleanly separating out EAs from alignment researchers and gives pretty strong evidence that we are drawing on completely different samples across these surveys (likely because we sourced the data for each survey through completely distinct channels).
Regarding the support for various cause areas, I’m pretty sure that you’ll find the support for AI Safety/Long-Termism/X-risk is higher among those most involved in EA than among those least involved. Part of this may be because of the number of jobs available in this cause area.
Interesting! I just tried to test this. It is a bit hard to find a variable in the EA dataset that would cleanly correspond to higher vs. lower overall involvement, but we can filter by the number of years one has been involved in EA, and there is no level-of-experience threshold I could find where there are statistically significant differences in EAs’ views on how promising AI x-risk is. (Note that years of experience in EA may not be the best proxy for what you are asking, but it is likely the best we’ve got to tackle this specific question.)
Blue is >1 year experience, red is <1 year experience:
Blue is >2 years experience, red is <2 years experience:
Thanks for the comment!
Consciousness does not have a commonly agreed upon definition. The question of whether an AI is conscious cannot be answered until you choose a precise definition of consciousness, at which point the question falls out of the realm of philosophy into standard science.
Agree. Also happen to think that there are basic conflations/confusions that tend to go on in these conversations (eg, self-consciousness vs. consciousness) that make the task of defining what we mean by consciousness more arduous and confusing than it likely needs to be (which isn’t to say that defining consciousness is easy). I would analogize consciousness to intelligence in terms of its difficulty to nail down precisely, but I don’t think there is anything philosophically special about consciousness that inherently eludes modeling.
is there some secret sauce that makes the algorithm [that underpins consciousness] special and different from all currently known algorithms, such that if we understood it we would suddenly feel enlightened? I doubt it. I expect we will just find a big pile of heuristics and optimization procedures that are fundamentally familiar to computer science.
Largely agree with this too. It very well may be the case (as now seems to be obviously true of intelligence) that there is no one ‘master’ algorithm that underlies the whole phenomenon, but rather, as you say, a big pile of smaller procedures, heuristics, etc. So be it: we definitely want to better understand (for reasons explained in the post) what set of potentially-individually-unimpressive algorithms, when run in concert, give you a system that is conscious.
So, to your point, there is not necessarily any one ‘deep secret’ to uncover that will crack the mystery (though we think, eg, Graziano’s AST might be a strong candidate solution for at least part of this mystery), but I would still think that (1) it is worthwhile to attempt to model the functional role of consciousness, and that (2) whether we actually have better or worse models of consciousness matters tremendously.
There will be places on the form to indicate exactly this sort of information :) we’d encourage anyone who is associated with alignment to take the survey.
Thanks for taking the survey! When we estimated how long it would take, we didn’t count how long it would take to answer the optional open-ended questions, because we figured that those who are sufficiently time constrained that they would actually care a lot about the time estimate would not spend the additional time writing in responses.
In general, the survey does seem to take respondents approximately 10-20 minutes to complete. As noted in another comment below,
this still works out to donating $120-240/researcher-hour to high-impact alignment orgs (plus whatever the value is of the comparison of one’s individual results to that of community), which hopefully is worth the time investment :)
Ideally within the next month or so. There are a few other control populations still left to sample, as well as actually doing all of the analysis.
Thanks for sharing this! Will definitely take a look at this in the context of what we find and see if we are capturing any similar sentiment.
Thanks for calling this out—we’re definitely open to discussing potential opportunities for collaboration/engaging with the platform!
It’s a great point that the broader social and economic implications of BCI extend beyond the control of any single company, AE no doubt included. Still, while bandwidth and noisiness of the tech are potentially orthogonal to one’s intentions, companies with unambiguous humanity-forward missions (like AE) are far more likely to actually care about the societal implications, and therefore, to build BCI that attempts to address these concerns at the ground level.
In general, we expect the by-default path to powerful BCI (i.e., one where we are completely uninvolved) to be negative/rife with s-risks/significant invasions of privacy and autonomy, etc, which is why we are actively working to nudge the developmental trajectory of BCI in a more positive direction, i.e., one where the only major incentive is to build the most human-flourishing-conducive BCI tech we possibly can.
With respect to the RLNF idea, we are definitely very sympathetic to wireheading concerns. We think that approach is promising if we are able to obtain better reward signals given all of the sub-symbolic information that neural signals can offer in order to better understand human intent, but as you correctly pointed out, that same information can be used to better trick the human evaluator as well. We think this already happens to a lesser extent, and we expect that both current methods and future ones have to account for this particular risk.
More generally, we strongly agree that building out BCI is like a tightrope walk. Our original theory of change explicitly focuses on this: in expectation, BCI is not going to be built safely by the giant tech companies of the world, largely given short-term profit-related incentives, which is why we want to build it ourselves as a bootstrapped company whose revenue has come from things other than BCI. Accordingly, we can focus on walking this BCI developmental tightrope safely and for the benefit of humanity without worrying about whether we profit from this work.
We do call some of these concerns out in the post, eg:
We also recognize that many of these proposals have a double-edged sword quality that requires extremely careful consideration—e.g., building BCI that makes humans more competent could also make bad actors more competent, give AI systems manipulation-conducive information about the processes of our cognition that we don’t even know, and so on. We take these risks very seriously and think that any well-defined alignment agenda must also put forward a convincing plan for avoiding them (with full knowledge of the fact that if they can’t be avoided, they are not viable directions.)
Overall—in spite of the double-edged nature of alignment work potentially facilitating capabilities breakthroughs—we think it is critical to avoid base rate neglect in acknowledging how unbelievably aggressively people (who are generally alignment-ambivalent) are now pushing forward capabilities work. Against this base rate, we suspect our contributions to inadvertently pushing forward capabilities will be relatively negligible. This does not imply that we shouldn’t be extremely cautious, have rigorous info/exfohazard standards, think carefully about unintended consequences, etc—it just means that we want to be pragmatic about the fact that we can help solve alignment while being reasonably confident that the overall expected value of this work will outweigh the overall expected harm (again, especially given the incredibly high, already-happening background rate of alignment-ambivalent capabilities progress).
Thanks for your comment! I think we can simultaneously (1) strongly agree with the premise that in order for AGI to go well (or at the very least, not catastrophically poorly), society needs to adopt a multidisciplinary, multipolar approach that takes into account broader civilizational risks and pitfalls, and (2) have fairly high confidence that within the space of all possible useful things to do within this broader scope, the list of neglected approaches we present above does a reasonable job of documenting some of the places where we specifically think AE has comparative advantage/the potential to strongly contribute over relatively short time horizons. So, to directly answer:
Is this a deliberate choice of narrowing your direct, object-level technical work to alignment (because you think this where the predispositions of your team are?), or a disagreement with more systemic views on “what we should work on to reduce the AI risks?”
It is something far more like a deliberate choice than a systemic disagreement. We are also very interested in and open to broader models of how control theory, game theory, information security, etc have consequences for alignment (e.g., see ideas 6 and 10 for examples of nontechnical things we think we could likely help with). To the degree that these sorts of things can be thought of as further neglected approaches, we may indeed agree that they are worthwhile for us to consider pursuing or at least help facilitate others’ pursuits, with the comparative advantage caveat stated previously.
I’m definitely sympathetic to the general argument here as I understand it: something like, it is better to be more productive when what you’re working towards has high EV, and stimulants are one underutilized strategy for being more productive. But I have concerns about the generality of your conclusion: (1) blanket-endorsing or otherwise equating the advantages and disadvantages of all of the things on the y-axis of that plot is painting with too broad a brush. They vary, eg, in addictive potential, demonstrated medical benefit, cost of maintenance, etc. (2) Relatedly, some of these drugs (e.g., Adderall) alter the dopaminergic calibration in the brain, which can lead to significant personality/epistemology changes, typically as a result of modulating people’s risk-taking/reward-seeking trade-offs. Similar dopamine agonist drugs used to treat Parkinson’s have led to pathological gambling behaviors in patients who took them. There is an argument to be made for at least some subset of these substances that the trouble induced by these kinds of personality changes may plausibly outweigh the productivity gains of taking the drugs in the first place.
27 people holding the view is not a counterexample to the claim that it is becoming less popular.
Still feels worthwhile to emphasize that some of these 27 people are, eg, Chief AI Scientist at Meta, co-director of CIFAR, DeepMind staff researchers, etc.
These people are major decision-makers in some of the world’s leading and most well-resourced AI labs, so we should probably pay attention to where they think AI research should go in the short-term—they are among the people who could actually take it there.
See also this survey of NLP
I assume this is the chart you’re referring to. I take your point that you see these numbers as increasing or decreasing (even though where they actually are in an absolute sense seems compatible with believing that brain-based AGI is entirely possible), but these increases or decreases are themselves risky statistics to extrapolate from. These sorts of trends could easily asymptote or reverse given volatile field dynamics. For instance, if we linearly extrapolate from the two stats you provided (5% believe scaling could solve everything in 2018; 17% believe it in 2022), this would predict, eg, that 56% of NLP researchers in 2035 would believe scaling could solve everything. Do you actually think something in this ballpark is likely?
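For anyone who wants to check the arithmetic, here is a minimal sketch of the straight-line extrapolation described above. The function name and variable names are just illustrative; the only inputs are the two survey figures quoted in this comment (5% in 2018, 17% in 2022), and the model is a naive two-point line, not a claim about how the trend will actually behave.

```python
def linear_extrapolate(x0, y0, x1, y1, x):
    """Extend the straight line through (x0, y0) and (x1, y1) out to x."""
    slope = (y1 - y0) / (x1 - x0)  # percentage points per year
    return y1 + slope * (x - x1)


# Two data points quoted in the comment: 5% in 2018, 17% in 2022.
slope_per_year = (17.0 - 5.0) / (2022 - 2018)  # 3.0 points per year
share_2035 = linear_extrapolate(2018, 5.0, 2022, 17.0, 2035)

print(f"slope: {slope_per_year:.1f} points/year")
print(f"naive 2035 prediction: {share_2035:.0f}%")  # 56%
```

The same line keeps climbing by 3 points per year indefinitely, which is one way of seeing why a two-point linear fit is a fragile basis for long-range claims.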
Did the paper say that NeuroAI is looking increasingly likely?
I was considering the paper itself as evidence that NeuroAI is looking increasingly likely.
When people who run many of the world’s leading AI labs say they want to devote resources to building NeuroAI in the hopes of getting AGI, I am considering that as a pretty good reason to believe that brain-like AGI is more probable than I thought it was before reading the paper. Do you think this is a mistake?
Certainly, to your point, signaling an intention to try X is not the same as successfully doing X, especially in the world of AI research. But again, if anyone were to be able to push AI research in the direction of being brain-based, would it not be these sorts of labs?
To be clear, I do not personally think that prosaic AGI and brain-based AGI are necessarily mutually exclusive—eg, brains may be performing computations that we ultimately realize are some emergent product of prosaic AI methods that already basically exist. I do think that the publication of this paper gives us good reason to believe that brain-like AGI is more probable than we might have thought it was, eg, two weeks ago.
However, technological development is not a zero-sum game. Opportunities or enthusiasm in neuroscience doesn’t in itself make prosaic AGI less likely and I don’t feel like any of the provided arguments are knockdown arguments against ANN’s leading to prosaic AGI.
Completely agreed!
I believe there are two distinct arguments at play in the paper and that they are not mutually exclusive. I think the first is “in contrast to the optimism of those outside the field, many front-line AI researchers believe that major new breakthroughs are needed before we can build artificial systems capable of doing all that a human, or even a much simpler animal like a mouse, can do” and the second is “a better understanding of neural computation will reveal basic ingredients of intelligence and catalyze the next revolution in AI, eventually leading to artificial agents with capabilities that match and perhaps even surpass those of humans.”
The first argument can be read as a reason to negatively update on prosaic AGI (unless you see these ‘major new breakthroughs’ as also being prosaic) and the second argument can be read as a reason to positively update on brain-like AGI. To be clear, I agree that the second argument is not a good reason to negatively update on prosaic AGI.
Thanks for your comment!
As far as I can tell the distribution of views in the field of AI is shifting fairly rapidly towards “extrapolation from current systems” (from a low baseline).
I suppose part of the purpose of this post is to point to numerous researchers who serve as counterexamples to this claim—i.e., Yann LeCun, Terry Sejnowski, Yoshua Bengio, Timothy Lillicrap et al seem to disagree with the perspective you’re articulating in this comment insofar as they actually endorse the perspective of the paper they’ve coauthored.
You are obviously a highly credible source on trends in AI research—but so are they, no?
And if they are explicitly arguing that NeuroAI is the route they think the field should go in order to get AGI, it seems to me unwise to ignore or otherwise dismiss this shift.
Agreed that there are important subtleties here. In this post, I am really just using the safety-via-debate set-up as a sort of intuitive case for getting us thinking about why we generally seem to trust certain algorithms running in the human brain to adjudicate hard evaluative tasks related to AI safety. I don’t mean to be making any especially specific claims about safety-via-debate as a strategy (in part for precisely the reasons you specify in this comment).
Thanks for the comment! I do think that, at present, the only working example we have of an agent able to explicitly self-inspect its own values is in the human case, even if getting the base shards ‘right’ in the prosocial sense would likely entail that they will already be doing self-reflection. Am I misunderstanding your point here?
Thanks Lukas! I just gave your linked comment a read and I broadly agree with what you’ve written both there and here, especially w.r.t. focusing on the necessary training/evolutionary conditions out of which we might expect to see generally intelligent prosocial agents (like most humans) emerge. This seems like a wonderful topic to explore further IMO. Any other sources you recommend for doing so?
Hi Joe—likewise! This relationship between prosociality and distribution of power in social groups is super interesting to me and not something I’ve given a lot of thought to yet. My understanding of this critique is that it would predict something like: in a world where there are huge power imbalances, typical prosocial behavior would look less stable/adaptive. This brings to mind for me things like ‘generous tit for tat’ solutions to prisoner’s dilemma scenarios—i.e., where being prosocial/trusting is a bad idea when you’re in situations where the social conditions are unforgiving to ‘suckers.’ I guess I’m not really sure what exactly you have in mind w.r.t. power specifically—maybe you could elaborate on (if I’ve got the ‘prediction’ right in the bit above) why one would think that typical prosocial behavior would look less stable/adaptive in a world with huge power imbalances?
I broadly agree with Viliam’s comment above. Regarding Dagon’s comment (to which yours is a reply), I think that characterizing my position here as ‘people who aren’t neurotypical shouldn’t be trusted’ is basically strawmanning, as I explained in this comment. I explicitly don’t think this is correct, nor do I think I imply it is anywhere in this post.
As for your comment, I definitely agree that there is a distinction to be made between prosocial instincts and the learned behavior that these instincts give rise to over the lifespan, but I would think that the sort of ‘integrity’ that you point at here as well as the self-aware psychopath counterexample are both still drawing on particular classes of prosocial motivations that could be captured algorithmically. See my response to ‘plausible critique #1,’ where I also discuss self-awareness as an important criterion for prosociality.
Here is the full list of the alignment orgs who had at least one researcher complete the survey (and who also elected to share what org they are working for): OpenAI, Meta, Anthropic, FHI, CMU, Redwood Research, Dalhousie University, AI Safety Camp, Astera Institute, Atlas Computing Institute, Model Evaluation and Threat Research (METR, formerly ARC Evals), Apart Research, Astra Fellowship, AI Standards Lab, Confirm Solutions Inc., PAISRI, MATS, FOCAL, EffiSciences, FAR AI, aintelope, Constellation, Causal Incentives Working Group, Formalizing Boundaries, AISC.
~80% of the alignment sample is currently receiving funding of some form to pursue their work, and ~75% have been doing this work for >1 year. Seems to me like this is basically the population we were intending to sample.
Your expectation while taking the survey about whether we were going to be able to get a good sample does not say much about whether we did end up getting a good sample. Things that better tell us whether or not we got a good sample are, eg, the quality/distribution of the represented orgs and the quantity of actively-funded technical alignment researchers (both described above).
Note that the survey took people ~15 minutes to complete and resulted in a $40 donation being made to a high-impact organization, which puts our valuation of an hour of their time at ~$160 (roughly equivalent to the hourly rate of someone who makes ~$330k annually). Assuming this population would generally donate a portion of their income to high-impact charities/organizations by default, taking the survey actually seems to probably have been worth everyone’s time in terms of EV.
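A quick sketch of the arithmetic behind these figures, in case it is useful. The $40 donation and ~15 minute completion time come from the comment above; the 2,080 working hours per year (40 hours x 52 weeks) is my own assumption for converting an hourly rate into an annual one, and the variable names are just illustrative.

```python
DONATION_USD = 40.0          # donation per completed survey (from the comment)
MINUTES_TO_COMPLETE = 15.0   # approximate completion time (from the comment)
WORK_HOURS_PER_YEAR = 2080   # assumption: 40 hours/week * 52 weeks

# Implied value placed on one hour of a respondent's time.
hourly_rate = DONATION_USD * 60 / MINUTES_TO_COMPLETE

# Annual income of someone paid that hourly rate full time.
annual_equivalent = hourly_rate * WORK_HOURS_PER_YEAR

print(f"implied hourly rate: ${hourly_rate:.0f}")          # $160
print(f"annual equivalent: ${annual_equivalent:,.0f}")     # $332,800
```

Under that working-hours assumption the annual figure comes out to about $333k, which matches the ~$330k quoted above.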