Environments as a bottleneck in AGI development

Richard_Ngo 17 Jul 2020 5:02 UTC

LW: 41 AF: 18

Given a training environment or dataset, a training algorithm, an optimiser, and a model class capable of implementing an AGI (with the right parameters), there are two interesting questions we might ask about how conducive that environment is for training an AGI. The first is: how much do AGIs from that model class outperform non-AGIs? The second is: how straightforward is the path to reaching an AGI? We can visualise these questions in terms of the loss landscape of those models when evaluated on the training environment. The first asks how low the set of AGIs is, compared with the rest of the landscape. The second asks how favourable the paths through that loss landscape to get to AGIs are—that is, do the local gradients usually point in the right direction, and how deep are the local minima?

Some people believe that there are many environments in which AGIs can be reached via favourable paths in the loss landscape and dramatically outperform non-AGIs; let’s call this the easy paths hypothesis. By contrast, the hard paths hypothesis is that it’s rare for environments (even complex meta-environments consisting of many separate tasks) to straightforwardly incentivise the development of general intelligence. This would suggest that specific environmental features will be necessary to prevent most models from getting stuck in local minima where they only possess narrow, specialised cognitive skills. There has been a range of speculation on what such features might be—perhaps multi-agent autocurricula, or realistic simulations, or specific types of human feedback. I’ll discuss some of these possibilities later in the post.

This spectrum is complicated by its dependence on the model class, training algorithm, and choice of optimiser. If we had a perfect optimiser, then the hilliness of the loss landscape wouldn’t matter. For now, I’m imagining using optimisers fairly similar to current stochastic gradient descent. Meanwhile, I’m assuming in this post that (in accordance with Rich Sutton’s bitter lesson) our models and training algorithms won’t contain very strong inductive biases. In other words: we’ll develop powerful function approximators, but which functions they approximate will primarily be determined by their training environments (and possibly also regularisation, as I’ll discuss later).
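
The loss-landscape picture above can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: a one-dimensional landscape with a shallow basin standing in for narrow skills and a deeper basin standing in for general intelligence, plain gradient descent standing in for an SGD-like optimiser, and hypothetical function names (`loss`, `grad`, `descend`, `fraction_reaching_global`). It is not a model of any real training run; it only shows that a local optimiser ends up in whichever basin its starting point drains into, regardless of which basin is lower.

```python
import random

def loss(x):
    """Toy 1-D landscape: the minimum of two parabolas (both basins are hypothetical)."""
    narrow = (x + 1.0) ** 2 + 1.0    # local minimum: loss 1.0 at x = -1 ("narrow skills")
    general = 0.5 * (x - 2.0) ** 2   # global minimum: loss 0.0 at x = +2 ("general intelligence")
    return min(narrow, general)

def grad(x, eps=1e-5):
    """Central finite-difference estimate of d(loss)/dx."""
    return (loss(x + eps) - loss(x - eps)) / (2 * eps)

def descend(x, lr=0.05, steps=2000):
    """Plain gradient descent from the starting point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def fraction_reaching_global(n=200, low=-4.0, high=4.0, seed=0):
    """Fraction of uniformly random starting points that end up in the deeper basin."""
    rng = random.Random(seed)
    finals = [descend(rng.uniform(low, high)) for _ in range(n)]
    return sum(abs(x - 2.0) < 0.1 for x in finals) / n

if __name__ == "__main__":
    print("start -3.0 ends near", round(descend(-3.0), 3))
    print("start +3.0 ends near", round(descend(3.0), 3))
    print("fraction reaching the deeper basin:", fraction_reaching_global())
```

In this sketch the two questions from the start of the post come apart: the gap between `loss(2.0)` and `loss(-1.0)` measures how much the deeper basin outperforms the shallow one, while `fraction_reaching_global` measures how often local gradients actually lead there. Moving the boundary between the basins changes the second quantity without changing the first.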

Arguments for the hard paths hypothesis

When predicting AGI timelines, a lot of people focus on progress in compute and algorithms. But I think that environments are more important than they may at first seem, because we have reason to take the hard paths hypothesis seriously. The history of AI is full of realisations that solving high-level tasks is easier than we expect, because those tasks don’t require as much general intelligence as we thought (as highlighted by Moravec’s paradox). Chess doesn’t, Go doesn’t, Starcraft doesn’t. Rather, when we train on these sorts of environments, we get agents with narrow intelligence that is only useful in that environment. The lesson here is that neural networks are very good at doing exactly what we give them feedback to do—even when that feedback is random, large neural networks are capable of simply memorising a lot of information!

Another way of phrasing this point: each time we evaluate the training loss, that’s based on a model’s performance on a specific task. So we don’t have any principled way of rewarding models for performing well in a way that generalises to a wide range of unseen tasks. This is in theory a similar problem to making models generalise from the training set to the test set, but in practice much broader—since for an AI to be generally intelligent, it will need to be able to generalise to tasks that are very different to the ones it was trained on. How might AI researchers ensure this generalisation occurs? We can try to train models on a wide range of tasks, but we’re never going to be able to train them on anywhere near the full diversity of tasks that we want AGIs to be able to tackle. An alternative is regularisation, which is often used to prevent models from overfitting to their environments. I do expect regularisation to be very helpful overall, but we currently have little understanding of the extent to which regularisation is capable of converting hard path environments to easy path environments. In particular, it’s unclear what the relationship is between “preventing overfitting” and the type of broad generalisation between tasks that humans are capable of—for example, doing mathematics despite never having evolved for it.

I suspect that many people intuitively discount the hard paths hypothesis because humans managed to become generally intelligent without anyone designing our environment to encourage that. However, this objection is very vulnerable to anthropic considerations—that of course our environment proved sufficient, otherwise we wouldn’t be here asking the question! In other words, as long as the universe contains some environments which give rise to general intelligence, generally-intelligent observers will always find themselves arising in those environments, and never in the environments in which life got trapped in narrow-intelligence local optima. So we can’t infer from the existence of our ancestral environment how much of a “lucky coincidence” our own general intelligence is, or how many difficult-to-recreate components were crucial during its development.
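
The anthropic point can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: each simulated world independently produces general intelligence with a hypothetical probability `p_general`, and only worlds that did so contain observers. The function name `simulate` and all parameters are invented for illustration.

```python
import random

def simulate(p_general, n_worlds=100_000, seed=0):
    """Return (base rate of worlds that produced general intelligence,
    fraction of observers whose own environment produced it)."""
    rng = random.Random(seed)
    produced = [rng.random() < p_general for _ in range(n_worlds)]
    # Only worlds that produced general intelligence contain anyone to ask the question.
    observers = [w for w in produced if w]
    base_rate = sum(produced) / n_worlds
    observed = sum(observers) / len(observers) if observers else float("nan")
    return base_rate, observed

if __name__ == "__main__":
    for p in (0.5, 0.001):
        base_rate, observed = simulate(p)
        print(f"p={p}: base rate {base_rate:.4f}, fraction seen by observers {observed:.1f}")
```

Whether such environments are common or rare, every observer in the sketch sees an environment that sufficed, so that observation alone cannot distinguish between the two cases.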

Perhaps the example of humans would be strong evidence for the easy paths hypothesis if, even after scrutinising our ancestral environment carefully, we can’t think of any such components. But I don’t think we’re in that situation. There are many traits of ourselves or our ancestral environment which arguably helped steer us towards general intelligence (even ignoring the ones which primarily impacted brain function via brain size). An incomplete list: large group sizes, calorific benefits of (cooked) meat, need for coordination during hunting, sexual selection (and possibility of infidelity), extended childhood and parental relationships, benefits of teaching, benefits of detecting norm violations, dexterous fingers, vocal ability, high-fidelity senses. If it turns out that we only became generally intelligent because all of these variables came together just right, that suggests that designing easy-path training environments will be tricky. This difficulty is exacerbated by the fact that our understanding of human evolution is very incomplete, and so there are probably a bunch more factors which could be comparably important to the ones I described.

Here’s another way of thinking about my overall argument. Consider the thought experiment of scaling up the brain of a given animal species to have the same number of neurons as humans, while magically maintaining it at the same size and weight as their current brain, and requiring no energy to run (thus removing physical difficulties). Suppose we then fixed their bodies in their current forms, only allowing brain architecture and content to evolve (the precise details are a little fiddly, but I think the core idea makes sense). Almost any species we did this to would evolve additional narrow intellectual capabilities which are useful in their environments, since it’s unlikely that their current brain size is optimal when energy costs are removed—but how many of them would reach general intelligence within the span of a hundred million years or so (assuming no interactions with humans)? If easy-path environments are common, many should get there; if rare, then few. I expect that most animals in that situation wouldn’t reach sufficient levels of general intelligence to do advanced mathematics or figure out scientific laws. That might be because most are too solitary for communication skills to be strongly selected for, or because language is not very valuable even for social species (as suggested by the fact that none of them have even rudimentary languages). Or because most aren’t physically able to use complex tools, or because they’d quickly learn to exploit other animals enough that further intelligence isn’t very helpful, or… If true, this implies that we should take seriously the hypothesis that it will be difficult to build easy-path environments.

We should note, though, that even this thought experiment is not immune from anthropic considerations. Clearly the answer for chimpanzees will be highly correlated with the answer for humans. And strong correlations might remain even for animals that are very far from us on the evolutionary tree. For example, suppose sexual selection (which has ancient evolutionary origins) is a key requirement for developing general intelligence. Or imagine that the limited storage capacity of DNA imposes the regularisation necessary to push animals out of narrow-intelligence local optima. The fact that these traits wouldn’t occur by default in training environments for AGIs weakens the thought experiment’s ability to provide evidence in favour of the easy paths hypothesis.

We might dispute the relevance of thought experiments about biological environments by arguing that, unlike evolution, AI development is supervised by researchers who will be deliberately designing environments to make paths to AGI easier. For example, AIs trained on gigabytes of language data won’t need to derive language from scratch like humans did. However, other relevant features may be less straightforward to identify and implement. For instance, one argument for why most animals wouldn’t reach general intelligence under the conditions described above is that they don’t have sufficiently flexible appendages to benefit from general tool use. Yet I expect that implementing flexible interactions in simulations will be very difficult—even state-of-the-art video games are far from supporting this. As another example, it’s possible that when training an AI to produce novel scientific theories, we’ll need its training dataset to include thousands or millions of example theories in order to develop its ability to do scientific reasoning in a general way (as opposed to merely learning to regurgitate our existing scientific knowledge). Even if this isn’t totally infeasible, it’ll significantly slow down the process of developing those capabilities, compared with the possible world in which training on the entire internet provides an easy-path environment. What’s more, we simply don’t know right now which additional features will make a big difference, and it may take us a long time to figure that out. Anyone who thinks that they can identify a set of tasks which strongly incentivise the development of general intelligence should wonder how their position differs from the expectations of previous AI researchers who also expected that the tasks they worked on would require general intelligence to solve.

I’m still very uncertain about how likely different types of environments are to contain easy paths to AGI. But the hard paths hypothesis seems plausible—and the limitations which anthropic considerations place on our ability to refute it should push us towards expecting the development of AGI to take longer than we otherwise expected.

AI AI Timelines

gwern 23 Sep 2020 18:48 UTC
LW: 15 AF: 5
“Blessings of scale” observations aside, it seems like right now, environments are not the bottleneck to DL/DRL work. No one failed to solve Go because gosh darn it, they just lacked a good Go simulator which correctly implemented the rules of the game; the limits to solving ALE-57 (like Montezuma’s Revenge) in general or as a single multi-task agent do not seem to be lack of Atari games where what we really need is ALE-526*; Procgen performance is not weak because of insufficient variation in levels; OpenAI Universe failed not for lack of tasks, to say the least; the challenge in creating or replicating GPT-3 is not in scraping the text (and GPT-3 didn’t even run 1 epoch!). Datasets/environments sometimes unlock new performance, like ImageNet, but even when one saturates, there’s typically more datasets which are not yet solved and cannot be solved simultaneously (JFT-300M, for example), and in the case of RL of course compute=data. If you went to any DRL researcher, I don’t think many of them would name “we’ve solved all the existing environments to superhuman level and have unemployed ourselves!” as their biggest bottleneck.

Is it really the case that at some point we will be drowning in so many GPUs and petaflops that our main problem will become coming up with ever more difficult tasks to give them something useful to train on? Or is this specifically a claim about friendly AGI, where we lack any kind of environment which would seem to force alignment for maximum score?

* Apparently the existing ALE suite was chosen pretty haphazardly:

> Our testing set was constructed by choosing semi-randomly from the 381 games listed on Wikipedia at the time of writing. Of these games, 123 games have their own Wikipedia page, have a single player mode, are not adult-themed or prototypes, and can be emulated in ALE. From this list, 50 games were chosen at random to form the test set.

I wonder how the history of DRL would’ve changed if they had happened to select from the other 73, or if Pitfall & Montezuma’s Revenge had been omitted? I don’t, however, think it would’ve been a good use of their time in 2013 to work on adding more ALE games rather than, say, debugging GPU libraries to make it easier to run NNs at all...
- Richard_Ngo 27 Sep 2020 9:23 UTC
  LW: 10 AF: 4
  The fact that progress on existing environments (Go, ALE-57, etc) isn’t bottlenecked by environments doesn’t seem like particularly useful evidence. The question is whether we could be making much more progress towards AGI with environments that were more conducive to developing AGI. The fact that we’re running out of “headline” challenges along the lines of Go and Starcraft is one reason to think that having better environments would make a big difference—although to be clear, the main focus of my post is on the coming decades, and the claim that environments are currently a bottleneck does seem much weaker.
  More concretely, is it possible to construct some dataset on which our current methods would get significantly closer to AGI than they are today? I think that’s plausible—e.g. perhaps we could take the linguistic corpus that GPT-3 was trained on, and carefully annotate what counts as good reasoning and what doesn’t. (In some ways this is what reward modelling is trying to do—but that focuses more on alignment than capabilities.)
  Or another way of putting it: suppose we gave the field of deep learning 10,000x current compute and algorithms that are 10 years ahead of today. Would people know what to apply them to, in order to get much closer to AGI? If not, this also suggests that environments will be a bottleneck unless someone focuses on them within the next decade.
Steven Byrnes 17 Jul 2020 14:12 UTC
LW: 12 AF: 5
Thanks for the thought-provoking post!
Evolving language (and other forms of social learning) poses at least a bit of a chicken-and-egg problem—you need speakers putting rich conceptual information into the sounds going out, and listeners trying to match the sounds coming in to rich conceptual information. Likewise, you need certain mental capabilities to create a technological society, but if you’re not already in that technological society, there isn’t necessarily an evolutionary pressure to have those capabilities. I suspect (not having thought about it very much) that these kinds of chicken-and-egg problems are why it took evolution so long to create human-like intelligence.
AGI wouldn’t have those chicken-and-egg problems. I think GPT-3 shows that just putting an AI in an environment with human language, and flagging the language as an important target for self-supervised learning, is already enough to coax the system to develop a wide array of human-like concepts. Now, GPT-3 is not an AGI, but I think it’s held back by having the wrong architecture, not the wrong environment. (OK, well, giving it video input couldn’t hurt.)
I’m also a bit confused about your reference to “Rich Sutton’s bitter lesson”. Do you agree that Transformers learn more / better in the same environment than MLPs? That LSTMs learn more / better in the same environment than simpler RNNs? If so, why not suppose that a future yet-to-be-discovered architecture, in the same training environment, will wind up more AGI-ish? (For what little it’s worth, I have my own theory along these lines, that we’re going to wind up with systems closer to today’s probabilistic programming and PGMs than to today’s DNNs.)
I’m not very confident about any of this. :-)
- Richard_Ngo 17 Jul 2020 16:00 UTC
  LW: 7 AF: 2
  > AGI wouldn’t have those chicken-and-egg problems.
  I like and agree with this point, and have made a small edit to the original post to reflect that. However, while I don’t dispute that GPT-3 has some human-like concepts, I’m less sure about its reasoning abilities, and it’s pretty plausible to me that self-supervised training on language alone plateaus before we get to a GPT-N that can reason. I’m also fairly uncertain about this, but these types of environmental difficulties are worth considering.
  > I’m also a bit confused about your reference to “Rich Sutton’s bitter lesson”. Do you agree that Transformers learn more / better in the same environment than MLPs? That LSTMs learn more / better in the same environment than simpler RNNs?
  Yes, but my point is that the *content* comes from the environment, not the architecture. We haven’t tried to leverage our knowledge of language by, say, using a different transformer for each part of speech. I (and I assume Sutton) agree that we’ll have increasingly powerful models, but they’ll also be increasingly general—and therefore the question of whether a model with the capacity to become an AGI does so or not will depend to a significant extent on the environment.
  - Steven Byrnes 17 Jul 2020 16:47 UTC
    LW: 6 AF: 4
    Thanks! I’m still trying to zero in on where you’re coming from in the Rich Sutton thing, and your response only makes me more confused. Let me try something, and then you can correct me...
    My (caricatured) opinion is: “Transformers-trained-by-SGD can’t reason. We’ll eventually invent a different architecture-and-learning-algorithm that is suited to reasoning, and when we run that algorithm on the same text prediction task used for GPT-3, it will become an AGI, even though GPT-3 didn’t.”
    Your (caricatured) opinion is (maybe?): “We shouldn’t think of reasoning as a property of the architecture-and-learning-algorithm. Instead, it’s a property of the learned model. Therefore, if Transformers-trained-by-SGD-on-text-prediction can’t reason, that is evidence that the text-prediction task is simply not one that calls for reasoning. That in turn suggests that if we keep the same text prediction task, but substitute some unknown future architecture and unknown future learning algorithm, it also won’t be able to reason.”
    Is that anywhere close to where you’re coming from? Thanks for bearing with me.
    - Rohin Shah 17 Jul 2020 23:03 UTC
      LW: 8 AF: 4
      Not Richard, but I basically endorse that description as a description of my own view. (Note however that we don’t yet know that Transformers-trained-by-SGD-on-text-prediction can’t reason; I for one am not willing to claim that scaling even further will not result in reasoning.)
      It’s not a certainty—it’s plausible that text prediction is enough, if you just improved the architecture and learning algorithm a little bit—but I doubt it, except in some degenerate sense that you could put a ton of information / inductive bias into the architecture and make it an AGI that way.
      - Richard_Ngo 18 Jul 2020 10:14 UTC
        LW: 6 AF: 3
        I endorse Steve’s description as a caricature of my view, and also Rohin’s comment. To flesh out my view a little more: I think that GPT-3 doing so well on language without (arguably) being able to reason is the same type of evidence as Deep Blue or AlphaGo doing well at board games without being able to reason (although significantly weaker). In both cases it suggests that just optimising for this task is not sufficient to create general intelligence. While it now seems pretty unreasonable to think that a superhuman chess AI would by default be generally intelligent, that seems not too far off what people used to think.
        Now, it might be the case that the task doesn’t matter very much for AGI if you “put a ton of information / inductive bias into the architecture”, as Rohin puts it. But I interpret Sutton to be arguing against our ability to do so.
        > We’ll eventually invent a different architecture-and-learning-algorithm that is suited to reasoning
        There are two possible interpretations of this, one of which I agree with and one of which I don’t. I could either interpret you as saying that we’ll eventually develop an architecture/learning algorithm biased towards reasoning ability—I disagree with this.
        Or you could be saying that future architectures will be capable of reasoning in ways that transformers aren’t, by virtue of just being generally more powerful. Which seems totally plausible to me.
        - Steven Byrnes 18 Jul 2020 17:37 UTC
          LW: 2 AF: 1
          Got it!
          Yeah, I think that reasoning, along with various other AGI prerequisites, requires an algorithm that does probabilistic programming / analysis-by-synthesis during deployment. And I think that trained Transformer models don’t do that, no matter what their size and parameters are. I guess I should write a post about why I think that—it’s a bit of a hazy tangle of ideas in my mind right now. :-)
          (I’m more-or-less saying the interpretation you disagree with in your second-to-last paragraph.)
          Thanks again for explaining!
Rohin Shah 17 Jul 2020 23:41 UTC
LW: 10 AF: 5
Planned summary for the Alignment Newsletter:
Models built using deep learning are a function of the learning algorithm, the architecture, and the task / environment / dataset. While a lot of effort is spent on analyzing learning algorithms and architectures, not much is spent on the environment. This post asks how important it is to design a good environment in order to build AGI.
It considers two possibilities: “easy paths”, in which many environments would incentivize AGI, and “hard paths”, in which such environments are rare. (Note that “hard paths” can be true, even if an AGI would be optimal for most environments: if AGI would be optimal, but there is no path in the loss landscape to AGI that is steeper than other paths in the loss landscape, then we probably wouldn’t find AGI in that environment.)
The main argument for “hard paths” is to look at the history of AI research, where we often trained agents on tasks that were “hallmarks of intelligence” (like chess) and then found that the resulting systems were narrowly good at the particular task, but were not generally intelligent. You might think that it can’t be too hard, since our environment led to the creation of general intelligence (us), but this is subject to anthropic bias: only worlds with general intelligence would ask whether environments incentivize general intelligence, so they will always observe that their environment is an example that incentivizes general intelligence. It can serve as a proof of existence, but not as an indicator that it is particularly likely.
Planned opinion:
I think this is an important question for AI timelines, and the plausibility of “hard paths” is one of the central reasons that my timelines are longer than others who work on deep learning-based AGI. However, GPT-3 (Language Models are Few-Shot Learners) demonstrates quite a lot of generality, so recently I’ve started putting more weight on “actually, designing the environment won’t be too hard”, which has correspondingly shortened my timelines.
- Richard_Ngo 18 Jul 2020 10:06 UTC
  LW: 8 AF: 4
  +1, I endorse this summary. I also agree that GPT-3 was an update towards the environment not mattering as much as I thought.
  Your summary might be clearer if you rephrase as:
  > It considers two possibilities: the “easy paths hypothesis” that many environments would incentivize AGI, and the “hard paths hypothesis” that such environments are rare.
  Since “easy paths” and “hard paths” by themselves are kinda ambiguous terms—are we talking about the paths, or the hypothesis? This is probably my fault for choosing bad terminology.
  - Rohin Shah 18 Jul 2020 23:49 UTC
    LW: 2 AF: 2
    Done :)
johnswentworth 17 Jul 2020 16:19 UTC
LW: 6 AF: 4
Good post, it makes a solid and probably under-appreciated point.
One very important thing I’d add: if you think you have some useful insight about what environment features are needed to incentivize general intelligence, please do not shout it from the rooftops! This is the sort of knowledge which very heavily benefits capabilities relative to alignment; propagating such information probably increases the chances of unaligned AI arriving before aligned AI.
- Richard_Ngo 18 Jul 2020 9:57 UTC
  LW: 8 AF: 3
  While this is a sensible point, I also think we should have a pretty high threshold for not talking about things, for a couple of reasons:
  1. Safety research is in general much more dependent on having good ideas than capabilities research (because a lot of capabilities are driven by compute, and also because there are fewer of us).
  2. Most of the AI people who listen to things people like us say are safety people.
  3. I don’t think there’s enough work on safety techniques tailored to specific paths to AGI (as I discuss briefly at the end of this post).
  4. It’s uncooperative and gives others a bad impression of us.
  So the type of thing I’d endorse not saying is “Here’s one weird trick which will make the generation of random environments much easier.” But something I endorse talking about is the potential importance of multi-agent environments for training AGIs, even though this is to me a central example of a “useful insight about what environment features are needed to incentivize general intelligence”.
Daniel Kokotajlo 17 Jul 2020 11:59 UTC
LW: 6 AF: 3
Thanks for this; this is causing me to rethink my timelines estimates.
As I understand it, a short version of your argument would be: “Elephants didn’t evolve intelligence, despite being long-lived social mammals with huge brains and the ability to manipulate objects. But their environment wasn’t that different from the human environment. So getting the right environment for AGI might be tricky.”
- Richard_Ngo 17 Jul 2020 16:05 UTC
  LW: 8 AF: 3
  To be precise, the argument is that elephants (or other animals in similar situations) *wouldn’t* evolve to human-level intelligence. The fact that they *didn’t* isn’t very much information (for anthropic reasons, because if they did then it’d be them wondering why primates didn’t get to elephant-level intelligence).
  And then we should also consider that the elephant environment isn’t a randomly-sampled environment either, but is also correlated with ours (which means we should also anthropically discount this).
Donald Hobson 17 Jul 2020 14:49 UTC
LW: 2 AF: 1
I agree that sufficiently powerful evolution or reinforcement learning will create AGI in the right environment. However, I think this might be like training GPT-3 to do arithmetic. It works, but only by doing a fairly brute-force search over a large space of designs. If we actually understood what we were doing, we could make far more efficient agents. I also think that such a design would be really hard to align.
hyperdrive 17 Jul 2020 12:50 UTC
2 points
I think humans are good at leaving traces of previously explored paths in their environment through text, art and sound. If you are stuck in a local minimum you will still be exposed to these influences when living a “human life”, which might make you reconsider whether you have really reached your goal. Couldn’t the general part of human intelligence be a product of the environment even more than of the brain?
E.Roland 23 Sep 2020 18:09 UTC
1 point
Really interesting post, and I think proper environment creation is one of the most important questions, if not the most important, when it comes to the RL-based path to AGI.
You made a point that, contrary to the expectations of some, environments like Go or Starcraft are not sufficient to create the type of flexible, adaptive AGI that we’re looking for. I wonder if success in creating such AGI is dependent primarily on the complexity of the environment? That is, even though environments like Starcraft are quite complex and require some of the abstract reasoning we’d expect of AGI, the actual complexity isn’t anywhere close to the complexity of the real world in which we want those AGIs to perform. I wonder too if increasing environmental complexity will provide some inherent regularisation, i.e. it’s more difficult to fall into very narrow solutions when the possible states of your environment are very large.
If that is the case, the question that naturally follows is how do we create environments that mimic the complexity of the actual world? Of course a full simulation isn’t feasible, but I wonder if it would be possible to create highly complex world models using neural techniques.
This would be quite difficult computationally, but what would happen if one were to train an RL agent whose environment was provided by a GPT-3-esque world model? For instance, imagine AI Dungeon (a popular GPT-3-based DnD dungeon master) and an RL agent interacting with it. I’m not certain what the utility function could be; maybe maximizing gold, XP, level, or similar? Certainly an agent that can “win at” DnD would be closer to an AGI than anything that’s been made to date. Similarly, I could imagine a future version of GPT that modeled video frames (i.e. predicted the next frame based on the previous frame or frames). An RL agent that was trained to produce some given frame as a desired state would certainly be able to solve problems in the real world, no? (Of course the actual implementation of a video-based GPT would, computationally, be incredibly expensive, but not completely out of the question.) Are there merits to this approach?
Viktor Rehnberg 29 Jul 2020 8:01 UTC
1 point
This seems related to a thought I had when reading An overview of 11 proposals for building safe advanced AI. How much harder is it to find an environment that promotes aligned AGI compared to any AGI?
It seems that a lot of the proposals for AGI under the current ML paradigm use oversight either to get a second chance or to get an extra term in the loss function that promotes alignedness. How well either of these types of methods works seems to depend on the base rate of aligned AGI among all the AGIs that can emerge from a particular model and training environment. I’m thinking of it as roughly
$P ({AGI}_{aligned} | S, M, E_{base}) = P (AGI | M, E_{base}) \cdot \left( P ({AGI}_{aligned} | AGI) + P (\text{align AGI} | S, {AGI}_{unaligned}) \, P ({AGI}_{unaligned} | AGI) \right),$
where $M$ is some model and $E_{base}$ is the training environment without safeguards $S$ to detect deceptive or otherwise catastrophic behavior.
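
As a rough worked example, the decomposition above can be written as a small function. This is only a sketch: the function name `p_aligned_agi`, its argument names, and every number passed to it are hypothetical, and it assumes that $P ({AGI}_{unaligned} | AGI) = 1 - P ({AGI}_{aligned} | AGI)$, which the formula does not state explicitly.

```python
def p_aligned_agi(p_agi, p_aligned_given_agi, p_fix_given_unaligned):
    """Evaluate the decomposition above for given (made-up) probabilities.

    p_agi                 -- P(AGI | M, E_base)
    p_aligned_given_agi   -- P(AGI_aligned | AGI)
    p_fix_given_unaligned -- P(align AGI | S, AGI_unaligned)
    """
    # Assumption: every AGI is either aligned or unaligned.
    p_unaligned_given_agi = 1.0 - p_aligned_given_agi
    return p_agi * (
        p_aligned_given_agi + p_fix_given_unaligned * p_unaligned_given_agi
    )

if __name__ == "__main__":
    # Illustrative numbers only; none of them come from the post.
    print(p_aligned_agi(0.5, 0.2, 0.5))  # safeguards catch and fix half of the unaligned cases
    print(p_aligned_agi(0.5, 0.2, 0.0))  # safeguards never help
```

With these made-up inputs the first call comes out at roughly 0.3 and the second at roughly 0.1, which shows how strongly the result leans on the base rate $P ({AGI}_{aligned} | AGI)$ when the safeguards $S$ are weak.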
This post seems to concern
$P (AGI | E) \overset{?}{\approx} P (AGI | M, E),$
how much does the environment compared to the model influence the emergence of AGI?
What I’m trying to get at is that I think a related important question is
$P ({AGI}_{aligned} | {AGI}_{E}) \overset{?}{\approx} P ({AGI}_{aligned} | {AGI}_{M, E}),$
how much does the alignedness of an emerging AGI depend on its environment compared to the model?