Thanks for the comment. I agree broadly of course, but the paper says more specific things. For example, agency needs to be prioritized, probably taken outside of standard optimization; otherwise decimating pressure is applied on other concepts including truth and other “human values”. The other part is an empirical one, also related to your concern: human values are quite flexible, and biology doesn’t create hard bounds or limits on depletion. If you couple that with ML/AI technologies that will predict what we will do next, then approaches that depend on human intent and values (broadly) are not as safe anymore.
Thanks so much for writing this. I think it’s a much-needed (perhaps even a bit late) contribution connecting static views of GPT-based LLMs to dynamical systems and predictive processing. I do research on empirical agency, and it still surprises me how little the AI-safety community touches on this central part of agency, namely that you can’t have agents without this closed loop.
I’ve been speculating a bit (mostly to myself) about the possibility that “simulators” are already a type of organism, given that they appear to do active inference, which is the main driving force for nervous system evolution. Simulators seem to live in this inter-dimensional paradigm where (i) during training they behave like (sensory-system) agents, because they learn to predict outcomes and “experience” the effect of their predictions; but (ii) during inference/prediction they generally do not receive feedback. As you point out, all of this speculation may be moot, as many are moving pretty fast towards embedding simulators and giving them memory etc.
What is your opinion on this idea of “loosening up” our definition of agents? I spoke to Max Tegmark a few weeks ago, and my position is that we might be thinking of organisms from a time-chauvinist position, where we require the loop to be closed in a fast fashion (e.g. 1 sec for most biological organisms).
Thanks for the comment, Erik (and for taking the time to read the post).
I generally agree with you re: the inner/outer alignment comment I made. But the language I used, and that others also use, continues to be vague; the working definition of inner alignment on lesswrong.com is whether an “optimizer is the production of an outer aligned system, then whether that optimizer is itself aligned”. I see little difference, but I could be persuaded otherwise.
My post was meant to show that it’s pretty easy to find significant holes in some of the most central concepts researched now. This includes eclectic but also mainstream research, including the entire latent-knowledge approach, which seems to make significant assumptions about the relationship between human decision making or intent and super-human AGIs. I work a lot on this concept and hold (perhaps too) many opinions.
The tone might not have been ideal due to time limits. Sorry if that was off-putting.
I was also trying to make the point that we do not spend enough time shopping our ideas around, especially with basic science researchers, before we launch our work. I am a bit guilty of this. And I worry a lot that I’m actually contributing to capabilities research rather than long-term AI-safety. I guess in the end I hope for a way for AI-safety and science researchers to interact more easily and develop ideas together.
Thanks for the comment. Indeed, if we could agree on capping, or slowing down, that would be a promising approach.
Thank you so much for this effectiveness-focused post. I thought I would add another perspective, namely one “against the lone wolf” approach, i.e. the idea that AI-safety will come down to one person, or a few persons, or an elite group of engineers somewhere. I agree that for now there are some individuals who are doing more conceptual AI-framing than others, but I am “shocked that everyone’s dropping the ball” by putting up walls and saying that the general public is not helpful. Yes, they might not be helpful now, but we need to work on this!… Maybe someone with the right skill will come along :)
I also view academia as almost hopeless (it’s where I work). But it feels that if a few of us can get some stable jobs/positions/funding—we can start being politically active within academia and the return on investment there could be tremendous.
Hi Chin. Thanks for writing this review; it seems like a much-needed and well-timed article, at least from my perspective, as I was looking for something like this. In particular, I’m trying to frame my research interest relative to the AI-safety field, but as you point out, it is still too early for that.
I am wondering if you have any more insights into how you came up with your diagram above. In particular, are there any more peer-reviewed articles, or arXiv papers like Amodei et al. (https://arxiv.org/abs/1606.06565), that you relied on? For example, I don’t understand why seed AI is such a critical concept in the AI literature (is it even published?), as it seems related to the concept of viruses, which are an entire field in CS. Also, why is brain-inspired AI a category in your diagram? As far as I know, that story isn’t published or peer reviewed and doesn’t have significant traction.
I imagine I’m in the same place you were before you wrote this article, and I’d love to get some more insight about how you ended up with this layout.
Thank you so much,
Thanks for the reply, Jonathan. Indeed, I’m also a bit skeptical that our innate drives (whether the ones from SDT or others) are really non-utility-maximizing. But in some cases they do appear so.
One possibility is that they were driven to evolve for utility maximization but have now broken off completely and serve some difficult-to-understand purpose. I think there are similar theories of how consciousness developed—i.e. that it evolved as a by-effect/side-effect of some inter-organism communication—and now plays many other roles.
First of all, thank you so much for reading and taking the time to respond.
I don’t have the time (or knowledge) to respond to everything, but from your response, I worry that my article partially missed the target. I’m trying to argue that humans may not be just utility maximizers and that a large part of being human (or maybe any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there’s no real utility for some, or perhaps the most important, things that we value. Seeking out “surprising” results does help AIs and humans learn, as does seeking out information. But I’m not sure human psychology supports the view that human intrinsic rewards are necessarily related to utility maximization. I do view survival and procreation as genetically encoded drives, but they are not the innate drives I described above. It’s not completely clear what we gain when we enjoy being in the world, learning, socializing.
I’m aware of Friston’s free energy principle (it was one of the first things I looked at in graduate school). I personally view most of it as non-falsifiable, but I know that many have used it to derive useful interpretations of brain function.
Also, I quickly googled LeCun’s proposal and his conception of future AI; his intrinsic motivation module is largely about boot-strapped goals, albeit human pro-social ones.
The ultimate goal of the agent is minimize the intrinsic cost over the long run. This is where basic behavioral drives and intrinsic motivations reside. The design of the intrinsic cost module determines the nature of the agent’s behavior. Basic drives can be hard-wired in this module. This may include feeling “good” (low energy) when standing up to motivate a legged robot to walk, when influencing the state of the world to motivate agency, when interacting with humans to motivate social behavior, when perceiving joy in nearby humans to motivate empathy, when having a full energy supplies (hunger/satiety), when experiencing a new situation to motivate curiosity and exploration, when fulfilling a particular program, etc.
I would say that my question—which I did not answer in the post—is whether we can design AIs that don’t seek to maximize some utility or minimize some cost? What would that look like? Some computer-cluster just spinning up to do computations for no effective purpose?
I don’t really have an answer here.
Thanks Nathan. I understand that most people working on technical AI-safety research focus on this specific problem, namely aligning AI, and less on misuse. I don’t expect a large AI-misuse audience here.
Your response, that “truly-aligned-AI” would not change human intent, was also suggested by other AI researchers. But this doesn’t address the problem: human intent is created from (and dependent on) societal structures. Perhaps I failed to make this clear. I was trying to suggest that we lack an understanding of the genesis of human actions, intentions and goals, and thus cannot properly specify how human intent is constructed or how to protect it from interference/manipulation. A world imbued with AI-techs will change the societal landscape significantly and potentially for the worse. I think that many view human “intention” as a property of humans that acts on the world and is somehow isolated or protected from the physical and cultural world (see Fig 1a). But the opposite is actually true: in humans, intent and goals are likely caused significantly more by society than by biology.
The optimist statement: The best way I can interpret “truly-aligned-AI won’t change human agency” is to say that “AI” will help humans solve the free will problem and will then “work with us” to redesign what human goals should be. But this latter statement is a very tall order (a United Nations statement that perhaps will never see the light of day...).
Great post, Peter. I think a lot about whether it even makes sense to use the term “aligned AGI”, as powerful AGIs may break human intention for a number of reasons (https://www.lesswrong.com/posts/3broJA5XpBwDbjsYb/agency-engineering-is-ai-alignment-to-human-intent-enough).
I see you didn’t refer to AIs becoming self-driven (as in Omohundro: https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf). Is there a reason you don’t view this as part of the college kid problem?
Thanks for sharing! If I had a penny for every article that—in hindsight—would have taken me 10% of the time/effort to write … lol
Thanks for the great post. Two meta questions.
How long did it take you to write this? I work in academia and am curious to know how such a piece of writing relates to writing an opinion piece on my planet.
Is there a video and/or Q&A at some point? (Forgive me if I missed it.)
Hi TAG, indeed, the post was missing some clarifications. I added a bit more about free will to the text; I hope it’s helpful.
Hi Charlie. Thanks for the welcome!
Indeed, I think that’s a great way to put it: “preserving human agency around powerful systems” (I added it to the article). Thanks for that! I am pessimistic that this is possible (or that the question makes sense as it stands). I guess what I tried to make above was a soft argument that “intent-aligned AIs” might not make sense without further limits or boundaries on both human intent and what AIs can do.
I agree hard-wiring is probably not the best solution. However, humans are probably hard-wired with a bunch of tools. Post-behaviorist psychology, e.g. self-determination theory, argues that agency and belonging (for social organisms) are hard-wired in most complex organisms (i.e. not learned). (I assume you know some/a lot about this; here’s a very rough write-up: https://practicalpie.com/self-determination-theory/).
Thanks shminux. My apologies for the confusion. Part of my point was that we don’t have consensus on whether we have free will (roughly 60% of professional philosophers are compatibilists, but the sociologists have a different conception altogether, and so do the physicists etc.). I think this got lost because I was not trying to explain the philosophical position on free will. [I have added a very brief note in the main text to clarify what I think of as the “free will problem”].
The rest of the post was an attempt to argue that because human action is likely caused by many factors, AI techs will likely help us uncover the role of these factors; and that AI-misuse or even AI-aligned agents may act to change human will/intent etc.
Re: whether bacteria/fish/cats etc. have free will: I propose we’ll have better answers soon (if you consider that AGI is coming soon-ish). Or more precisely, the huge body of philosophical, psychological and sociological ideas will be tested against some empirical findings in this field. I actually work on these types of questions from the empirical side (neuroscience and open behavior). And, to summarize this entire post in one sentence, I am concerned that most organisms, including humans, have quite predictable behaviors (given behavior + other internal state data) and that these entire causal networks will be constantly under pressure from both nefarious and well-meaning agents (like aligned AIs) because of the inherently complex nature of behavior.