Senior Researcher, Convergence Analysis.
Associate Professor Affiliate, University of Washington
My name is Justin Bullock. I live in the Seattle area after 27 years in Georgia and 7 years in Texas. I have a PhD in Public Administration and Policy Analysis, where I focused on decision making within complex, hierarchical, public programs. For example, in my dissertation I attempted to model how errors (measured as improper payments) are built into the US Unemployment Insurance Program. I spent time looking at how agents are motivated within these complex systems, trying to develop general insights into how errors occur. Until about 2016, I was very much ignorant of the discussions around AI. I was introduced to the arguments around AGI and alignment through the popular works of Sam Harris and Max Tegmark, which led me eventually to the work of Nick Bostrom and Eliezer Yudkowsky. It’s been a wild and exciting ride.
I currently have a tenured Associate Professor position at Texas A&M University that I’m resigning on July 1 to focus more on writing, creating, and learning without all of the weird pressures and incentives that come from working within a major public research university in the social sciences. In preparation for changing my employment status, I’ve been considering the communities I want to be in discussion with and the LessWrong and AlignmentForum communities are among the most interesting on that list.
My writing is on decision making, agents, communication, governance and control of complex systems, and how AI and future AGI influence these things. I’ve been thinking a lot lately about the control of multi-agent systems and what types of control systems can be used to guide or build robust, agent-agnostic processes of AI and human constitution. In agreement with George Dyson’s recent arguments, I also worry that we have already lost meaningful human control over the internet. Finally, I’ve recently been significantly influenced by the works of Olaf Stapledon (Star Maker, Last and First Men, Sirius) and Aldous Huxley (The Perennial Philosophy) in thinking more carefully about the mind/body problem, the endowment of the cosmos, and the nature of reality.
My hope is that I can learn from you all and bring to this conversation thoughts on alignment, control, and governance (in particular of multi-agent systems containing only humans, humans and AI, and only AI), and that together we can form a map that better reflects the territory. I look forward to engaging with the community!
Thank you for this comment!
I think your point that “The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there)” is spot on. It maps to my intuitions about the weaknesses of fine-tuning, and it is one of the strongest arguments that open-sourcing foundation models carries significant risks.
I appreciate your suggestions for other auditing methods that could possibly work, such as running a model within a protected framework or open-sourcing encrypted weights. I think these allow for something like risk mitigation for partial open-sourcing, but they would be less feasible for fully open-sourced models, where weights represented as plain tensors would more likely be available.
Your comment is helpful and gave me some additional ideas to consider. Thanks!
One thing I would add is that the idea I had in mind for auditing was more of a broad process than a specific tool. The paper I mention to support this idea of a healthy ecosystem for auditing foundation models is “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” There the authors describe an auditing process that would guide the decision of whether or not to release a specific model, along with the decision points, stakeholders, and review process that might aid in making that decision. At the most abstract level, the process includes scoping, mapping, artifact collection, testing, reflection, and post-audit decisions about whether or not to release the model.
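To make the shape of that process a bit more concrete, here is a minimal sketch of the stages as I read them. The stage names follow the paper’s abstract description; the enum and the trivial runner are purely my own illustrative scaffolding, not anything the authors specify:

```python
from enum import Enum, auto

class AuditStage(Enum):
    """Stages of an end-to-end internal audit, at the most abstract level."""
    SCOPING = auto()              # define the audit's purpose and the system under review
    MAPPING = auto()              # identify stakeholders, decision points, and existing artifacts
    ARTIFACT_COLLECTION = auto()  # gather model cards, datasheets, eval results, design docs
    TESTING = auto()              # run targeted evaluations against the scoped risks
    REFLECTION = auto()           # weigh findings against the organization's risk tolerance
    POST_AUDIT_DECISION = auto()  # decide whether (and how) to release the model

def run_audit() -> None:
    """Walk the stages in order; in practice each would gate on human sign-off."""
    for stage in AuditStage:
        print(f"Entering stage: {stage.name}")

run_audit()
```

The point is simply that the audit is a sequenced, gated process spanning the whole release decision, rather than a single test applied to the model at the end.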
Thanks for this post. As I mentioned to both of you, it feels a little bit like we have been ships passing one another in the night. I really like your idea here of loops and the importance of keeping humans within these loops, particularly at key nodes in the loop or system, to keep Moloch at bay.
I have a couple of scattered points for you to consider:
In my work in this direction, I’ve tried to distinguish between roles and tasks. You do something similar here, which I like. To me, the question often should be about what specific tasks should be automated as opposed to what roles. As you suggest, people within specific roles bring their humanity with them to the role. (See: “Artificial Intelligence, Discretion, and Bureaucracy”)
One term I’ve used to help think about this within the context of organizations is the notion of discretion. This is the way in which individuals use their decision-making capacity within a defined role. It is this discretion that often allows individuals holding those roles to shape their decision making in a humane and contextualized way. (See: “Artificial discretion as a tool of governance: a framework for understanding the impact of artificial intelligence on public administration”)
Elsewhere, coauthors and I have used the term administrative evil to examine the ways in which substituting machine decision making for human decision making dehumanizes the decision-making process, exacerbating the risk that administrative evil will be perpetuated by an organization. (See: “Artificial Intelligence and Administrative Evil”)
One other line of work has looked at how the introduction of algorithms or machine intelligence within the loop changes the shape of the loop, potentially in unexpected ways, leading to changes in inputs to decision making throughout the loop. That is, machine evolution influences organizational (loop) evolution. (See: “Machine Intelligence, Bureaucracy, and Human Control” & “Artificial Intelligence, bureaucratic form, and discretion in public service”)
I like the inclusion of the work on Cyborgism. It seems to me that in some ways we’ve already become Cyborgs to match the complexity of the loops in which we work and play together, as those loops have already evolved in response to machine evolution. In theory at least, it does seem that a Cyborg approach could help overcome some of the challenges presented by Moloch and by failed attempts at coordination.
Finally, your focus on loops reminded me of “Gödel, Escher, Bach” and Hofstadter’s focus there and in his “I Am a Strange Loop.” I like how you apply the notion to human organizations here. It would be interesting to think about different types of persistent loops as a way of describing different organizational structures, goals, resources, etc.
I’m hoping we can discuss together sometime soon. I think we have a lot of interest overlap here.
Thanks for this post! Hope the comments are helpful.
Thank you for providing a nice overview of our paper Frontier AI Regulation: Managing Emerging Risks to Public Safety, which was just released!
I appreciate your feedback, both the positive and critical parts. I’m also glad you think the paper should exist and that it is mostly a good step. And, I think your criticism is fair. Let me also note that I do not speak for the authorship team. We are quite a diverse group from academia, labs, industry, nonprofits, etc. It was no easy task to find common ground across everyone involved.
I think the AI Governance space is difficult in part because different political actors have different goals, even when sharing significant overlap in interests. As I saw it, the goal of this paper was to bring together a wide group of interested individuals and organizations to see if we could come to points of agreement on useful immediate next governance steps. In this way, we weren’t seeking “ambitious” new policy tools; we were seeking areas of agreement across the diverse stakeholders currently driving change in the AI development space. I think this is a significantly different goal than that of the Model Evaluation for Extreme Risks paper that you mention, which I agree is another important entry in this space. Additionally, one of the big differences, I think, between our effort and the model evaluation paper is that we are more focused on what governments in particular should consider doing from their available toolkits, whereas it seems to me that the model evaluation paper is more about what companies and labs themselves should do.
A couple of other thoughts:
I don’t think it’s completely accurate that “It doesn’t suggest government oversight of training runs or compute.” As part of the suggestion around licensing we mention that the AI development process may require oversight by an agency. But, in fairness, it’s not a point that we emphasize.
I think the following is a little unfair. You say: “This is overdeterminedly insufficient for safety. “Not complying with mandated standards and ignoring repeated explicit instructions from a regulator” should not be allowed to happen, because it might kill everyone. A single instance of noncompliance should not be allowed to happen, and requires something like oversight of training runs to prevent. Not to mention that denying market access or threatening prosecution are inadequate. Not to mention that naming-and-shaming and fining companies are totally inadequate. This passage totally fails to treat AI as a major risk. I know the authors are pretty worried about x-risk; I notice I’m confused.” Let me explain below.
I’m not sure there’s such a thing as “perfect compliance.” I know of no way to ensure that “a single instance of noncompliance should not be allowed to happen.” And I don’t think that’s necessary for current models, or even for very near-term future models. I think the idea here is that we set up a standard regulatory process in advance of AI models that might be capable enough to kill everyone, and thereby shape the development of the next sets of frontier models. I do think there’s certainly a criticism here that naming and shaming, for example, is not a sufficiently punitive tool, but it may have more impact on leading AI labs than one might assume.
I hope this helps clear up some of your confusion here. To recap: I think your criticism that the tools are not ambitious is fair, but I don’t think that was our goal. I saw this project as a way of providing tools for which there is broad agreement and that, given the current state of AI models, we believe would help steer AI development and deployment in a better direction. I do think that another reading of this paper is that it’s quite significant that this group agreed on the recommendations that are made. I consider it progress in the discussion of how to effectively govern increasingly powerful AI models, but it’s not the last word either. :)
Thanks again for sharing and for providing your feedback on these very important questions of governance.
Thank you for the comment and for reading the sequence! I posted Chapter 7 Welcome to Analogia! (https://www.lesswrong.com/posts/PKeAzkKnbuwQeuGtJ/welcome-to-analogia-chapter-7) yesterday and updated the main sequence page just now to reflect that. I think this post starts to shed some light on ways of navigating this world of aligning humans to the interests of algorithms, but I doubt it will fully satisfy your desire for a call to action.
I think there are both macro policies and micro choices that can help.
At the macro level, there is an over-accumulation of power and property by non-human intelligences (machine intelligences, large organizations, and mass market production). The best guiding remedy here that I’ve found comes from Huxley. The idea is pretty straightforward in theory: spread the power and property around, away from these non-human intelligences and towards as many humans as possible. This seems to be the only reasonable cure for organized lovelessness and its consequence, massive dehumanization.
At the micro level, there is some practical advice in Chapter 7 that also originates with Huxley. The suggestion here is that to avoid being an algorithmically aligned human, you should choose to live with love, intelligence, and freedom as your guideposts. Pragmatically, one must live in the present, here and now, to experience those things fully.
I hope this helps, but I’m not sure it will.
The final thing I’d add at this point is that I think there’s something to reshaping our technological narratives around machine intelligence, away from its current extractive and competitive logics and more generally towards humanizing and cooperative logics. The Erewhonians from Chapter 7 (found in Samuel Butler’s Erewhon) have a more extreme remedy: stop technological evolution, turn it backwards. But short of global revolution, this seems like proposing that natural evolution should stop.
I’ll be editing these 7 chapters, adding a new introduction and conclusion, and publishing Part I as a standalone book later this year. And as part of that process, I intend to spend more time continuing to think about this.
Thanks again for reading and for the comment!
Thanks for this comment. I agree there is some ambiguity here about the types of risks being considered with respect to the question of open-sourcing foundation models. I believe the report favors the term “extreme risks,” which it defines as “risk of significant physical harm or disruption to key societal functions.” I believe they avoid the terms “extinction risk” and “existential risk,” but are implying something not too different with their choice of extreme risks.
For me, I pose the question above as:
“How large are the risks from fully open-sourced foundation models? More specifically, how significant are these risks compared to the overall risks inherent in the development and deployment of foundation models?”
What I’m looking for is something like “total risk” versus “total benefit.” In other words, if we take all the risks together, just how large are they in this context? In part, I’m not sure whether the more extreme risks really come from open-sourcing the models or simply from the development and deployment of increasingly capable foundation models.
I hope this helps clarify!
Thank you for the insights. I agree with your observation that “bureaucracies are notorious homes to Goodhart effects and they have as yet found no way to totally control them.” I also agree with your intuition that “to be fair bureaucracies do manage to achieve a limited level of alignment, and they can use various mechanisms that generate more vs. less alignment.”
I do, however, believe that an ideal type of bureaucratic structure helps with at least some forms of the alignment problem. If, for example, Drexler is right, and my understanding of the theory is right, CAIS expects a slow takeoff of increasingly intelligent narrow AIs that work together on different components of intelligence or on completing intelligent tasks. In this case, I think Weber’s suggestions, both on how to create generally controllable intelligent agents (Beamte) and on constraining individual agents’ authority to certain tasks, with agents nominated to higher tasks by those with more authority (weight, success, tenure, etc.), have something helpful to say about the design of narrow agents that might work together towards a common goal.
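As a purely illustrative toy sketch (the names and structure below are my own, not Weber’s or Drexler’s), the core idea of task-limited authority plus delegation from above might look something like this:

```python
from dataclasses import dataclass, field

@dataclass
class NarrowAgent:
    """A narrow agent in the Weberian spirit: authority limited to an explicit task set."""
    name: str
    authority: int                               # accumulated standing (success, tenure, etc.)
    permitted_tasks: set = field(default_factory=set)

    def act(self, task: str) -> str:
        # The agent may only act within its delegated jurisdiction.
        if task not in self.permitted_tasks:
            return f"{self.name}: refused '{task}' (outside delegated authority)"
        return f"{self.name}: performed '{task}'"

    def promote(self, subordinate: "NarrowAgent", new_task: str) -> None:
        # Only a higher-authority agent may expand a subordinate's task set.
        if self.authority > subordinate.authority:
            subordinate.permitted_tasks.add(new_task)

supervisor = NarrowAgent("supervisor", authority=3, permitted_tasks={"review"})
clerk = NarrowAgent("clerk", authority=1, permitted_tasks={"classify"})

print(clerk.act("summarize"))           # refused: not yet delegated
supervisor.promote(clerk, "summarize")  # delegation flows down from higher authority
print(clerk.act("summarize"))           # now permitted
```

The sketch is only meant to show the shape of the constraint: agents refuse tasks outside their jurisdiction by default, and jurisdiction expands only through delegation from agents with greater standing.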
My thoughts here are still in progress and I’m planning to spend time with these two recent posts in particular to help my understanding:
https://www.lesswrong.com/posts/Fji2nHBaB6SjdSscr/safer-sandboxing-via-collective-separation
One final thing I would add is that I think many of the problems with bureaucracies can often be characterized in terms of limits on information and communication (and how agents are trained, how they are motivated, and what the most practical or useful levels of hierarchy and discretion are). I think the growth of increasingly intelligent narrow AIs could (under the right circumstances) drastically reduce information and communication problems.
Thanks again for your comment. The feedback is helpful. I hope to make additional posts in the near future to try and further develop these ideas.
There is a growing academic field of “governance” that would variously be described as a branch of political science, public administration, or policy studies. It is a relatively small field, but it has several academic journals that fit the description of the literature you’re looking for. The best of these journals, in my opinion, is Perspectives on Public Management & Governance (although it focuses on public governance structures, to the point of ignoring corporate governance structures).
In addition to this, there is a 50-chapter OUP AI Governance Handbook that I’ve co-edited with leading scholars from Economics, Political Science, International Affairs, and other fields of social science who are interested in exactly the ideal governance questions you describe. 10 of the chapters are currently available, but I also have complete copies of essentially every chapter that I would be happy to share directly with you or anyone else who comments here and is interested. Here’s the Table of Contents. I’m certainly biased, but I think this book contains the cutting-edge dialogue around both how ideal governance may be applied to controlling AI and how the development of increasingly powerful AI presents new opportunities and challenges for ideal governance.
I have contributed to these questions both by trying to understand what the elements of ideal governance structures and processes might be for social insurance programs, AI systems, and space settlement, and by trying to understand the concerns around integrating autonomous and intelligent decision-making systems into our current governance structures and processes.
I think there are some helpful insights into how to make governance adaptive (the reset/jubilee you described) and into defining the elements of the hierarchy (the various levels) of the governance structure. The governance literature looks at micro/meso/macro levels of governance structures to help illustrate how some governance elements are best described and understood at different levels of emergence or description. Another useful construct from governance scholars is discretion, the breadth of the choice set given to an agent carrying out the required decision making of the various governance entities. This is where much of my own interest lies, which you can see in work I have with colleagues on topics including discretion, the evolution of bureaucratic form, artificial discretion, administrative evil, and artificial bureaucrats. This work builds on the notion of boundedly rational actors and how they execute decisions in response to constitutional rules and institutional structures. Here & here in the AI Governance Handbook, colleagues and I look at how Herbert Simon’s and Max Weber’s classic answers to these ideal governance questions hold up in a world with machine intelligence, and we examine what new governance tools, structures, and processes may be needed now. I’ve also done some very initial work here on the LessWrong forum looking at how Weber’s ideal bureaucratic structure might be helpful for considering how to control intelligent machine agents.
To briefly recap: there is a relatively small interdisciplinary field/community of scholars looking at these questions, and it is a community that has done some brainstorming, some empirical work, and some economics-style thinking to address some of these ideal governance questions. There are also some classic works that touch on these topics from thinkers such as Max Weber, Herbert Simon, and Elinor Ostrom.
I hope this is helpful. I’m sure I’ve focused too much on my own work here, but I hope the Handbook in particular gives you some sense of some of the work out there. I would be happy to connect you with other writers and thinkers who I believe are taking these questions of ideal governance seriously. I find these to be among the most interesting and important questions for our moment in time.