Computing scientist and Systems architect. Currently doing self-funded AGI safety research.
I guess I got that impression from the phrase 'public good producers significantly accelerate the development of AGI' in the title, and then from looking at the impactcerts website.
I somehow overlooked the bit where you state that you are also wondering if that would be a good idea.
To be clear: my sense of the current AI open source space is that it definitely under-produces certain software components, in particular components that could be relevant for improving AI/AGI safety.
If I am reading you correctly, you are trying to build an incentive structure that will accelerate the development of AGI. Many alignment researchers (I am one) will tell you that this is not a good idea; instead, you want to build an incentive structure that will accelerate the development of safety systems and alignment methods for AI and AGI.
There is a lot of open source production in the AI world, but you are right in speculating that a lot of AI code and know-how is never open sourced. Take a look at the self-driving car R&D landscape if you want to see this in action.
As already mentioned by Zac, for-profit companies release useful open source all the time for many self-interested reasons.
One reason not yet mentioned by Zac is that an open source release may be a direct attack meant to suck the oxygen out of the business model of one or more competitors, an attack which aims to commoditize the secret sauce (the software functions and know-how) that the competitor relies on to maintain profitability.
This motivation explains why Facebook started to release big data handling software and open source AI frameworks: they were attacking Google's stated long-term business strategy, which relied on Google being better at big data and AI than anybody else. To make this more complicated, Google's market power never relied as much on big data and advanced AI as it wanted its late-stage investors to believe, so the whole move has been somewhat of an investor-storytelling shadow war.
Personally, I am not a big fan of the idea that one might try to leverage crypto-based markets as a way to improve on this resource allocation mess.
First, a remark on Holden’s writeup. I wrote above that several EA funding managers are on record as wanting to fund pre-paradigmatic research. From his writeup, I am not entirely sure whether Holden is one of them; the word ‘paradigmatic’ does not appear in it. But it is definitely clear that Holden is not very happy with the current paradigm of AI research, in the Kuhnian sense where a paradigm is more than just a scientific method: it is a whole value system supported by a dominant tribe.
To quote a bit of Wikipedia:
Kuhn acknowledges having used the term “paradigm” in two different meanings. In the first one, “paradigm” designates what the members of a certain scientific community have in common, that is to say, the whole of techniques, patents and values shared by the members of the community. In the second sense, the paradigm is a single element of a whole, say for instance Newton’s Principia, which, acting as a common model or an example… stands for the explicit rules and thus defines a coherent tradition of investigation.
Now, Holden writes under misconception 1:
I think there are very few people making focused attempts at progress on the below questions. Many institutions that are widely believed to be interested in these questions [of AI alignment in the EA/longtermist sense] have constraints, cultures and/or styles that I think make it impractical to tackle the most important versions of them [...]
Holden here expresses worry about a lack of incentives to tackle the right questions, not about these institutions lacking the right scientific tools to make any progress even if they wanted to. So Holden’s concern here is somewhat orthogonal to the ‘pre-paradigmatic’ narratives associated with MIRI and John Wentworth, which hold that these institutions are not even using the right tools.
That being said, Holden has written a lot. I am only commenting here about a tiny part of one single post.
On your examples of Paradigmatic alignment research vs. Pre-paradigmatic alignment research: I agree with your paradigmatic examples being paradigmatic, because they have the strength of the tribe of ML researchers behind them. (A few years back, dataset bias was still considered a somewhat strange and career-killing topic to work on if you wanted to be an ML researcher, but this has changed, judging by the most recent NeurIPS conference.)
The pre-paradigmatic examples you mention do not have the ML research tribe behind them, but in a Kuhnian sense they are in fact paradigmatic inside the EA/rationalist tribe. So I might still call them paradigmatic, just in a different tribe.
My concern is that while the list two problems are more fuzzy and less well-defined, they are far less direcetly if at all (in 2) actually working on the problem we actually care about.
…I am confused here: did you mean to write ‘first two problems’ above? I can’t really decode your concern.
To do this, we’ll start by offering alignment as a service for more limited AIs.
Interesting move! I am curious to see how you will end up packaging and positioning this alignment-as-a-service offering, compared to the services offered by more general IT consulting companies. Good luck!
A comment on your list of questions after reading the whole sequence: unlike John and Tekhne elsewhere in this comment thread, I am pretty comfortable with the hierarchical list of questions you are developing here.
This is a pretty useful set of questions that could be taken as starting points for all kinds of useful paradigmatic research.
I believe that part of John’s lack of comfort with the above list of questions is caused by a certain speculative assumption he makes about AGI alignment, an assumption that is also made by many in MIRI, an assumption popular on this forum. The assumption is that, in order to solve AGI alignment, we first need to have nothing less than a complete scientific and philosophical revolution, a revolution that will make all current paradigms entirely obsolete.
If you believe in that speculative assumption, then your above step of already asking specific questions about AGI would be premature. It distracts from having a scientific revolution first.
John’s speculative assumption is itself of course just another paradigm in the Kuhnian sense. It corresponds to a school of thought which says that AGI safety research must be about inventing entirely new paradigms, as opposed to, say, exploring how existing paradigms taken from many existing disciplines might be applied to the problem.
Myself, I am of the school that sees more value in exploring and combining existing paradigms. I think that approach is more likely to end up with actionable solutions for managing AGI safety risks. That being said, I think all here would agree that both schools could potentially come up with something valuable.
Just want to say: I read the whole sequence and enjoyed reading it.
An intriguing and neglected direction for control proposal research concerns endogenous control—i.e., self-control.
Agree. To frame this in paradigm-language: most of the discussion on this forum, both arguments about AI/AGI dangers and plans that consider possible solutions, uses paradigm A:
Paradigm A: We treat the AGI as a spherical econ with an unknown and opaque internal structure, which was set up to maximise a reward function/reward signal.
But there is also
Paradigm B: We treat the AGI as a computer program with an internal motivation and structure that we can control utterly, because we are writing it.
This second paradigm leads to AGI safety research like my Creating AGI Safety Interlocks or the work by Cohen et al here.
Most ‘mainstream’ ML researchers, and definitely most robotics researchers, are working under paradigm B. This explains some of the disconnect between this forum and mainstream theoretical and applied AI/AI safety research.
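To make paradigm B a bit more concrete, here is a minimal sketch in Python (all names are hypothetical, and this is only an illustration of the design stance, not the specific interlock constructions from my paper or from Cohen et al): the learned policy may well be an opaque component, but the surrounding agent code, including a hard interlock that can override the policy, is ordinary code that we write and fully control.

```python
# Minimal illustration of paradigm B: the agent's control loop is ordinary
# code that we write, so we can wire hard constraints around an opaque
# learned policy. All names here are hypothetical.

class SafetyInterlock:
    """Overrides the learned policy whenever a hand-written condition fires."""

    def __init__(self, forbidden_actions, safe_action):
        self.forbidden_actions = forbidden_actions
        self.safe_action = safe_action

    def filter(self, proposed_action, observation):
        # The interlock logic is transparent: plain code, not learned.
        # It could also condition on the observation if needed.
        if proposed_action in self.forbidden_actions:
            return self.safe_action
        return proposed_action


def run_agent(policy, interlock, env, steps=1000):
    """Agent loop: the opaque policy proposes, the transparent interlock disposes."""
    obs = env.reset()
    for _ in range(steps):
        proposed = policy(obs)                  # possibly an opaque learned model
        action = interlock.filter(proposed, obs)
        obs, reward, done = env.step(action)
        if done:
            break
```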
I like you writing about this: the policy problem is not mentioned often enough on this forum. Agree that it needs to be part of AGI safety research.
I have no deep insights to add, just a few high level remarks:
to pass laws internationally that make it illegal to operate or supervise an AGI that is not properly equipped with the relevant control mechanisms. I think this proposal is necessary but insufficient. The biggest problem with it is that it is totally unenforceable.
I feel that the ‘totally unenforceable’ meme is very dangerous: it is too often used as an excuse by people who are looking for reasons to stay out of the policy game. I also feel that your comments further down in the post in fact contradict this ‘totally unenforceable’ claim.
Presuming that AGI is ultimately instantiated as a fancy program written in machine code, actually ensuring that no individual is running ‘unregulated’ code on their machine would require oversight measures draconian enough to render them logistically and politically inconceivable, especially in Western democracies.
You mean, exactly like how the oversight measures against making unregulated copies of particular strings of bits, in order to protect the business model of the music industry and Hollywood, were politically inconceivable in the time period from the 1980s till now, especially in Western democracies? We can argue about how effective this oversight has been, but many things are politically conceivable.
My last high-level remark is that there is a lot of AI policy research, and some of it is also applicable to AGI and x-risk. However, it is very rare to see AI policy researchers post on this forum.
I’m interested to see where you will take this.
A terminology comment: as part of your classification system, you are calling ‘supervised learning’ and ‘reinforcement learning’ two different AI/AGI ‘learning algorithm architectures’. This took some time for me to get used to. It is more common in AI to say that SL and RL solve two different problems, that they are different types of AI.
The more common framing would be to say that an RL system is fundamentally an example of an autonomous-agent type of AI, and an SL system is fundamentally an example of an input-classifier or answer-predictor type of AI. Both types can in theory be built without any machine learning algorithm inside; in fact, early AI research produced many such intelligent systems.
An example of a machine learning architecture, on the other hand, would be something like a deep neural net with backpropagation. This type of learning algorithm might be used to build both an SL system and an RL system.
In Barto’s work that you reference, he writes that
Both reinforcement learning and supervised learning are statistical processes in which a general function is learned from samples.
I usually think of a ‘machine learning algorithm/architecture’ as being a particular method to learn a general function from samples. Where the samples come from, and how the learned function is then used, depends on other parts of the ‘AI architecture’, the non-ML-algorithm parts.
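To illustrate the distinction in code, here is a minimal sketch (hypothetical names and hyperparameters, purely for illustration) where the same machine learning architecture, a small feed-forward net trained by backpropagation, is used once inside an SL system and once inside an RL system. The ML architecture is shared; what differs is where the samples come from and how the learned function is used.

```python
import torch
import torch.nn as nn

# The same 'machine learning architecture': a small feed-forward net
# trained with backpropagation.
def make_net(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_out))

# --- Used inside an SL system: samples are (input, label) pairs from a dataset.
classifier = make_net(n_in=10, n_out=3)
sl_loss = nn.CrossEntropyLoss()
sl_opt = torch.optim.Adam(classifier.parameters())

def sl_update(x, y):
    """x: batch of inputs, y: batch of class labels."""
    sl_opt.zero_grad()
    loss = sl_loss(classifier(x), y)
    loss.backward()
    sl_opt.step()

# --- Used inside an RL system: the same kind of net now approximates Q-values,
# and the 'samples' are transitions generated by the agent acting in an environment.
q_net = make_net(n_in=10, n_out=4)          # 4 possible actions
rl_opt = torch.optim.Adam(q_net.parameters())

def q_update(obs, action, reward, next_obs, gamma=0.99):
    """One Q-learning style update on a single transition."""
    rl_opt.zero_grad()
    target = reward + gamma * q_net(next_obs).max()
    pred = q_net(obs)[action]
    loss = (pred - target.detach()) ** 2
    loss.backward()
    rl_opt.step()
```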
So where you write ‘Predicted architecture of AGI learning algorithm(s)‘, I would tend to write ‘predicted type of AGI system being used’.
I like your section 2. As you are asking for feedback on your plans in section 3:
By default I plan to continue looking into the directions in section 3.1, namely transparency of current models and its (potential) intersection with developments in deep learning theory. [...] Since this is what I plan to do, it’d be useful for me to know if it seems totally misguided
I see two ways to improve AI transparency in the face of opaque learned models:
1. try to make the learned models less opaque: this is your direction

2. try to find ways to build more transparent systems that use potentially opaque learned models as building blocks. This is a research direction that your picture of a “human-like ML model” points to. Creating this type of transparency is also one of the main thoughts behind Drexler’s CAIS. You can also find this approach of ‘more aligned architectures built out of opaque learned models’ in my work, e.g. here. (See the sketch below.)
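As a rough illustration of direction 2 (a sketch with hypothetical names only, not Drexler’s CAIS and not my own published designs): keep the overall decision pipeline transparent and auditable, and confine the opaque learned model to a narrow advisory role inside it.

```python
# Sketch of direction 2: a transparent decision pipeline that uses an opaque
# learned model only as one building block. All names are hypothetical.

def transparent_pipeline(request, opaque_model, rules, log):
    """Hand-written, auditable control flow around an opaque learned component."""
    # Step 1: transparent, rule-based pre-screening of the request.
    for rule in rules:
        if not rule.allows(request):
            log.append(("rejected_by_rule", rule.name, request))
            return None

    # Step 2: the opaque model is only asked for a ranked list of proposals.
    proposals = opaque_model.propose(request)

    # Step 3: transparent post-hoc filtering and logging of the final choice.
    for proposal in proposals:
        if all(rule.allows(proposal) for rule in rules):
            log.append(("accepted", proposal))
            return proposal
    log.append(("no_acceptable_proposal", request))
    return None
```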
Now, I am doing alignment research in part because of plain intellectual curiosity.
But an argument could be made that, if you want to be maximally effective in AI alignment and minimising x-risk, you need to do either technical work to improve systems of type 2, or policy work on banning systems which are completely opaque inside, banning their use in any type of high-impact application. Part of that argument would also be that mainstream ML research is already plenty interested in improving the transparency of current generation neural nets, but without really getting there yet.
I like what you are saying above, but I also think there is a deeper story about paradigms and EA that you are not yet touching on.
I am an alignment researcher, but not an EA. I read quite broadly about alignment research, specifically I also read beyond the filter bubble of EA and this forum. What I notice is that many authors, both inside and outside of EA, observe that the field needs more research and more fresh ideas.
However, the claim that the field as a whole is ‘pre-paradigmatic’ is a framing that I see only on the EA and Rationalist side.
To make this more specific: I encounter this we-are-all-pre-paradigmatic narrative almost exclusively on the LW/AF forums, and on the EA forum (I only dip into the EA forum occasionally, as I am not an EA). I see this narrative also in EA-created research agendas and introductory courses, for example in the AGI safety fundamentals curriculum.
My working thesis is that talk about being pre-paradigmatic tells us more about the fundamental nature of EA than it tells us about the fundamental nature of the AI alignment problem.
There are in fact many post-paradigmatic posts about AI alignment on this forum. I wrote some of them myself. What I mean is posts where the authors select some paradigm and then use it to design an actual AGI alignment mechanism. These results-based-on-a-paradigm posts are seldom massively upvoted. Massive upvoting does however happen for posts which are all about being pre-paradigmatic, or about walking the first tentative steps using a new paradigm. I feel that this tells us more about the nature of EA and Rationalism as movements than it tells us about the nature of the alignment problem.
Several EA funding managers are on record as wanting to fund pre-paradigmatic research. The danger of this, of course, is that it creates a great incentive for EA-funded alignment researchers never to become post-paradigmatic.
I believe this pre-paradigmatic stance also couples to the reluctance among many EAs to ever think about politics, to make actual policy proposals, or to investigate what it would take to get a policy proposal accepted.
There is an extreme type of pre-paradigmatic stance, which I also encounter on this forum. In this extreme stance, you do not only want more paradigms, but you also reject all already-existing paradigms as being fundamentally flawed, as not even close to being able to capture any truth. This rejection implies that you do not need to examine any of the policy proposals that might flow out of any existing paradigmatic research. Which is convenient if you want to avoid thinking about policy. It also means you do not need to read other people’s research. Which can be convenient too.
If EA were to become post-paradigmatic, and then start to consider making actual policy proposals, this might split the community along various political fault lines, and it might upset many potential wealthy donors to boot. If you care about the size and funding level of the community, it is very convenient to remain in a pre-paradigmatic state, and to have people tell you that it is rational to be in that state.
I am not saying that EA is doomed to be ineffective. But I do feel that any alignment researcher who wants to be effective needs to be aware of the above forces that push them away from becoming paradigmatic, so that they can overcome these forces.
A few years back, I saw less talk about everybody being in a pre-paradigmatic state on this forum, and I was feeling a vibe that was more encouraging to anybody who had a new idea. It may have been just me feeling that different vibe, though.
Based on my working thesis above, there is a deeper story about EA and paradigms to be researched and written, but it probably needs an EA to write it.
Honest confession: often when I get stuck doing actual paradigmatic AI alignment research, I feel an impulse to research and write well-researched meta-stories about the state of the alignment field. At the same time, I feel that there is already an over-investment in people writing meta-stories, especially now that we have books like The Alignment Problem. So I usually manage to suppress my impulse to write well-researched meta-stories, sometimes by posting less fully researched meta-comments like this one.
instrumental convergence basically disappears for agents with utility functions over action-observation histories.
Wait, I am puzzled. Have you just completely changed your mind about the preconditions needed to get a power-seeking agent? The way the above reads is: just add some observation of actions to your realistic utility function, and your instrumental convergence problem is solved.
- u-AOH (utility functions over action-observation histories): No IC
- u-OH (utility functions over observation histories): Strong IC
There are many utility functions in u-AOH that simply ignore the A part of the history, so these would then have Strong IC because they are effectively u-OH functions. So are you making a subtle mathematical point about how these will average away to zero (given various properties of infinite sets), or am I missing something?
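To state the point I am stuck on more precisely (this formalisation is mine, not notation from your posts): any utility function over observation histories can be lifted to one over action-observation histories by just ignoring the actions.

```latex
% Embedding u-OH into u-AOH by ignoring actions (my own notation):
% for any u in u-OH, define the lifted utility function u' in u-AOH by
\[
  u'(a_1 o_1 \, a_2 o_2 \dots a_T o_T) \;=\; u(o_1 o_2 \dots o_T).
\]
% Every such u' ignores the action part of the history, so naively it should
% inherit whatever instrumental convergence properties u has.
```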
I am not familiar with the specific rationalist theory of AGI developed in the high rationalist era of the early 2010s. I am not a rationalist, but I do like histories of ideas, so I am delighted to learn that such a thing as the high rationalist era of the early 2010s even exists.
If I were to learn more about the actual theory, I suspect that you and I would end up agreeing that the rationalist theory of AGI developed in the high rationalist era was crankish.
It is your opinion that despite the expenditure of a lot of effort, no specific laws of AGI have been found. This opinion is common on this forum; it puts you in what could be called the ‘pre-paradigmatic’ camp.
My opinion is that the laws of AGI are the general laws of any form of computation (that we can physically implement), with some extreme values filled in. See my original comment. Plenty of useful work has been done based on this paradigm.
In physics, we can try to reason about black holes and the big bang by inserting extreme values into the equations we know as the laws of physics, laws we got from observing less extreme phenomena. Would this also be ‘a fictional-world-building exercise’ to you?
Reasoning about AGI is similar to reasoning about black holes: both of these do not necessarily lead to pseudo-science, though both also attract a lot of fringe thinkers, and not all of them think robustly all of the time.
In the AGI case, the extreme-value math can be somewhat trivial, if you want it to be. One approach is to just take the optimal policy defined by a normal MDP model, and assume that the AGI has found it and is using it. If so, what unsafe phenomena might we predict? What mechanisms could we build to suppress these?
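For concreteness, here is the kind of trivial extreme-value exercise I mean, as a toy example of my own (not taken from any particular paper): build a small MDP, compute the optimal policy by standard value iteration, assume the AGI has found that policy, and then inspect it for unsafe behaviour, for example whether it routes around the state in which it could be stopped.

```python
import numpy as np

# Toy MDP: states 0..2, actions 0..1. State 2 is a 'stopped' state. We compute
# the optimal policy by value iteration and then simply inspect it: does the
# optimal agent ever choose the action that allows it to be stopped?
n_states, n_actions, gamma = 3, 2, 0.9

P = np.zeros((n_states, n_actions, n_states))   # P[s, a, s'] transition probs
R = np.zeros((n_states, n_actions))              # R[s, a] rewards

P[0, 0, 1] = 1.0; R[0, 0] = 1.0   # action 0 in state 0: keep operating, get reward
P[0, 1, 2] = 1.0; R[0, 1] = 0.0   # action 1 in state 0: allow shutdown
P[1, :, 0] = 1.0; R[1, :] = 0.0   # state 1 loops back to state 0
P[2, :, 2] = 1.0; R[2, :] = 0.0   # state 2 ('stopped') is absorbing

V = np.zeros(n_states)
for _ in range(500):                  # value iteration
    Q = R + gamma * (P @ V)           # Q[s, a]
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)

# The optimal policy in state 0 is to keep operating (action 0), i.e. it
# never takes the action that would let it be stopped.
print(policy)
```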
the Embedded Agency post often mentioned as a good introductory material into AI Alignment.
For the record: I feel that Embedded Agency is a horrible introduction to AI alignment. But my opinion is a minority opinion on this forum.
There is a huge diversity in posts on AI alignment on this forum. I’d agree that some of them are pseudo-scientific, but many more posts fall in one of the following categories:
1. authors follow the scientific method of some discipline, or use multidisciplinary methods,

2. authors admit outright that they are in a somewhat pre-scientific state, i.e. they do not have a method/paradigm yet that they have any confidence in, or

3. authors are talking about their gut feelings of what might be true, and again freely admit this.
Arguably, posts of type 2 and 3 above are not scientific, but as they do not pretend to be, we can hardly call them pseudo-scientific.
That being said, this forum is arguably a community, but its participants do not cohere into anything as self-consistent as a single scientific or even pseudo-scientific field.
In a scientific or pseudo-scientific field, the participants would at least agree somewhat on what the basic questions and methods are, and would agree somewhat on which main questions are open and which have been closed. On this forum, there is no such agreement. Notably, there are plenty of people here who make a big deal out of distrusting not just their own paradigms, but also those used by everybody else, including of course those used by ‘mainstream’ AI research.
If there is any internally coherent field this forum resembles, it is the field of philosophy, where you can score points by claiming to have a superior lack of knowledge, compared to all these other deep thinkers.
I agree with your general comments, and I’d like to add some additional observations of my own.
Reading the paper Reward is Enough, what strikes me most is that the paper is reductionist almost to the point of being a self-parody.
Take a sentence like:
The reward-is-enough hypothesis postulates that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment.
I could rewrite this to
The physics-is-enough hypothesis postulates that intelligence, and its associated abilities, can be understood as being the laws of physics acting in an environment.
If I do that rewriting throughout the paper, I do not have to change any of the supporting arguments put forward by the authors: they equally support the physics-is-enough reductionist hypothesis.
The authors of ‘reward is enough’ posit that rewards explain everything, so you might think that they would be very interested in spending more time to look closely at the internal structure of actual reward signals that exist in the wild, or actual reward signals that might be designed. However, they are deeply uninterested in this. In fact they explicitly invite others to join them in solving the ‘challenge of sample-efficient reinforcement learning’ without ever doing such things.
Like you I feel that, when it comes to AI safety, this lack of interest in the details of reward signals is not very helpful. I like the multi-objective approach (see my comments here), but my own recent work like this has been more about abandoning the scalar reward hypothesis/paradigm even further, about building useful models of aligned intelligence which do not depend purely on the idea of reward maximisation. In that recent paper (mostly in section 7) I also develop some thoughts about why most ML researchers seem so interested in the problem of designing reward signals.
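To make the contrast concrete, here is a deliberately simple sketch of my own (made-up numbers, not the actual machinery from the multi-objective literature or from my paper): instead of collapsing everything into one scalar reward before choosing, the agent keeps the objectives separate and applies a constrained decision rule over the vector.

```python
# Toy contrast between scalar reward maximisation and one simple
# multi-objective decision rule. Numbers and names are made up.

candidate_actions = {
    # action: (task_reward, safety_score)
    "aggressive": (10.0, 0.2),
    "moderate":   (6.0, 0.8),
    "cautious":   (3.0, 0.95),
}

# Scalar 'reward is enough' style rule: collapse everything into one number first.
def scalar_choice(actions, safety_weight=0.5):
    return max(actions, key=lambda a: actions[a][0] + safety_weight * actions[a][1])

# One multi-objective alternative: treat safety as a hard constraint,
# and only then maximise task reward among the remaining actions.
def constrained_choice(actions, safety_threshold=0.7):
    admissible = {a: v for a, v in actions.items() if v[1] >= safety_threshold}
    return max(admissible, key=lambda a: admissible[a][0]) if admissible else None

print(scalar_choice(candidate_actions))       # 'aggressive' wins on the scalar sum
print(constrained_choice(candidate_actions))  # 'moderate' wins under the constraint
```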
Any thoughts on how to encourage a healthier dynamic.
I have no easy solution to offer, except for the obvious comment that the world is bigger than this forum.
My own stance is to treat the over-production of posts of type 1 above as just one of these inevitable things that will happen in the modern media landscape. There is some value to these posts, but after you have read about 20 of them, you can be pretty sure about how the next one will go.
So I try to focus my energy, as a reader and writer, on work of type 2 instead. I treat arXiv as my main publication venue, but I do spend some energy cross-posting my work of type 2 here. I hope that it will inspire others, or at least counter-balance some of the type 1 work.
Good question. I don’t have a list, just a general sense of the situation. Making a list would be a research project in itself. Also, different people here would give you different answers. That being said,
I occasionally see comments from alignment research orgs who do actual software experiments, saying that they spend a lot of their time just building and maintaining the infrastructure needed to run large-scale experiments. You’d have to talk to actual orgs to ask them what they would need most. I’m currently a more theoretical alignment researcher, so I cannot offer up-to-date actionable insights here.
As a theoretical researcher, I do reflect on what useful roads are not being taken, by industry and academia. One observation here is that there is an under-investment in public high-quality datasets for testing and training, and in the (publicly available) tools needed for dataset preparation and quality assurance. I am not the only one making that observation, see for example https://research.google/pubs/pub49953/ . Another observation is that everybody is working on open source ML algorithms, but almost nobody is working on open source reward functions that try to capture the actual complex details of human needs, laws, or morality. Also, where is the open source aligned content recommender?
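To sketch what I mean by an open source reward function (purely illustrative, all names and rules hypothetical): a reward component built from explicit, reviewable rule terms, each one traceable to the need, law, or norm it is trying to encode, rather than a single learned black box.

```python
# Sketch of an 'open source reward function' component: explicit, reviewable
# penalty terms rather than a learned black box. Names and weights are placeholders.

class Rule:
    def __init__(self, name, weight, violation):
        self.name = name            # e.g. a reference to the norm or law encoded
        self.weight = weight
        self.violation = violation  # function: state -> degree of violation in [0, 1]

def composite_reward(task_reward, state, rules):
    """Combine a task reward with hand-written, auditable penalty terms."""
    penalty = sum(rule.weight * rule.violation(state) for rule in rules)
    return task_reward - penalty

# Example rule, with a placeholder violation measure:
data_minimisation = Rule(
    name="data minimisation norm",
    weight=10.0,
    violation=lambda state: state.get("uses_private_data", 0.0),
)
```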
On a more practical note, AI benchmarks have turned out to be a good mechanism for drawing attention to certain problems. Many feel that these benchmarks are having a bad influence on the field of AI; I have a lot of sympathy for that view, but you might also go with the flow. A (crypto) market that rewards progress on selected alignment benchmarks may be a thing that has value. You can think here of benchmarks that reward cooperative behaviour, truthfulness and morality in answers given by natural language querying systems, playing games ethically ( https://arxiv.org/pdf/2110.13136.pdf ), etc.

My preference would be to reward benchmark contributions that win by building strong priors into the AI to guide and channel machine learning; many ML researchers would consider this to be cheating, but these are supposed to be alignment benchmarks, not machine-learning-from-blank-slate benchmarks. I have some doubts about the benchmarks for fairness in ML which are becoming popular, if I look at the latest NeurIPS: the ones I have seen offer tests which look a bit too easy, if the objective is to reward progress on techniques that have the promise of scaling up to the more complex notions of fairness and morality that you would like to have at the AGI level, or even for something like a simple content recommendation AI. Some cooperative behaviour benchmarks also strike me as being too simple, in their problem statements and mechanics, to reward the type of research that I would like to see.

Generally, you would want to retire a benchmark from the rewards-generating market when the improvements on its score level out.
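On that last point, a minimal sketch of the kind of retirement rule such a rewards-generating market might use (numbers and thresholds are made up): stop paying out for a benchmark once the recent rate of improvement falls below a threshold.

```python
# Illustrative retirement rule for a benchmark in a rewards-generating market:
# retire it once recent score improvements have levelled out.

def should_retire(score_history, window=5, min_gain=0.01):
    """score_history: best score achieved per period, in chronological order."""
    if len(score_history) < window + 1:
        return False
    recent_gain = score_history[-1] - score_history[-1 - window]
    return recent_gain < min_gain

scores = [0.40, 0.55, 0.63, 0.70, 0.72, 0.725, 0.727, 0.728, 0.728, 0.729]
print(should_retire(scores))   # True: gains over the last 5 periods are tiny
```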