CEO at Conjecture.
I don’t know how to save the world, but dammit I’m gonna try.
“Good” always refers to idiosyncratic opinions; I don’t really take moral realism particularly seriously. I think there is “good” philosophy in the same way there are “good” optimization algorithms for neural networks, while I also assume there is no one optimizer that “solves” all neural network problems.
I strongly disagree and do not think that is how AGI will look; AGI isn’t magic. But this is a crux, and I might be wrong of course.
I can’t rehash my entire views on coordination and policy here I’m afraid, but in general, I believe we are currently on a double exponential timeline (though I wouldn’t model it quite like you, but the conclusions are similar enough) and I think some simple to understand and straightforwardly implementable policy (in particular, compute caps) at least will move us to a single exponential timeline.
I’m not sure we can get policy that can stop the single exponential (which is software improvements), but there are some ways, and at least we will then have additional time to work on compounding solutions.
Sure, it’s not a full solution, it just buys us some time, but I think it would be a non-trivial amount, and let not perfect be the enemy of good and what not.
I see regulation as the most likely (and most accessible) avenue that can buy us significant time. The obvious option, fmpov, is to just put compute caps in place: make it illegal to do training runs above a certain FLOP level. Other possibilities are strict liability for model developers (developers, not just deployers or users, are held criminally liable for any damage caused by their models), global moratoria, “CERN for AI” and similar. Generally, I endorse the proposals here.
None of these are easy, of course, there is a reason my p(doom) is high.
But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...
Of course if a solution merely looks good, that will indeed be really bad, but that’s the challenge of crafting and enforcing sensible regulation.
I’m not sure I understand why it would be bad if it actually is a solution. If we do have one, great: p(doom) drops because now we are much closer to making aligned systems that can help us grow the economy, do science, stabilize society etc. Though of course this moves us into a “misuse risk” paradigm, which is also extremely dangerous.
In my view, this is just how things are, there are no good timelines that don’t route through a dangerous misuse period that we have to somehow coordinate well enough to survive. p(doom) might be lower than before, but not by that much, in my view, alas.
I think this is not an unreasonable position, yes. I expect the best way to achieve this would be to make global coordination and epistemology better/more coherent...which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.
One of the ways I can see a “slow takeoff/alignment by default” world still going bad is that in the run-up to takeoff, pseudo-AGIs are used to hypercharge memetic warfare/mutation load to such a degree that basically every living human is just functionally insane, and then even an aligned AGI can’t (and wouldn’t want to) “undo” that.
Hard for me to make sense of this. What philosophical questions do you think you’ll get clarity on by doing this? What are some examples of people successfully doing this in the past?
The fact you ask this question is interesting to me, because in my view the opposite question is the more natural one to ask: What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions. The places where we do best and are least led astray are exactly those areas where we can get as much feedback from reality in as tight loops as possible, so if we are trying to tackle ever more lofty problems, it becomes ever more important to get exactly that feedback wherever we can get it! From my point of view, this is the default of successful human epistemology, and the exception should be viewed with suspicion.
And for what it’s worth, acting in the real world, building a company, raising money, debating people live, building technology, making friends (and enemies), absolutely helped me become far, far less confused, and far more capable of tackling confusing problems! Actually testing my epistemology and rationality against reality, and failing (a lot), has been far more helpful for deconfusing everything from practical decision making skills to my own values than reading/thinking could have ever been in the same time span. There is value in reading and thinking, of course, but I was in a severe “thinking overhang”, and I needed to act in the world to keep learning and improving. I think most people (especially on LW) are in an “action underhang.”
“Why do people do things?” is an empirical question, it’s a thing that exists in external reality, and you need to interact with it to learn more about it. And if you want to tackle even higher level problems, you need to have even more refined feedback. When a physicist wants to understand the fundamentals of reality, they need to set up insane crazy particle accelerators and space telescopes and supercomputers and what not to squeeze bits of evidence out of reality and actually ground whatever theoretical musings they may have been thinking about. So if you want to understand the fundamentals of philosophy and the human condition, by default I expect you are going to need to do the equivalent kind of “squeezing bits out of reality”, by doing hard things such as creating institutions, building novel technology, persuading people, etc. “Building a company” is just one common example of a task that forces you to interact a lot with reality to be good.
Fundamentally, I believe that good philosophy should make you stronger and allow you to make the world better, otherwise, why are you bothering? If you actually “solve metaphilosophy”, I think the way this should end up looking is that you can now do crazy things. You can figure out new forms of science crazy fast, you can persuade billionaires to support you, you can build monumental organizations that last for generations. Or, in reverse, I expect that if you develop methods to do such impressive feats, you will necessarily have to learn deep truths about reality and the human condition, and acquire the skills you will need to tackle a task as heroic as “solving metaphilosophy.”
Everyone dying isn’t the worst thing that could happen. I think from a selfish perspective, I’m personally a bit more scared of surviving into a dystopia powered by ASI that is aligned in some narrow technical sense. Less sure from an altruistic/impartial perspective, but it seems at least plausible that building an aligned AI without making sure that the future human-AI civilization is “safe” is a not good thing to do.
I think this grounds out into object level disagreements about how we expect the future to go, probably. I think s-risks are extremely unlikely at the moment, and when I look at how best to avoid them, most such timelines don’t go through “figure out something like metaphilosophy”, but more likely through “just apply bog standard decent humanist deontological values and it’s good enough.” A lot of the s-risk in my view comes from the penchant for maximizing “good” that utilitarianism tends to promote, if we instead aim for “good enough” (which is what most people tend to instinctively favor), that cuts off most of the s-risk (though not all).
To get to the really good timelines, that route through “solve metaphilosophy”, there are mandatory previous nodes such as “don’t go extinct in 5 years.” Buying ourselves more time is powerful optionality, not just for concrete technical work, but also for improving philosophy, human epistemology/rationality, etc.
I don’t think I see a short path to communicating the parts of my model that would be most persuasive to you here (if you’re up for a call or irl discussion sometime lmk), but in short I think of policy, coordination, civilizational epistemology, institution building and metaphilosophy as closely linked and tractable problems, if only it wasn’t the case that there was a small handful of AI labs (largely supported/initiated by EA/LW-types) that are deadset on burning the commons as fast as humanly possible. If we had a few more years/decades, I think we could actually make tangible and compounding progress on these problems.
I would say that better philosophy/arguments around questions like this is a bottleneck. One reason for my interest in metaphilosophy that I didn’t mention in the OP is that studying it seems least likely to cause harm or make things worse, compared to any other AI related topics I can work on. (I started thinking this as early as 2012.) Given how much harm people have done in the name of good, maybe we should all take “first do no harm” much more seriously?
I actually respect this reasoning. I disagree strategically, but I think this is a very morally defensible position to hold, unlike the mental acrobatics necessary to work at the x-risk factories because you want to be “in the room”.
Which also represents an opportunity...
It does! If I was you, and I wanted to push forward work like this, the first thing I would do is build a company/institution! It will both test your mettle against reality and allow you to build a compounding force.
Is it actually that weird? Do you have any stories of trying to talk about it with someone and having that backfire on you?
Yup, absolutely. If you take even a microstep outside of the EA/rat-sphere, these kinds of topics quickly become utterly alien to anyone. Try explaining to a politician worried about job loss, or a middle aged housewife worried about her future pension, or a young high school dropout unable to afford housing, that actually we should be worried about whether we are doing metaphilosophy correctly to ensure that future immortal superintelligences reason correctly about acausal alien gods from math-space so they don’t cause them to torture trillions of simulated souls! This is exaggerated for comedic effect, but this is really what even relatively intro level LW philosophy by default often sounds like to many people!
As the saying goes, “Grub first, then ethics.” (though I would go further and say that people’s instinctive rejection of what I would less charitably call “galaxy brain thinking” is actually often well calibrated)
As someone that does think about a lot of the things you care about at least some of the time (and does care pretty deeply), I can speak for myself why I don’t talk about these things too much:
Epistemic problems:
Mostly, the concept of “metaphilosophy” is so hopelessly broad that you kinda reach it by definition by thinking about any problem hard enough. This isn’t a good thing: when you have a category so large it contains everything (not saying this applies to you, but it applies to many other people I have met who talked about metaphilosophy), it usually means you are confused.
Relatedly, philosophy is incredibly ungrounded and epistemologically fraught. It is extremely hard to think about these topics in ways that actually eventually cash out into something tangible, rather than nerdsniping young smart people forever (or until they run out of funding).
Further on that, it is my belief that good philosophy should make you stronger, and this means that fmpov a lot of the work that would be most impactful for making progress on metaphilosophy does not look like (academic) philosophy, and looks more like “build effective institutions and learn interactively why this is hard” and “get better at many scientific/engineering disciplines and build working epistemology to learn faster”. Humans are really, really bad at doing long chains of abstract reasoning without regular contact with reality, so in practice imo good philosophy has to have feedback loops with reality, otherwise you will get confused. I might be totally wrong, but I expect at this moment in time me building a company is going to help me deconfuse a lot of things about philosophy more than me thinking about it really hard in isolation would.
It is not clear to me that there even is an actual problem to solve here. Similar to e.g. consciousness, it’s not clear to me that people who use the word “metaphilosophy” are actually pointing to anything coherent in the territory at all, or even if they are, that it is a unique thing. It seems plausible that there is no such thing as “correct” metaphilosophy, that humans are just making up random stuff based on our priors and environment, and that there is no “right way” to do philosophy, similar to how there are no “right preferences”. I know the other view ofc, and it is still worth engaging with in case there is something deep and universal to be found (the same way we found that there is actually deep equivalency and “correct” ways to think about e.g. computation).
Practical problems:
I have short timelines and think we will be dead if we don’t make very rapid progress on extremely urgent practical problems like government regulation and AI safety. Metaphilosophy falls into the unfortunate bucket of “important, but not (as) urgent” in my view.
There are no good institutions, norms, groups, funding etc to do this kind of work.
It’s weird. I happen to have a very deep interest in the topic, but it costs you weirdness points to push an idea like this when you could instead be advocating more efficiently for more pragmatic work.
It was interesting to read about your successive jumps up the meta hierarchy, because I had a similar path, but then I “jumped back down” when I realized that most of the higher levels are kinda just abstract, confusing nonsense. Even really “philosophically concerned” communities like EA routinely fail basic morality such as “don’t work at organizations accelerating existential risk”, and we are by no means currently bottlenecked by not having reflectively consistent theories of anthropic selection or whatever. I would like to get to a world where we have bottlenecks like that, but we are so, so far away from a world where that kind of stuff is why the world goes bad that it’s hard to justify more than some late night/weekend thought on the topic in between a more direct bottleneck focused approach.
All that being said, I still am glad some people like you exist, and if I could make your work go faster, I would love to do so. I wish I could live in a world where I could justify working with you on these problems full time, but I don’t think I can convince myself this is actually the most impactful thing I could be doing at this moment.
Yep, you see the problem! It’s tempting to just think of an AI as “just the model”, and study that in isolation, but that just won’t be good enough longterm.
Looks good to me, thank you Loppukilpailija!
Thanks!
As I have said many, many times before, Conjecture is not a deep shift in my beliefs about open sourcing, as it is not, and has never been, the position of EleutherAI (at least while I was head) that everything should be released in all scenarios, but rather that some specific things (such as LLMs of the size and strength we release) should be released in some specific situations for some specific reasons. EleutherAI would not, and has not, released models or capabilities that would push the capabilities frontier (and while I am no longer in charge, I strongly expect that legacy to continue), and there are a number of things we did discover that we decided to delay or not release at all for precisely such infohazard reasons. Conjecture of course is even stricter and has opsec that wouldn’t be possible at a volunteer driven open source community.
Additionally, Carper is not part of EleutherAI and should be considered genealogically descendant but independent of EAI.
Thanks for this! These are great questions! We have been collecting questions from the community and plan to write a follow up post addressing them in the next couple of weeks.
I initially liked this post a lot, then saw a lot of pushback in the comments, mostly of the (very valid!) form of “we actually build reliable things out of unreliable things, particularly with computers, all the time”. I think this is a fair criticism of the post (and choice of examples/metaphors therein), but I think it may be missing (one of) the core message(s) the post is trying to deliver.
I wanna give an interpretation/steelman of what I think John is trying to convey here (which I don’t know whether he would endorse or not):
“There are important assumptions that need to be made for the usual kind of systems security design to work (e.g. uncorrelation of failures). Some of these assumptions will (likely) not apply with AGI. Therefore, extrapolating this kind of thinking to this domain is Bad™️.” (“Epistemological vigilance is critical”)
So maybe rather than saying “trying to build robust things out of brittle things is a bad idea”, it’s more like “we can build robust things out of certain brittle things, e.g. computers, but Godzilla is not a computer, and so you should only extrapolate from computers to Godzilla if you’re really, really sure you know what you’re doing.”
I think this is something better discussed in private. Could you DM me? Thanks!
This is a genuinely difficult and interesting question that I want to provide a good answer for, but that might take me some time to write up, I’ll get back to you at a later date.
Yes, we do expect this to be the case. Unfortunately, I think that explaining in detail why we expect this may itself be infohazardous. Or at least, I am sufficiently unsure about how infohazardous it is that I would first like to think about it for longer and run it through our internal infohazard review before sharing more. Sorry!
Answered here.
Redwood is doing great research, and we are fairly aligned with their approach. In particular, we agree that hands-on experience building alignment approaches could have high impact, even if AGI ends up having an architecture unlike modern neural networks (which we don’t believe will be the case). While Conjecture and Redwood both have a strong focus on prosaic alignment with modern ML models, our research agenda has higher variance, in that we additionally focus on conceptual and meta-level research. We’re also training our own (large) models, but (we believe) Redwood are just using pretrained, publicly available models. We do this for three reasons:
Having total control over the models we use can give us more insights into the phenomena we study, such as training models at a range of sizes to study scaling properties of alignment techniques.
Some properties we want to study may only appear in close-to-SOTA models—most of which are private.
We are trying to make products, and close-to-SOTA models help us do that better. Though as we note in our post, we plan to avoid work that pushes the capabilities frontier.
We’re also for-profit, while Redwood is a nonprofit, and we’re located in London! Not everyone lives out in the Bay :)
I don’t understand what point you are trying to make, to be honest. There are certain problems that humans/I care about that we/I want NNs to solve, and some optimizers (e.g. Adam) solve those problems better or more tractably than others (e.g. SGD or second order methods). You can claim that the “set of problems humans care about” is “arbitrary”, to which I would reply “sure?”
Similarly, I want “good” “philosophy” to be “better” at “solving” “problems I care about.” If you want to use other words for this, my answer is again “sure?” I think this is a good use of the word “philosophy” that gets better at what people actually want out of it, but I’m not gonna die on this hill because of an abstract semantic disagreement.