CEO at Conjecture.
I don’t know how to save the world, but dammit I’m gonna try.
This is a great intuition pump, thanks! It makes me appreciate just how, in a sense, weird it is that abstractions work at all. It seems like the universe could just not be constructed this way (though one could then argue that intelligence probably couldn’t exist in such chaotic universes, which is in itself interesting). This makes me wonder if there is a set of “natural abstractions” that are a property of the universe itself, not of whatever learning algorithm is used to pick up on them. Seems highly relevant to value learning and the like.
I am so excited about this research, good luck! I think it’s almost impossible this won’t turn up at least some interesting partial results, even if the strong versions of the hypothesis don’t work out (my guess would be that you run into some kind of incomputability or incoherence result when looking for an algorithm that works for every environment).
This is one of the research directions that make me the most optimistic that alignment might really be tractable!
This was an excellent post, thanks for writing it!
But I think you unfairly dismiss the obvious solution to this madness, and I completely understand why: it’s not at all intuitive where the problem in the setup of infinite ethics lies. It’s in your choice of proof system and interpretation of mathematics! (Don’t use non-constructive proof systems!)
This is a bit of an esoteric point and I’ve been planning to write a post or even a sequence about this for a while, so I won’t be able to lay out the full arguments in one comment, but let me try to convey the gist (apologies to any mathematicians reading this and spotting stupid mistakes I made):
Joe, I don’t like funky science or funky decision theory. And fair enough. But like a good Bayesian, you’ve got non-zero credence on them both (otherwise, you rule out ever getting evidence for them), and especially on the funky science one. And as I’ll discuss below, non-zero credence is enough.
This is where things go wrong. The actual credence of seeing a hypercomputer is zero, because a computationally bounded observer can never observe such an object in a way that differentiates it from a finite approximation. As such, you should indeed assign zero probability to ever moving into a state in which you have performed such a verification; it is a logical impossibility. Think about what it would mean for you, a computationally bounded approximate Bayesian, to come into a state of believing that you are in possession of a hypercomputer (and not a finite approximation of a hypercomputer, which is just a normal computer. Remember, arbitrarily large numbers are still infinitely far away from infinity!). What evidence would you have to observe to hold this belief? You would need to observe literally infinitely many bits, and your credence in observing infinitely many bits should be zero, because you are computationally bounded! If you yourself are not a hypercomputer, you can never move into the state of believing a hypercomputer exists.
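To make the finite-evidence point concrete, here is a minimal Python sketch under a simplifying assumption: every interaction with a purported hypercomputer is modeled as a finite list of (query, answer) pairs, where the queries are hypothetical program names and the answers are claimed halting bits. Whatever the transcript says, a plain lookup table (an ordinary finite program) reproduces it exactly, so no finite transcript can favor “hypercomputer” over “finite approximation”.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple


def finite_imitation(
    transcript: Iterable[Tuple[Hashable, bool]],
) -> Callable[[Hashable], bool]:
    """Build an ordinary finite program that reproduces a finite transcript.

    Each entry is (query, answer): a question put to the purported
    hypercomputer and the bit it returned. The returned function is just a
    lookup table, i.e. a finite approximation with no oracle powers.
    """
    table: Dict[Hashable, bool] = {}
    for query, answer in transcript:
        if query in table and table[query] != answer:
            # A deterministic oracle would never answer the same query two ways.
            raise ValueError(f"inconsistent answers for {query!r}")
        table[query] = answer

    def imitation(query: Hashable) -> bool:
        # Only defined on queries that were actually observed.
        return table[query]

    return imitation


# Hypothetical transcript: program names paired with claimed halting bits.
observed = [("prog_a", True), ("prog_b", False), ("prog_c", True)]
fake_oracle = finite_imitation(observed)
assert all(fake_oracle(q) == a for q, a in observed)
```

The lookup table is silent about unobserved queries, but by construction the observer never saw those answers either; telling the two apart would require checking infinitely many of them.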
This is somewhat analogous to how Solomonoff inductors cannot model a universe containing themselves. Solomonoff inductors are “one step up in the halting hierarchy” from us and cannot model universes that have “super-infinite objects” like themselves in them. Similarly, we cannot model universes that contain “merely infinite” objects (nor, by transitivity, any super-infinite objects); our Bayesian reasoning does not allow it!
I think the core of the problem is that, unfortunately, modern mathematics implicitly accepts classical logic as its basis of formalization, which is a problem because the Law of Excluded Middle (LEM) is an implicit halting oracle. The LEM says that every logical statement is either true or false. This makes intuitive sense, but is wrong. If you think of logical statements as programs whose truth value we want to evaluate by executing a proof search, there are in fact three “truth values”: true, false, and uncomputable! This is a necessity because any axiom system worth its salt is Turing complete (this is basically what Gödel showed in his incompleteness theorems; he used Gödel numbers because Turing machines didn’t exist yet to formalize the same idea) and therefore has programs that don’t halt. Intuitionistic logic (the logic we tend to formalize type theory and computer science with) doesn’t have this problem of an implicit halting oracle, and in my humble opinion it should be used for the formalization of mathematics, lest we end up trading infinite universes for an avocado sandwich and a big lizard by using classical logic.
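A toy model of those three outcomes, as a hedged Python sketch: statements are encoded (an assumption made purely for illustration) as step-wise searches that yield None while still searching and a bool once they reach a verdict, standing in for proof search. A bounded evaluator can report true or false when the search finishes within its budget. When it doesn’t, all it can honestly report is “unknown”, because no finite budget distinguishes “not yet” from “never”; assigning a definite truth value anyway is the step the comment argues LEM implicitly takes. The names `Verdict`, `bounded_search`, `exists_square` and `goldbach` are hypothetical.

```python
from enum import Enum
from typing import Iterator, Optional


class Verdict(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"  # budget exhausted: the search may or may not ever finish


def bounded_search(search: Iterator[Optional[bool]], budget: int) -> Verdict:
    """Run at most `budget` steps; each step yields None (keep going) or a verdict."""
    for _, result in zip(range(budget), search):
        if result is not None:
            return Verdict.TRUE if result else Verdict.FALSE
    return Verdict.UNKNOWN


def exists_square(target: int, limit: int) -> Iterator[Optional[bool]]:
    """Bounded existential claim: some 0 <= n < limit has n * n == target."""
    for n in range(limit):
        if n * n == target:
            yield True
            return
        yield None
    yield False  # every candidate checked, so the claim is refuted


def is_prime(k: int) -> bool:
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def goldbach(start: int = 4) -> Iterator[Optional[bool]]:
    """Unbounded universal claim: every even n >= 4 is a sum of two primes.

    A counterexample would refute it in finitely many steps, but no finite
    amount of checking can confirm it, so this search never yields True.
    """
    n = start
    while True:
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1)):
            yield False
            return
        yield None
        n += 2


print(bounded_search(exists_square(49, 10), budget=100))  # Verdict.TRUE
print(bounded_search(exists_square(50, 10), budget=100))  # Verdict.FALSE
print(bounded_search(goldbach(), budget=500))             # Verdict.UNKNOWN
```

Raising the budget can turn “unknown” into “false” for the Goldbach search only if a counterexample exists; it can never turn it into “true”, which is the asymmetry between what finite search can and cannot settle.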
My own take, though, is that resting the viability of your ethics on something like “infinities aren’t a thing” is a dicey game indeed, especially given that modern cosmology says that our actual concrete universe is very plausibly infinite
Note that using constructivist/intuitionistic logic does not mean that “infinities aren’t a thing”; it’s a bit more subtle than that (and something I have admittedly not fully deconfused for myself yet). But basically, the kind of “infinities” that cosmologists talk about are (in my ontology) very different from the “super-infinities” that you get in the limit of hypercomputation. Intuitively, it’s important to differentiate “inductive infinities” (“you need arbitrarily many steps to complete this computation”) from “real infinities” (“the solution only pops out after infinitely many steps have been completed”, i.e. a halting oracle).
The difference makes the most sense from the perspective of computational complexity theory. The universe is a “program” of complexity class PTIME/BQP (BQP is basically just the quantum version of PTIME), which means that you can evaluate the “next state” of the universe with at most PTIME/BQP computation. Importantly, this means that even if the universe is inflationary and “infinite”, you could evaluate the state of any part of it in (arbitrarily large) finite time. There are no “effects that emerge only at infinity”. The (evaluation of a given arbitrary state of the) universe halts. This is very different from the kinds of computations a hypercomputer is capable of (and less paradoxical). Which is why I found the following very amusing:
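As a classical toy model of “any part of an infinite universe can be evaluated in finite time”, here is a Python sketch using an elementary cellular automaton. This is an illustrative stand-in chosen here, not something the comment specifies, and it has no quantum (BQP) component. The lattice is infinite, but because each update is local, the state of cell x at time t depends only on the 2t + 1 initial cells in its past light cone, so it can be computed in O(t^2) steps, with nothing left over that only “emerges at infinity”. The helper names are hypothetical.

```python
from typing import Callable


def rule_bit(rule: int, left: int, center: int, right: int) -> int:
    """Look up the next state in an elementary CA rule table (Wolfram numbering)."""
    return (rule >> (4 * left + 2 * center + right)) & 1


def cell(x: int, t: int, rule: int, init: Callable[[int], int]) -> int:
    """State of cell x at time t on an infinite 1-D lattice.

    Only the initial cells x - t .. x + t (the past light cone) can influence
    (x, t), so we simulate just that finite window, shrinking it by one cell
    on each side per step. Cost: O(t^2) rule lookups, finite for every t.
    """
    row = [init(i) for i in range(x - t, x + t + 1)]
    for _ in range(t):
        row = [rule_bit(rule, row[i - 1], row[i], row[i + 1])
               for i in range(1, len(row) - 1)]
    return row[0]


def single_seed(i: int) -> int:
    # Infinite initial state: one live cell at the origin, all others dead.
    return 1 if i == 0 else 0


def every_third(i: int) -> int:
    # Infinitely many live cells, yet every finite region is still well defined.
    return 1 if i % 3 == 0 else 0


# Center column of Rule 30 grown from a single seed.
print([cell(0, t, 30, single_seed) for t in range(4)])  # [1, 1, 0, 1]
# A far-away cell at a late time, still computed in finite time.
print(cell(10**6, 200, 30, every_third))
```

Recomputing the window from time 0 keeps the sketch short and makes the finite dependence explicit; caching rows would make repeated queries cheaper without changing the point.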
baby-universes/wormholes/hyper-computers etc appear much more credible, at least, than “consciousness = cheesy-bread.”
Quite the opposite! Or rather, one of those three things is not like the others. Baby universes are in P/BQP, wormholes are in PSPACE (assuming by wormholes you mean closed timelike curves, which is afaik the default interpretation), and hypercomputers are halting-complete, which is ludicrously, insanely, not even remotely like the other two things. So in that regard, yes, I think consciousness being equal to cheesy-bread is more likely than finding a hypercomputer!
To be clear, when I talk about “non-constructive logic is Bad™”, I don’t mean that the actual literal symbolic mathematics is somehow malign (of course); it’s the interpretation we assign to it. We think we’re reasoning about infinite objects, but we’re really reasoning about weaker, computable versions of those objects, and these are not the same thing. If one is maximally careful with one’s interpretations, this is (theoretically) not a problem, but the difference in interpretation is so subtle that it is very difficult to disentangle in our mere human minds. I think this is at the heart of the problems with infinite ethics: because understanding what the correct mathematical interpretations are is so damn subtle and confusing, we accidentally extrapolate interpretations naively to objects they don’t belong to, and find ourselves in bizarre scenarios that seem contradictory and insane.
I didn’t do the best of jobs formally arguing for my point, and I’m honestly still 20% confused about this all (at least), but I hope I at least gave some interesting intuitions about why the problem might be in our philosophy of mathematics, not our philosophy of ethics.
P.S. I’m sure you’ve heard of it before, but on the off chance you haven’t, I cannot recommend this wonderful paper by Scott Aaronson highly enough for a crash course in many of these kinds of topics relevant to philosophers.
I really liked this post, though I somewhat disagree with some of the conclusions. I think that in fact aligning an artificial digital intelligence will be much, much easier than working on aligning humans. To point towards why I believe this, think about how many “tech” companies (Uber, crypto, etc) derive their value, primarily, from circumventing regulation (read: unfriendly egregore rent seeking). By “wiping the slate clean” you can suddenly accomplish much more than working in a field where the enemy already controls the terrain.
If you try to tackle “human alignment”, you will be faced with the coordinated resistance of all the unfriendly demons that human memetic evolution has to offer. If you start from scratch with a new kind of intelligence, a system that doesn’t have to adhere to the existing hostile terrain (doesn’t have to have the same memetic weaknesses as humans that are so optimized against, doesn’t have to go to school, grow up in a toxic media environment etc etc), you can, maybe, just maybe, build something that circumvents this problem entirely.
That’s my biggest hope with alignment (which I am, unfortunately, not very optimistic about, but I am even more pessimistic about anything involving humans coordinating at scale), that instead of trying to pull really hard on the rope against the pantheon of unfriendly demons that run our society, we can pull the rope sideways, hard.
Of course, that “sideways” might land us in a pile of paperclips, if we don’t solve some very hard technical problems....
TL;DR: For the record, EleutherAI never actually had a policy of always releasing everything to begin with and has always tried to consider each publication’s pros vs cons. But this is still a bit of a change from EleutherAI, mostly because we think it’s good to be more intentional about what should or should not be published, even if one does end up publishing many things. EleutherAI is unaffected and will continue working open source. Conjecture will not be publishing ML models by default, but may do so on a case by case basis.
Longer version:
First of all, Conjecture and EleutherAI are separate entities. The policies of one do not affect the other. EleutherAI will continue as it has.
To explain a bit of what motivated this policy: We ran into some difficulties when handling infohazards at EleutherAI. By the very nature of a public open source community, infohazard handling is tricky, to say the least. I’d like to say on the record that I think EAI actually did an astoundingly good job, for what it is, of not pushing every cool research or project discovery we encountered. However, there are still obvious limitations to how well you can contain information spread in an environment that open.
I think the goal of a good infohazard policy should not be to make it as hard as possible to publish information or talk to people about your ideas to limit the possibility of secrets leaking, but rather to make any spreading of information more intentional. You can’t undo the spreading of information, it’s a one-way street. As such, the “by-default” component is what I think is important to allow actual control over what gets out and what doesn’t. By having good norms around not immediately sharing everything you’re working on or thinking about widely, you have more time to deliberate and consider if keeping it private is the best course of action. And if not, then you can still publish.
That’s the direction we’re taking things with Conjecture. Concretely, we are working on writing a well thought out infohazard policy internally, and plan to get the feedback of alignment researchers outside of Conjecture on whether each piece of work should or should not be published.
We have the same plan with respect to our models, which we by default will not release. However, we may choose to do so on a case by case basis and with feedback from external alignment researchers. While this is different from EleutherAI, I’d note that EAI does not advocate, and has never advocated, for literally publishing anything and everything all the time as fast as possible. EAI is a very decentralized organization, and many people associated with the name work on pretty different projects, but in general the projects EAI chooses to do are informed by what we consider net good to be working on publicly (e.g. EAI would not release a SOTA-surpassing or unprecedentedly large model). This is a nuanced point about EAI policy that tends to get lost in outside communication.
We recognize that Conjecture’s line of work is infohazardous. We think it’s almost guaranteed that when working on serious prosaic alignment you will stumble across capabilities increasing ideas (one could argue one of the main constraints on many current models’ usefulness/power is precisely their lack of alignment, so incremental progress could easily remove bottlenecks), and we want to have the capacity to handle these kinds of situations as gracefully as possible.
Thanks for your question and giving us the chance to explain!
We aren’t committed to any specific product or direction just yet (we think there are many low hanging fruit that we could decide to pursue). Luckily we have the independence to be able to initially spend a significant amount of time focusing on foundational infrastructure and research. Our product(s) could end up as some kind of API with useful models, interpretability tools or services, some kind of end-to-end SaaS product or something else entirely. We don’t intend to push the capabilities frontier, and don’t think this would be necessary to be profitable.
Thanks—we plan to visit the Bay soon with the team, we’ll send you a message!
Currently, there is only one board position, which I hold. I also have triple vote as insurance if we decide to expand the board. We don’t plan to give up board control.
We strongly encourage in person work—we find it beneficial to be able to talk over or debate research proposals in person at any time, it’s great for the technical team to be able to pair program or rubber duck if they’re hitting a wall, and all being located in the same city has a big impact on team building.
That being said, we don’t mandate it. Some current staff want to spend a few months a year with their families abroad, and others aren’t able to move to London at all. While we preferentially accept applicants who can work in person, we’re flexible, and if you’re interested but can’t make it to London, it’s definitely still worth reaching out.
Answered here.
EAI has always been a community-driven organization that people tend to contribute to in their spare time, around their jobs. I, for example, have had a day job of one sort or another for most of EAI’s existence. So from this angle, nothing has changed aside from the fact my job is more demanding now.
Sid and I still contribute to EAI on the meta level (moderation, organization, deciding on projects to pursue), but do admittedly have less time to dedicate to it these days. Thankfully, Eleuther is not just us—we have a bunch of projects going on at any one time, and progress for EAI doesn’t seem to be slowing down.
We are still open to the idea of releasing larger models with EAI, and funding may happen, but it’s no longer our priority to pursue that, and the technical lead of that project (Sid) has much less time to dedicate to it.
Conjecture staff will occasionally contribute to EAI projects, when we think it’s appropriate.
See a longer answer here.
Our decision to open-source and release the weights of large language models was not a haphazard one, but was something we thought very carefully about. You can read my short post here on our reasoning behind releasing some of our models. The short version is that we think that the danger of large language models comes from the knowledge that they’re possible, and that scaling laws are true. We think that by giving researchers access to the weights of LLMs, we will aid interpretability and alignment research more than we will negatively impact timelines. At Conjecture, we aren’t against publishing, but by making non-disclosure the default, we force ourselves to consider the long-term impact of each piece of research and have a better ability to decide not to publicize something rather than having to do retroactive damage control.
We currently have a (temporary) office in the Southwark area, and are open to visitors. We’ll be moving to a larger office soon, and we hope to become a hub for AGI Safety in Europe.
And yes! Most of our staff will be attending EAG London. See you there?
See the reply to Michaël for answers as to what kind of products we will develop (TLDR we don’t know yet).
As for the conceptual research side, we do not do conceptual research with product in mind, but we expect useful corollaries to fall out by themselves for sufficiently good research. We think the best way of doing fundamental research like this is to just follow the most interesting, useful looking directions guided by the “research taste” of good researchers (with regular feedback from the rest of the team, of course). I, for one, genuinely expect product to be “easy”, in the sense that AI is advancing absurdly fast and the economic opportunities are falling from the sky like candy, so I don’t expect us to need to frantically dedicate our research to finding worthwhile fruit to pick.
The incubator has absolutely nothing to do with our for profit work, and is truly meant to be a useful space for independent researchers to develop their own directions that will hopefully be maximally beneficial to the alignment community. We will not put any requirements or restrictions on what the independent researchers work on, as long as it is useful and interesting to the alignment community.
We (the founders) have a research agenda distinct enough from those of most existing groups that simply joining them would mean incurring some compromises on that front. Also, joining existing research orgs is tough! Especially if we want to continue along our own lines of research, and have significant influence on their direction. We can’t just walk in and say “here are our new frames for GPT, can we have a team to work on this asap?”.
You’re right that SOTA models are hard to develop, but that being said, developing our own models is independently useful in many ways—it enables us to maintain controlled conditions for experiments, and study things like scaling properties of alignment techniques, or how models change throughout training, as well as being useful for any future products. We have a lot of experience in LLM development and training from EleutherAI, and expect it not to take up an inordinate amount of developer hours.
We are all in favor of high bandwidth communication between orgs. We would love to work in any way we can to set these channels up with the other organizations, and are already working on reaching out to many people and orgs in the field (meet us at EAG if you can!).
In general, all the safety orgs that we have spoken with are interested in this, and that’s why we expect/hope this kind of initiative to be possible soon.
To address the opening quote—the copy on our website is overzealous, and we will be changing it shortly. We are an AGI company in the sense that we take AGI seriously, but it is not our goal to accelerate progress towards it. Thanks for highlighting that.
We don’t have a concrete proposal for how to reliably signal that we’re committed to avoiding AGI race dynamics beyond the obvious right now. There is unfortunately no obvious or easy mechanism that we are aware of to accomplish this, but we are certainly open to discussion with any interested parties about how best to do so. Conversations like this are one approach, and we also hope that our alignment research speaks for itself in terms of our commitment to AI safety.
If anyone has any more trust-inducing methods than us simply making a public statement and reliably acting consistently with our stated values (where observable), we’d love to hear about them!
To respond to the last question—Conjecture has been “in the making” for close to a year now and has not been a secret, we have discussed it in various iterations with many alignment researchers, EAs and funding orgs. A lot of initial reactions were quite positive, in particular towards our mechanistic interpretability work, and just general excitement for more people working on alignment. There have of course been concerns around organizational alignment, for-profit status, our research directions and the founders’ history with EleutherAI, which we all have tried our best to address.
But ultimately, we think whether or not the community approves of a project is a useful signal for whether a project is a good idea, but not the whole story. We have our own idiosyncratic inside-views that make us think that our research directions are undervalued, so of course, from our perspective, other people will be less excited than they should be for what we intend to work on. We think more approaches and bets are necessary, so if we only worked on the most consensus-choice projects, we wouldn’t be doing anything new or undervalued. That being said, we don’t think any of the directions or approaches we’re tackling have been considered particularly bad or dangerous by large or prominent parts of the community, which is a signal we would take seriously.
The founders have a supermajority of voting shares and full board control and intend to hold on to both for as long as possible (preferably indefinitely). We have been very upfront with our investors that we do not want to ever give up control of the company (even if it were hypothetically to go public, which is not something we are currently planning to do), and will act accordingly.
For the second part, see the answer here.
To point 1: While we greatly appreciate what OpenPhil, LTFF and others do (and hope to work with them in the future!), we found that the hurdles required and strings attached were far greater than with the laissez-faire Silicon Valley VC we encountered, and seemed less scalable in the long run. Also, FTX FF did not exist back when we were starting out.
While EA funds as they currently exist are great at handing out small to medium sized grants, the ~8 digit investment we were looking for to get started asap was not something that these kinds of orgs were generally interested in giving out (which seems to be changing lately!), especially to slightly unusual research directions and unproven teams. If our timelines were longer and the VC money had more strings attached (as some of us had expected before seeing it for ourselves!), we may well have gone another route. But the truth of the current state of the market is that if you want to scale to a billion dollars as fast as possible with the most founder control, this is the path we think is most likely to succeed.
To point 2: This is why we will focus on SaaS products on top of our internal APIs that can be built by teams that are largely independent from the ML engineering. As such, this will not compete much with our alignment-relevant ML work. This is basically our thesis as a startup: We expect it to be EV+, as this earns much more money than we would have had otherwise.
Notice this is a contingent truth, not an absolute one. If tomorrow, OpenPhil and FTX contracted us with 200M/year to do alignment work, this would of course change our strategy.
To point 3: We don’t think this has to be true. (Un)fortunately, given the current pace of capability progress, we expect keeping up with the pace to be more than enough for building new products. Competition on AI capabilities is extremely steep and not in our interest. Instead, we believe that (even) the (current) capabilities are so crazy that there is an unlimited potential for products, and we plan to compete instead on building a reliable pipeline to build and test new product ideas.
Calling it competition is actually a misnomer from our point of view. We believe there is ample space for many more companies to follow this strategy, still not have to compete, and turn a massive profit. This is how crazy capabilities and their progress are.
Great writeup, thanks!
To add to whether or not kludge and heuristics are part of the theory, I’ve asked the Numenta people in a few AMAs about their work, and they’ve made clear they are working solely on the neocortex (and the thalamus), but the neocortex isn’t the only thing in the brain. It seems clear that the kludge we know from the brain is still present, just maybe not in the neocortex. Limbic or other areas could implement kludge-style shortcuts which could bias what the more uniform neocortex learns or outputs. Given my current state of knowledge of neuroscience, the most likely interpretation of this kind of research is that the neocortex is a kind of large unsupervised world model that is connected to all kinds of other hardcoded, RL or other systems, which all in concert produce human behavior. It might be similar to Schmidhuber’s RNNAI idea, where an RL agent learns to use an unsupervised “blob” of compute to achieve its goals. Something like this is probably happening in the brain since, at least as far as Numenta’s theories go, there is no reinforcement learning going on in the neocortex, which seems to contradict how humans work overall.