Garrett Baker
Independent alignment researcher
Does the possibility of China or Russia being able to steal advanced AI from labs increase or decrease the chances of great power conflict?
An argument against: it counter-intuitively decreases the chances. Why? For the same reason that a functioning US ICBM defense system would be a destabilizing influence on the MAD equilibrium. In the ICBM defense circumstance, once the shield went up, America’s enemies would have no credible threat of retaliation if the US were to launch a first strike. Therefore, there would be no geopolitical reason for America not to launch a first strike, and there would be quite a strong reason to launch one: namely, the shield definitely works against the present crop of ICBMs, but may not work against future ICBMs. America’s enemies will therefore assume that after the shield goes up, America will launch a first strike, and will seek to gain the advantage while they still have a chance by launching a pre-emptive first strike of their own.
The same logic works in reverse. If Russia were building an ICBM defense shield, and would likely complete it within the year, we would be very scared about what would happen once that shield was up.
And the same logic works for other irrecoverably large technological leaps in war. If the US is on the brink of developing highly militarily capable AIs, China will fear what the US will do with them (imagine if the tables were turned: would you feel safe with Anthropic & OpenAI in China, and DeepMind in Russia?). So if they don’t get their own versions, they’ll feel mounting pressure to secure their geopolitical objectives while they still can, or otherwise make themselves less subject to the threat of AI (would you not wish the US would sabotage the Chinese Anthropic & OpenAI by whatever means, if China seemed on the brink?). The faster the development, the quicker the pressure will mount, and the sloppier & more rash China’s responses will be. If it’s easy for China to copy our AI technology, the pressure mounts much more slowly.
This seems contrary to how much of science works. I expect if people stopped talking publicly about what they’re working on in alignment, we’d make much less progress, and capabilities would basically run business as usual.
The sort of reasoning you use here, and the fact that my only response to it basically amounts to “well, no, I think you’re wrong: this proposal will slow down alignment too much”, is why I think we need numbers to ground us.
Yeah, there are reasons for caution. I think it makes sense for both the concerned and the unconcerned to make numerical forecasts about the costs & benefits of such questions, rather than the current state of everyone just comparing their vibes against each other. This generalizes to other questions, like the benefits of interpretability, advances in safety fine-tuning, deep learning science, and agent foundations.
Obviously such numbers wouldn’t be the end-of-the-line, and, as in biorisk, sometimes they themselves should be kept secret. But they would still be a great advance.
If anyone would like to collaborate on such a project, my DMs are open (not to say this topic is covered; it isn’t exactly my main wheelhouse).
I don’t really know what people mean when they try to compare “capabilities advancements” to “safety advancements”. In one sense, it’s pretty clear: the common units are “amount of time”, so we should compare the marginal (probabilistic) difference between time-to-alignment and time-to-doom. But I think in practice people just look at vibes.
For example, if someone releases a new open source model people say that’s a capabilities advance, and should not have been done. Yet I think there’s a pretty good case that more well-trained open source models are better for time-to-alignment than for time-to-doom, since much alignment work ends up being done with them, and the marginal capabilities advance here is zero. Such work builds on the public state of the art, but not the private state of the art, which is probably far more advanced.
I also don’t often see people making estimates of the time-wise differential impacts here. Maybe people think such things would be exfo/info-hazardous, but nobody even claims to have estimates when the topic comes up (even in private, though people are glad to talk about their hunches for what AI will look like in 5 years, or the types of advancements necessary for AGI), despite all the work on timelines. It’s difficult to do this for the marginal advance, but not so much for larger research priorities, which are the sorts of things people should be focusing on anyway.
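To make this concrete, here is a minimal sketch of the kind of estimate I have in mind. All the numbers and distributions are made-up placeholders, not actual forecasts: model time-to-alignment and time-to-doom as distributions, then ask how much some intervention (say, an open source release) shifts the probability that alignment arrives first.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Placeholder timelines (years from now); lognormals chosen purely for illustration.
time_to_alignment = rng.lognormal(mean=np.log(15), sigma=0.5, size=N)
time_to_doom = rng.lognormal(mean=np.log(20), sigma=0.5, size=N)

def p_alignment_first(shift_align=0.0, shift_doom=0.0):
    """P(alignment arrives before doom), with each timeline shifted by the
    hypothesized effect of the intervention (in years)."""
    return np.mean((time_to_alignment + shift_align) < (time_to_doom + shift_doom))

baseline = p_alignment_first()
# Hypothetical intervention: speeds alignment by 1 year, doom by 0.25 years.
intervened = p_alignment_first(shift_align=-1.0, shift_doom=-0.25)

print(f"baseline P(alignment first): {baseline:.3f}")
print(f"with intervention:           {intervened:.3f}")
print(f"marginal effect:             {intervened - baseline:+.3f}")
```

Even a toy model like this forces the disagreement into the parameters (how much does the intervention shift each timeline?) instead of leaving it in the vibes.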
There’s also the problem of: what do you mean by “the human”? If you make an empowerment calculus that works for humans who are atomic & ideal agents, it probably breaks once you get a superintelligence that can likely mind-hack you into yourself valuing only power. Nothing in the calculus forces the AI to abstain from doing this, since afterwards you remain perfectly capable of making different decisions; you just don’t.
Another problem, which I like to think of as the “control panel of the universe” problem, is where the AI gives you the “control panel of the universe”, but you aren’t smart enough to operate it, in the sense that you have the information necessary to operate it, but not the intelligence. Such that you can technically do anything you want—you have maximal power/empowerment—but the super-majority of buttons and button combinations you are likely to push result in increasing the number of paperclips.
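For reference, the standard formalization of empowerment in the literature is the channel capacity between an agent’s actions and its future states, max over p(a) of I(A; S'). Here is a minimal sketch in a toy deterministic gridworld I made up for illustration; with deterministic dynamics the capacity reduces to the log of the number of distinct reachable states.

```python
import numpy as np
from itertools import product

# Toy 5x5 gridworld, deterministic moves; actions: up, down, left, right, stay.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
SIZE = 5

def step(state, action):
    """Deterministic transition: move one square, clipped at the walls."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    return (r, c)

def empowerment(state, horizon):
    """n-step empowerment in bits. For deterministic dynamics, the channel
    capacity max_{p(a)} I(A; S') is log2 of the number of reachable states."""
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return np.log2(len(reachable))

print(empowerment((0, 0), horizon=2))  # corner: log2(6)  ~ 2.6 bits
print(empowerment((2, 2), horizon=2))  # center: log2(13) ~ 3.7 bits
```

Note that the measure only counts how many states you could reach, never whether any of them are states you’d endorse reaching, which is the “control panel” problem in miniature.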
I agree people shouldn’t use the word cruxy. But I think they should instead just directly say whether a consideration is a crux for them. I.e. whether a proposition, if false, would change their mind.
Edit: Given the confusion, what I mean is often people use “cruxy” in a more informal sense than “crux”, and label statements that are similar to statements that would be a crux but are not themselves a crux “cruxy”. I claim here people should stick to the strict meaning.
I agree that contrarians ’round these parts are wrong more often than the academic consensus, but the success of their predictions about AI, crypto, and COVID proves to me it’s still worth listening to them, trying to be able to think like them, and probably taking their investment advice. That is, when they’re right, they’re right big-time.
[edit: nevermind I see you already know about the following quotes. There’s other evidence of the influence in Sedley’s book I link below]
In De Rerum Natura, around line 716:
Add, too, whoever make the primal stuff
Twofold, by joining air to fire, and earth
To water; add who deem that things can grow
Out of the four- fire, earth, and breath, and rain;
As first Empedocles of Acragas,
Whom that three-cornered isle of all the lands
Bore on her coasts, around which flows and flows
In mighty bend and bay the Ionic seas,
Splashing the brine from off their gray-green waves.
Here, billowing onward through the narrow straits,
Swift ocean cuts her boundaries from the shores
Of the Italic mainland. Here the waste
Charybdis; and here Aetna rumbles threats
To gather anew such furies of its flames
As with its force anew to vomit fires,
Belched from its throat, and skyward bear anew
Its lightnings’ flash. And though for much she seem
The mighty and the wondrous isle to men,
Most rich in all good things, and fortified
With generous strength of heroes, she hath ne’er
Possessed within her aught of more renown,
Nor aught more holy, wonderful, and dear
Than this true man. Nay, ever so far and pure
The lofty music of his breast divine
Lifts up its voice and tells of glories found,
That scarce he seems of human stock create.
Or, for a more modern translation, from Sedley’s Lucretius and the Transformation of Greek Wisdom:
Of these [sc. the four-element theorists] the foremost is Empedocles of Acragas, born within the three-cornered terrestrial coasts of the island [Sicily] around which the Ionian Sea, flowing with its great windings, sprays the brine from its green waves, and from whose boundaries the rushing sea with its narrow strait divides the coasts of the Aeolian land with its waves. Here is destructive Charybdis, and here the rumblings of Etna give warning that they are once more gathering the wrath of their flames so that her violence may again spew out the fire flung from her jaws and hurl once more to the sky the lightning flashes of flame. Although this great region seems in many ways worthy of admiration by the human races, and is said to deserve visiting for its wealth of good things and the great stock of men that fortify it, yet it appears to have had in it nothing more illustrious than this man, nor more holy, admirable, and precious. What is more, the poems sprung from his godlike mind call out and expound his illustrious discoveries, so that he scarcely seems to be born of mortal stock.
I find this very hard to believe. Shouldn’t Chinese merchants have figured out eventually, traveling long distances using maps, that the Earth was a sphere? I wonder whether the “scholars” of ancient China actually represented the state-of-the-art practical knowledge that the Chinese had.
Nevertheless, I don’t think this is all that counterfactual. If you’re obsessed with measuring everything, and like to travel (like the Greeks), I think eventually you’ll have to discover this fact.
I’ve heard an argument that Mendel was actually counter-productive to the development of genetics: that if you go and actually study peas like he did, you’ll find they don’t make perfect Punnett squares, and from the deviations you can derive recombination effects. The claim is that he fudged his data a little in order to make it nicer, and this then held back others from figuring out the topological structure of genotypes.
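For what it’s worth, the mechanism in that claim is easy to simulate. A quick sketch (loci, allele names, and recombination fraction all made up for illustration): with two linked loci in a dihybrid cross, you only get the textbook 9:3:3:1 phenotype ratio when the recombination fraction is 0.5; linkage shows up as an excess of parental types.

```python
import random
from collections import Counter

random.seed(0)

def gamete(r):
    """Gamete from an AB/ab (coupling-phase) dihybrid parent: with probability
    r a crossover yields a recombinant (Ab or aB), else a parental (AB or ab)."""
    if random.random() < r:
        return random.choice(["Ab", "aB"])
    return random.choice(["AB", "ab"])

def phenotype(g1, g2):
    # Uppercase allele is dominant at each locus.
    a = "A" if "A" in (g1[0], g2[0]) else "a"
    b = "B" if "B" in (g1[1], g2[1]) else "b"
    return a + b

def dihybrid_cross(r, n=160_000):
    counts = Counter(phenotype(gamete(r), gamete(r)) for _ in range(n))
    # Scale to parts-in-16 so the result is comparable to 9:3:3:1.
    return {k: round(counts[k] / n * 16, 2) for k in ["AB", "Ab", "aB", "ab"]}

print(dihybrid_cross(r=0.5))  # ~9:3:3:1 -- what Mendel reported
print(dihybrid_cross(r=0.2))  # excess of AB and ab parentals, deficit of recombinants
```

The deviation from 9:3:3:1 at r < 0.5 is exactly the signal the argument says Mendel’s tidied data erased.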
A precursor to Lucretius’s thoughts on natural selection is Empedocles, from whom we have far fewer surviving writings, but whose position clearly prefigures Lucretius’s. Lucretius himself cites & praises Empedocles on this subject.
Possibly Watanabe’s singular learning theory. The math is recent for math, but I think only ’70s-recent, which is a long gap given you’re impressed by a 20-year math gap for Einstein. The first book was published in 2010, and the second in 2019, so it’s possibly attributable to the deep learning revolution, but I don’t know of anyone else doing the same math. The exceptions are empirical stuff like the “neuron theory” of neural network learning, which I was told about by you; empirical results like those here; and high-dimensional probability (which I haven’t read, but whose cover alone indicates similar content).
Many who believe in God derive meaning, despite God theoretically being able to do anything they can do but better, from the fact that He chose not to do the tasks they are good at, leaving them tasks to try to accomplish. It’s common for such people to believe that this meaning would disappear if God disappeared, but whenever such a person does come to no longer believe in God, they often continue to see meaning in their life[1].
Now atheists worry about building God because it may destroy all the meaning in our actions. I expect we’ll adapt.
(edit: That is to say, I don’t think you’ve adequately described what “meaning of life” is if you’re worried about it going away in the situation you describe)
[1] If anything, they’re more right than wrong: there has been much written about the “meaning crisis” we’re in, possibly attributable to greater levels of atheism.
Post the chat logs?
Priors are not things you can arbitrarily choose and then throw your hands up and say “oh well, I guess I just have stuck priors, and that’s why I look at the data and conclude neoliberal-libertarian economics is mostly correct and socialist economics is mostly wrong.” To the extent you say this, you are not actually looking at any data; you are just making up an answer that sounds good. Then, when you encounter conflicting evidence, you’re stating you won’t change your mind because of a flaw in your reasoning (stuck priors), and that that’s ok because you have a flaw in your reasoning (stuck priors). It’s a circular argument!
If this is what you actually believe, you shouldn’t be making donations to either charter-cities projects or union-development projects[1]. Because what you actually believe is that the evidence you’ve seen is likely under both worldviews, and if you were “using” a non-gerrymandered prior, or reasoning without your bottom line already written, you’d have little reason to prefer one over the other (a worked example of this is sketched below).
Both of the alternatives you’ve presented are fools who in the back of their minds know they’re fools, but care more about having emotionally satisfying worldviews instead of correct worldviews. To their credit, they have successfully double-thought their way to reasonable donation choices which would otherwise have destroyed their worldview. But they could do much better by no longer being fools.
[1] Alternatively, if you justify your donation anyway in terms of its exploration value, you should be making donations to both.
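To spell out the arithmetic referenced above (all numbers made up): if the evidence is equally likely under both worldviews, Bayes’ rule hands you back your prior unchanged, so the “stuck prior” is doing all the work and the data none of it.

```python
def posterior(prior_h1, p_e_given_h1, p_e_given_h2):
    """P(H1 | E) via Bayes' rule, for two exhaustive hypotheses H1 and H2."""
    joint_h1 = prior_h1 * p_e_given_h1
    joint_h2 = (1 - prior_h1) * p_e_given_h2
    return joint_h1 / (joint_h1 + joint_h2)

# Evidence equally likely under both worldviews: the posterior IS the prior.
print(posterior(0.7, 0.4, 0.4))  # 0.7
# Evidence twice as likely under H1: now the data actually moves you.
print(posterior(0.7, 0.4, 0.2))  # ~0.82
```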
I wonder if everyone excited is just engaging by filling out the form rather than publicly commenting.
There is evidence that transformers are not, in fact, even implicitly or internally optimized for reducing global prediction error (except insofar as comp-mech says they must be in order to do well on the task they are optimized for).
Do transformers “think ahead” during inference at a given position? It is known transformers prepare information in the hidden states of the forward pass at t that is then used in future forward passes t+τ. We posit two explanations for this phenomenon: pre-caching, in which off-diagonal gradient terms present in training result in the model computing features at t irrelevant to the present inference task but useful for the future, and breadcrumbs, in which features most relevant to time step t are already the same as those that would most benefit inference at time t+τ. We test these hypotheses by training language models without propagating gradients to past timesteps, a scheme we formalize as myopic training. In a synthetic data setting, we find clear evidence for pre-caching. In the autoregressive language modeling setting, our experiments are more suggestive of the breadcrumbs hypothesis.
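To give intuition for the training scheme in that abstract: here is a minimal sketch of “no gradients to past timesteps” using a toy GRU, where the idea is easiest to see. This is my simplification for illustration, not the paper’s transformer implementation: detaching the carried state means the loss at step t+τ cannot shape the computation done at step t, so pre-caching cannot be learned, and whatever future-useful features remain must be “breadcrumbs”.

```python
import torch
import torch.nn as nn

class ToyRNN(nn.Module):
    def __init__(self, vocab=100, d=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.cell = nn.GRUCell(d, d)
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens, myopic=False):
        h = torch.zeros(tokens.shape[0], self.cell.hidden_size)
        logits = []
        for t in range(tokens.shape[1]):
            if myopic:
                # Myopic training: the loss at step t+tau cannot send gradient
                # back through h into the computation performed at step t, so
                # the model is never trained to "pre-cache" for the future.
                h = h.detach()
            h = self.cell(self.emb(tokens[:, t]), h)
            logits.append(self.head(h))
        return torch.stack(logits, dim=1)

model = ToyRNN()
tokens = torch.randint(0, 100, (4, 16))
logits = model(tokens, myopic=True)  # same forward pass, truncated credit assignment
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 100), tokens[:, 1:].reshape(-1)
)
loss.backward()
```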
Hi John,
Thank you for sharing the job postings. We’re starting something really exciting, and as research leads on the team, we (Paul Lessard and Bruno Gavranović) thought we’d provide some clarifications.
Symbolica was not started to improve ML using category theory. Instead, Symbolica was founded ~2 years ago, with its $2M seed funding round aimed at tackling the problem of symbolic reasoning; at the time, though, its path to getting there wasn’t via categorical deep learning (CDL). The original plan was to use hypergraph rewriting as a means of doing learning more efficiently. That approach, however, was eventually shown to be unviable.
Symbolica’s pivot to CDL started about five months ago. Bruno had just finished his Ph.D. thesis laying the foundations for the topic, and we reoriented much of the organization towards this research direction. In particular, we began: a) refining a roadmap to develop and apply CDL, and b) writing a position paper, in collaboration with researchers at Google DeepMind, which you’ve cited below.
Over these last few months, it has become clear that our hunches about applicability point to exciting and viable research directions. We’ve made fantastic progress, even doing some of the research we had planned to advocate for in the aforementioned position paper. Really, we discovered just how much Taking Categories Seriously gives you in the field of deep learning.
Many advances in DL are about creating models which identify robust and general patterns in data (see the Transformer/attention mechanism, for instance). In many ways this is exactly what CT is about: it is an indispensable tool many scientists, including ourselves, use to understand the world around us; to find robust patterns in data, but also to communicate, verify, and explain our reasoning.
At the same time, the research engineering team at Symbolica has made significant, independent, and concrete progress implementing a particular deep learning model that operates on text data, but not in the autoregressive manner of most GPT-style models.
These developments were key signals to Vinod and other investors, leading to the closing of the $31M funding round.
We are now developing a research programme merging the two, leveraging insights from theories of structure, e.g. categorical algebra, as a means of formalising the process by which we find structure in data. This has a twofold consequence: pushing models to identify patterns in data that are not only more robust, but also interpretable and verifiable.
In summary:
a) the push to apply category theory was not based on a singular whim, as the post might suggest;
b) instead, Symbolica is developing a serious research programme devoted to applying category theory to deep learning, not merely hiring category theorists.
All of this is to add extra context for evaluating the company, its team, and our direction, which does not come across in the recently published tech articles.
We strongly encourage interested parties to look at all of the job ads, which we’ve tailored to particular roles. Roughly, in the CDL team, we’re looking, at all levels of seniority, for either
1) expertise in category theory, and a strong interest in deep learning, or
2) expertise in deep learning, and a strong interest in category theory.
Happy to answer any other questions/thoughts.
Bruno Gavranović,
Paul Lessard
From The Guns of August
Old Field Marshal Moltke in 1890 foretold that the next war might last seven years—or thirty—because the resources of a modern state were so great it would not know itself to be beaten after a single military defeat and would not give up [...] It went against human nature, however—and the nature of General Staffs—to follow through the logic of his own prophecy. Amorphous and without limits, the concept of a long war could not be scientifically planned for as could the orthodox, predictable, and simple solution of decisive battle and a short war. The younger Moltke was already Chief of Staff when he made his prophecy, but neither he nor his Staff, nor the Staff of any other country, ever made any effort to plan for a long war. Besides the two Moltkes, one dead and the other infirm of purpose, some military strategists in other countries glimpsed the possibility of prolonged war, but all preferred to believe, along with the bankers and industrialists, that because of the dislocation of economic life a general European war could not last longer than three or four months. One constant among the elements of 1914—as of any era—was the disposition of everyone on all sides not to prepare for the harder alternative, not to act upon what they suspected to be true.
Archived website