I have a compute-market startup called vast.ai, and I’m working towards aligned AI. Currently seeking networking, collaborators, and hires—especially top-notch CUDA/GPU programmers.
My personal blog: https://entersingularity.wordpress.com/
I support this and will match the $250 prize.
Here are the central background ideas/claims:
1.) Computers are built out of components which are also just simpler computers, bottoming out, at the limits of miniaturization, in minimal molecular-sized (few-nm) computational elements (cellular automata/tiles). Further shrinkage is believed impossible in practice due to various constraints (overcoming these constraints, if even possible, would require very exotic far-future tech).
2.) At this scale the Landauer bound represents the ambient-temperature-dependent noise floor (which can also manifest as a noise voltage). Reliable computation at speed is only possible using non-trivial multiples of this base energy, for the simple reasons described by Landauer and elaborated on in the other refs in my article.
3.) Components can be classified as computing tiles or interconnect tiles, where the latter is simply a computer which computes the identity function while moving its input to an output in some spatial direction. Interconnect tiles can be irreversible or reversible, but the latter has enormous tradeoffs in size (i.e. optical) and/or speed or other variables, and is thus not used by brains or GPUs/CPUs.
4.) Fully reversible computers are possible in theory but have enormous negative tradeoffs in size/speed due to: (1) the need to avoid erasing bits throughout intermediate computations, (2) the lack of immediate error correction (achieved automatically in dissipative interconnect by erasing at each cycle), leading to error buildup which must be corrected/erased (costing energy), and (3) high sensitivity to noise/disturbance due to (2).
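For concreteness, the base energy in claim 2 can be computed directly. A minimal sketch; the ~1 eV "practical floor" is my own illustrative assumption for what a reliable-at-speed multiple might look like, not a figure from the claims above:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # ambient temperature, K

# Landauer bound: minimum energy dissipated to erase one bit at temperature T
E_min = k_B * T * math.log(2)
print(f"Landauer bound at 300 K: {E_min:.2e} J per bit erased")

# Reliable fast computation uses non-trivial multiples of this base energy;
# ~1 eV per bit-op is an often-quoted practical floor (illustrative assumption)
E_practical = 1.602e-19  # 1 eV in joules
print(f"A ~1 eV floor is ~{E_practical / E_min:.0f}x the Landauer bound")
```

The point of the ratio is just that the gap between the thermodynamic minimum and anything that switches reliably at GHz speeds is a couple of orders of magnitude, not a factor of two.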
And the brain vs computer claims:
5.) The brain is near the Pareto frontier for practical 10 W computers, and makes reasonably good tradeoffs between size, speed, heat, and energy as a computational platform for intelligence.
6.) Computers are approaching the same Pareto frontier (although currently in a different region of design space); shrinkage is nearing its end.
Remembering and imagination share the same pathways and are difficult to distinguish at the neuro-circuit level. The idea of recovered memories was already discredited decades ago after the peak of the satanic ritual abuse hysteria/panic of the ’80s. At its peak, some parents were jailed based on the testimonies of children, children who had been coerced (both deliberately and indirectly) into recounting fantastical, increasingly outlandish tales of satanic baby-eating rituals. The FBI eventually investigated and found zero evidence, but the turning point was when some lawyers and psychiatrists started winning lawsuits against the psychologists and social workers at the center of the recovered-memory movement.
Memories change every time they are rehearsed/reimagined; the magnitude of such change varies and can be significant, and the thin separation between imaginings (imagined memories, memories/stories of others, etc) and ‘factual’ memories doesn’t really erode so much as not really exist in the first place.
Nonetheless, some people’s detailed memories from childhood are probably largely accurate, but some detailed childhood memories are complete confabulations based on internalization of external evidence, some are later confabulations based on attempts to remember or recall and extensive dwelling on the past, and some are complete fiction. There is no way, with current tech, to distinguish between these, even for the rememberer.
I feel like even under the worldview that your beliefs imply, a superintelligence will just make a brain the size of a factory, and then be in a position to outcompete or destroy humanity quite easily.
I am genuinely curious and confused as to what exactly you concretely imagine this supposed ‘superintelligence’ to be, such that it is not already the size of a factory, such that you mention “size of a factory” as if that is something actually worth mentioning—at all. Please show at least your first-pass Fermi estimates for the compute requirements. By that I mean—what are the compute requirements for the initial SI, and then for the later, presumably more powerful ‘factory’?
Maybe it will do that using GPUs, or maybe it will do that using some more neuromorphic design, but I really don’t understand why energy density matters very much.
I would suggest reading more about advanced GPU/accelerator design, and then about datacenter design and the thermodynamic/cooling considerations therein.
The vast majority of energy that current humans produce is of course not spent on running human brains, and there are easily 10-30 OOMs of improvement lying around without going into density (just using the energy output of a single power plant under your model would produce something that would likely be easily capable of disempowering humanity).
This is so wildly ridiculous that you really need to show your work. I have already shown some calculations in these threads, but I’ll quickly review here.
A quick google search indicates 1GW is a typical power plant output, which in theory could power a roughly million-GPU datacenter. This is almost 100 times larger in power consumption than the current largest official supercomputer, Frontier, which has about 30k GPUs. The supercomputer used to train GPT-4 is somewhat of a secret, but is estimated to be about that size. So at 50x to 100x, you are talking about scaling up to something approaching a hypothetical GPT-5-scale cluster.
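A back-of-envelope version of that estimate; the ~1 kW per-GPU figure (including cooling and infrastructure overhead) is my rough assumption, as is taking 30k as Frontier's GPU count:

```python
# Rough fermi estimate: how many GPUs could a single power plant feed?
plant_power_w = 1e9     # typical power plant output, ~1 GW
gpu_power_w = 1_000.0   # ~1 kW per high-end GPU incl. cooling/infra overhead (assumption)

n_gpus = plant_power_w / gpu_power_w
frontier_gpus = 30_000  # Frontier's approximate GPU count (assumption from the text)

print(f"GPUs powered by 1 GW: ~{n_gpus:,.0f}")
print(f"That is ~{n_gpus / frontier_gpus:.0f}x Frontier's GPU count")
```

Tweak the per-GPU wattage and the headline number moves, but the conclusion is robust: a full power plant buys you on the order of a million GPUs, tens of times the largest existing cluster, not many orders of magnitude more.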
Nvidia currently produces fewer than 100k high-end enterprise GPUs per year in total, so you can’t even build this datacenter unless Nvidia grows by about 10x and TSMC grows by perhaps 2x.
The datacenter would likely cost over a hundred billion dollars, and the resulting models would be proportionally more expensive to run, such that it’s unclear whether this would be a win (at least using current tech). Sure, I do think there is some room for software improvement.
But no, I do not think that this hypothetical, not-currently-achievable GPT-5 (even if you were running 100k instances of it) would “likely be easily capable of disempowering humanity”.
Of course if we talk longer term, the brain is obviously evidence that one human-brain-power can be achieved in about 10 watts, so the 1GW power plant could support a population of 100 million uploads or neuromorphic AGIs. That’s very much part of my model (and Hanson’s, and Moravec’s), eventually.
Remember this post is all about critiquing EY’s specific doom model which involves fast foom on current hardware through recursive self-improvement.
Having more room at the bottom is just one of a long list of ways to end up with AIs much smarter than humans. Maybe you have rebuttals to all the other ways AIs could end up much smarter than humans
If you have read much of my writing, you should know that I believe it’s obvious we will end up with AIs much smarter than humans—but mainly because they will run faster using much more power. In fact this prediction has already come to pass in a limited sense: GPT-4 was probably trained on over 100 human lifetimes’ worth of virtual time/data using only about 3 months of physical time, which represents a ~10,000x time dilation (but thankfully only for training, not for inference).
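The time-dilation figure is simple arithmetic under rough inputs; the effective years per human lifetime and the 3-month training window are my assumed round numbers, not measured quantities:

```python
# Sanity check of the ~10,000x training-time dilation claim
lifetime_years = 25                    # rough effective years of experience per lifetime (assumption)
virtual_years = 100 * lifetime_years   # ~100 human lifetimes of virtual time/data
wall_clock_years = 0.25                # ~3 months of physical training time

dilation = virtual_years / wall_clock_years
print(f"Time dilation: ~{dilation:,.0f}x")
```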
Your section on the physical limits of hardware computation is naive; the dominant energy cost is now interconnect (moving bits), not logic ops. This is a complex topic and you could use more research and references from the relevant literature; there are good reasons why the semiconductor roadmap has ended, and the perception in industry is that Moore’s Law is finally approaching its end. For more info see this, with many references.
To connectivists such as myself, your point 0 has seemed obvious for a while, so the EY/MIRI/LW anti-neural-net groupthink was/is a strong sign of faulty beliefs. And saying “oh but EY/etc didn’t really think neural nets wouldn’t work, they just thought other paradigms would be safer” doesn’t really help much if no other paradigms ever had a chance. Underlying much of the rationalist groupthink on AI safety is a set of correlated, incorrect anti-connectivist beliefs that undermines many of the standard conclusions.
Given that a high-stakes, all-out arms race for frontier foundation AGI models is heating up between the major powers, and Meta’s public models are trailing, it doesn’t seem clear at all that open-sourcing them is net safety-negative. One could argue the benefits of having wide access for safety research, along with tilting the world towards multi-polar scenarios, outweigh the (more minimal) risks.
The merit of this post is to taboo nanotech. Practical bottom-up nanotech is simply synthetic biology, and practical top-down nanotech is simply modern chip lithography. So:
1.) can an AI use synthetic bio as a central ingredient of a plan to wipe out humanity?
Sure.
2.) can an AI use synthetic bio or chip litho as a central ingredient of a plan to operate perpetually in a world without humans?
Sure.
But doesn’t sound as exciting? Good.
ANNs and BNNs operate on the same core principles; the scaling laws apply to both, and IQ in either is mostly a function of net effective training compute and data quality. Genes determine a brain’s architectural prior just as a small amount of Python code determines an ANN’s architectural prior, but the capabilities come only from scaling with compute and data (quantity and quality).
So you absolutely cannot take datasets of gene-IQ correlations and assume those correlations would somehow transfer to gene interventions on adults (post-training, in DL lingo). The genetic contribution to IQ is almost all developmental/training factors (architectural prior, learning-algorithm hyperparams, value/attention function tweaks, etc.) which snowball during training. Unfortunately, developmental windows close and learning rates slow down as the brain literally carves/prunes out its structure, so to the extent this could work at all, it is mostly limited to interventions on children and younger adults who still have significant learning-rate reserves.
But it ultimately doesn’t matter, because the brain just learns too slowly. We are now soon past the point at which human learning matters much.
Back when the sequences were written in 2007/2008 you could roughly partition the field of AI based on beliefs around the efficiency and tractability of the brain. Everyone in AI looked at the brain as the obvious single example of intelligence, but in very different lights.
If brain algorithms are inefficient and intractable[1] then neuroscience has little to offer, and instead more formal math/CS approaches are preferred. One could call this the rationalist approach to AI, or perhaps the “and everything else approach”. One way to end up in that attractor is by reading a bunch of ev psych; EY in 2007 was clearly heavily into Tooby and Cosmides, even if he has some quibbles with them on the source of cognitive biases.
From Evolutionary Psychology and the Emotions:
From the Psychological Foundations of Culture:
EY quotes this in LOGI, 2007 (p 4), immediately followed with:
Meanwhile, in the field of neuroscience there was a growing body of evidence and momentum coalescing around exactly the “physics envy” approaches EY bemoans: the universal learning hypothesis, popularized to a wider audience in On Intelligence in 2004. It is pretty much pure tabula rasa, blank-slate genericity and black-box.
The UL hypothesis is that the brain’s vast complexity is actually emergent, best explained by simple universal learning algorithms that automatically evolve all the complex domain specific circuits as required by the simple learning objectives and implied by the training data. (Years later I presented it on LW in 2015, and I finally got around to writing up the brain efficiency issue more recently—although I literally started the earlier version of that article back in 2012.)
But then the world did this fun experiment: the rationalist/non-connectivist AI folks got most of the attention and research money, but not all of it—and then various research groups did their thing and tried to best each other on various benchmarks. Eventually Nvidia released CUDA, a few connectivists ported ANN code to their gaming GPUs and started to break ImageNet, and then a little startup founded with the mission of reverse engineering the brain by some folks who met in a neuroscience program adapted that code to play Atari and later crack Go; the rest is history, as you probably know.
Turns out the connectivists and the UL hypothesis were pretty much completely right after all—proven not only by the success of DL in AI, but also by how DL is transforming neuroscience. We now know that the human brain learns complex tasks like vision and language not through kludgy, complex evolved mechanisms, but through the exact same simple approximate Bayesian (self-supervised) learning algorithms that drive modern DL systems.
The sequences and associated materials were designed to “raise the rationality water line” and ultimately funnel promising new minds into AI safety. And there they succeeded, especially in those earlier years. Finding an AI safety researcher today who isn’t familiar with the sequences and LW... well, maybe they exist? But they would be unicorns. ML-safety and even brain-safety approaches are now obviously more popular, but there is still an enormous bias/inertia in AI safety stemming from the circa-2007 beliefs and knowledge crystallized and distilled into the sequences.
It’s also possible to end up in the “brains are highly efficient, but completely intractable” camp, which implies uploading as the most likely path to AI—this is where Hanson is, and close to where my beliefs were circa 2000, before I had studied much systems neuroscience.