I have a compute-market startup called vast.ai, and I’m working towards aligned AI. Currently seeking networking, collaborators, and hires—especially top-notch CUDA/GPU programmers.
My personal blog: https://entersingularity.wordpress.com/
I support this and will match the $250 prize.
Here are the central background ideas/claims:
1.) Computers are built out of components which are also just simpler computers, bottoming out, at the limits of miniaturization, in minimal molecular-sized (few-nm) computational elements (cellular automata/tiles). Further shrinkage is believed impossible in practice due to various constraints (overcoming these constraints, if even possible, would require very exotic far-future tech).
2.) At this scale the Landauer bound represents the ambient-temperature-dependent noise floor (which can also manifest as a noise voltage). Reliable computation at speed is only possible using non-trivial multiples of this base energy, for the simple reasons described by Landauer and elaborated on in the other refs in my article.
3.) Components can be classified as computing tiles or interconnect tiles, where the latter is simply a computer which computes the identity function while moving its input to an output in some spatial direction. Interconnect tiles can be irreversible or reversible, but the latter has enormous tradeoffs in size (i.e. optical) and/or speed or other variables, and is thus not used by brains or GPUs/CPUs.
4.) Fully reversible computers are possible in theory but have enormous negative tradeoffs in size/speed due to: (1) the need to avoid erasing bits throughout intermediate computations, (2) the lack of immediate error correction (achieved automatically in dissipative interconnect by erasing at each cycle), leading to error buildup which must be corrected/erased (costing energy), and (3) high sensitivity to noise/disturbance due to (2).
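For concreteness, the base energy in claim 2 can be computed directly. A minimal sketch; the ~1 eV "practical floor" is my own illustrative assumption for what a reliable-at-speed multiple might look like, not a figure from the claims above:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # ambient temperature, K

# Landauer bound: minimum energy dissipated to erase one bit at temperature T
E_min = k_B * T * math.log(2)
print(f"Landauer bound at 300 K: {E_min:.2e} J per bit erased")

# Reliable fast computation uses non-trivial multiples of this base energy;
# ~1 eV per bit-op is an often-quoted practical floor (illustrative assumption)
E_practical = 1.602e-19  # 1 eV in joules
print(f"A ~1 eV floor is ~{E_practical / E_min:.0f}x the Landauer bound")
```

The point of the ratio is just that the gap between the thermodynamic minimum and anything that switches reliably at GHz speeds is a couple of orders of magnitude, not a factor of two.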
And the brain vs computer claims:
5.) The brain is near the Pareto frontier for practical 10 W computers, and makes reasonably good tradeoffs between size, speed, heat, and energy as a computational platform for intelligence.
6.) Computers are approaching the same Pareto frontier (although currently in a different region of design space); shrinkage is nearing its end.
Remembering and imagination share the same pathways and are difficult to distinguish at the neuro-circuit level. The idea of recovered memories was already discredited decades ago after the peak of the satanic ritual abuse hysteria/panic of the ’80s. At its peak, some parents were jailed based on the testimonies of children, children who had been coerced (both deliberately and indirectly) into recounting fantastical, increasingly outlandish tales of satanic baby-eating rituals. The FBI eventually investigated and found zero evidence, but the turning point was when some lawyers and psychiatrists started winning lawsuits against the psychologists and social workers at the center of the recovered-memory movement.
Memories change every time they are rehearsed/reimagined; the magnitude of such change varies and can be significant, and the thin separation between imaginings (imagined memories, memories/stories of others, etc) and ‘factual’ memories doesn’t really erode so much as not really exist in the first place.
Nonetheless, some people’s detailed memories from childhood are probably largely accurate, but some detailed childhood memories are complete confabulations based on internalization of external evidence, some are later confabulations based on attempts to remember or recall and extensive dwelling on the past, and some are complete fiction. There is no way, with current tech, to distinguish between these, even for the rememberer.
I feel like even under the worldview that your beliefs imply, a superintelligence will just make a brain the size of a factory, and then be in a position to outcompete or destroy humanity quite easily.
I am genuinely curious and confused as to what exactly you concretely imagine this supposed ‘superintelligence’ to be, such that it is not already the size of a factory, such that you mention “size of a factory” as if that is something actually worth mentioning—at all. Please show at least your first-pass Fermi estimates for the compute requirements. By that I mean—what are the compute requirements for the initial SI, and then for the later, presumably more powerful ‘factory’?
Maybe it will do that using GPUs, or maybe it will do that using some more neuromorphic design, but I really don’t understand why energy density matters very much.
I would suggest reading more about advanced GPU/accelerator design, and then about datacenter design and the thermodynamic/cooling considerations therein.
The vast majority of energy that current humans produce is of course not spent on running human brains, and there are easily 10-30 OOMs of improvement lying around without going into density (just using the energy output of a single power plant under your model would produce something that would likely be easily capable of disempowering humanity).
This is so wildly ridiculous that you really need to show your work. I have already shown some calculations in these threads, but I’ll quickly review here.
A quick google search indicates 1GW is a typical power plant output, which in theory could power a roughly million-GPU datacenter. This is almost 100 times larger in power consumption than the current largest official supercomputer, Frontier, which has about 30k GPUs. The supercomputer used to train GPT-4 is somewhat of a secret, but is estimated to be about that size. So at 50x to 100x, you are talking about scaling up to something approaching a hypothetical GPT-5-scale cluster.
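A back-of-envelope version of that estimate; the ~1 kW per-GPU figure (including cooling and infrastructure overhead) is my rough assumption, as is taking 30k as Frontier's GPU count:

```python
# Rough fermi estimate: how many GPUs could a single power plant feed?
plant_power_w = 1e9     # typical power plant output, ~1 GW
gpu_power_w = 1_000.0   # ~1 kW per high-end GPU incl. cooling/infra overhead (assumption)

n_gpus = plant_power_w / gpu_power_w
frontier_gpus = 30_000  # Frontier's approximate GPU count (assumption from the text)

print(f"GPUs powered by 1 GW: ~{n_gpus:,.0f}")
print(f"That is ~{n_gpus / frontier_gpus:.0f}x Frontier's GPU count")
```

Tweak the per-GPU wattage and the headline number moves, but the conclusion is robust: a full power plant buys you on the order of a million GPUs, tens of times the largest existing cluster, not many orders of magnitude more.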
Nvidia currently produces fewer than 100k high-end enterprise GPUs per year in total, so you can’t even build this datacenter unless Nvidia grows by about 10x and TSMC grows by perhaps 2x.
The datacenter would likely cost over a hundred billion dollars, and the resulting models would be proportionally more expensive to run, such that it’s unclear whether this would be a win (at least using current tech). Sure, I do think there is some room for software improvement.
But no, I do not think that this hypothetical, not-currently-achievable GPT-5 (even if you were running 100k instances of it) would “likely be easily capable of disempowering humanity”.
Of course if we talk longer term, the brain is obviously evidence that one human-brain-power can be achieved in about 10 watts, so the 1GW power plant could support a population of 100 million uploads or neuromorphic AGIs. That’s very much part of my model (and Hanson’s, and Moravec’s), eventually.
Remember this post is all about critiquing EY’s specific doom model which involves fast foom on current hardware through recursive self-improvement.
Having more room at the bottom is just one of a long list of ways to end up with AIs much smarter than humans. Maybe you have rebuttals to all the other ways AIs could end up much smarter than humans
If you have read much of my writing, you should know that I believe it’s obvious we will end up with AIs much smarter than humans—but mainly because they will run faster using much more power. In fact this prediction has already come to pass in a limited sense: GPT-4 was probably trained on over 100 human lifetimes’ worth of virtual time/data using only about 3 months of physical time, which represents a ~10,000x time dilation (but thankfully only for training, not for inference).
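The time-dilation figure is simple arithmetic under rough inputs; the effective years per human lifetime and the 3-month training window are my assumed round numbers, not measured quantities:

```python
# Sanity check of the ~10,000x training-time dilation claim
lifetime_years = 25                    # rough effective years of experience per lifetime (assumption)
virtual_years = 100 * lifetime_years   # ~100 human lifetimes of virtual time/data
wall_clock_years = 0.25                # ~3 months of physical training time

dilation = virtual_years / wall_clock_years
print(f"Time dilation: ~{dilation:,.0f}x")
```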
Your section on the physical limits of hardware computation is naive; the dominant energy cost is now interconnect (moving bits), not logic ops. This is a complex topic and you could use more research and references from the relevant literature; there are good reasons why the semiconductor roadmap has ended, and the perception in industry is that Moore’s Law is finally approaching its end. For more info see this, with many references.
To connectivists such as myself, your point 0 has seemed obvious for a while, so the EY/MIRI/LW anti-neural-net groupthink was/is a strong sign of faulty beliefs. And saying “oh but EY/etc didn’t really think neural nets wouldn’t work, they just thought other paradigms would be safer” doesn’t really help much if no other paradigms ever had a chance. Underlying much of the rationalist groupthink on AI safety is a set of correlated, incorrect anti-connectivist beliefs that undermines many of the standard conclusions.
Given that a high-stakes, all-out arms race for frontier foundation AGI models is heating up between the major powers, and Meta’s public models are trailing, it doesn’t seem clear at all that open-sourcing them is net safety-negative. One could argue the benefits of having wide access for safety research, along with tilting the world towards multi-polar scenarios, outweigh the (more minimal) risks.
The merit of this post is to taboo nanotech. Practical bottom-up nanotech is simply synthetic biology, and practical top-down nanotech is simply modern chip lithography. So:
1.) can an AI use synthetic bio as a central ingredient of a plan to wipe out humanity?
Sure.
2.) can an AI use synthetic bio or chip litho as a central ingredient of a plan to operate perpetually in a world without humans?
Sure.
But doesn’t sound as exciting? Good.
ANNs and BNNs operate on the same core principles; the scaling laws apply to both, and IQ in either is mostly a function of net effective training compute and data quality. Genes determine a brain’s architectural prior just as a small amount of Python code determines an ANN’s architectural prior, but the capabilities come only from scaling with compute and data (quantity and quality).
So you absolutely cannot take datasets of gene-IQ correlations and assume those correlations would somehow transfer to gene interventions on adults (post-training, in DL lingo). The genetic contribution to IQ is almost all developmental/training factors (architectural prior, learning-algorithm hyperparams, value/attention function tweaks, etc.) which snowball during training. Unfortunately, developmental windows close and learning rates slow down as the brain literally carves/prunes out its structure, so to the extent this could work at all, it is mostly limited to interventions on children and younger adults who still have significant learning-rate reserves.
But it ultimately doesn’t matter, because the brain just learns too slowly. We are now soon past the point at which human learning matters much.
Back when the sequences were written in 2007/2008 you could roughly partition the field of AI based on beliefs around the efficiency and tractability of the brain. Everyone in AI looked at the brain as the obvious single example of intelligence, but in very different lights.
If brain algorithms are inefficient and intractable[1] then neuroscience has little to offer, and instead more formal math/CS approaches are preferred. One could call this the rationalist approach to AI, or perhaps the “and everything else approach”. One way to end up in that attractor is by reading a bunch of ev psych; EY in 2007 was clearly heavily into Tooby and Cosmides, even if he has some quibbles with them on the source of cognitive biases.
From Evolutionary Psychology and the Emotions:
From the Psychological Foundations of Culture:
EY quotes this in LOGI, 2007 (p 4), immediately followed with:
Meanwhile, in the field of neuroscience there was a growing body of evidence and momentum coalescing around exactly the “physics envy” approaches EY bemoans: the universal learning hypothesis, popularized to a wider audience in On Intelligence in 2004. It is pretty much pure tabula rasa, blank-slate genericity and black-box.
The UL hypothesis is that the brain’s vast complexity is actually emergent, best explained by simple universal learning algorithms that automatically evolve all the complex domain specific circuits as required by the simple learning objectives and implied by the training data. (Years later I presented it on LW in 2015, and I finally got around to writing up the brain efficiency issue more recently—although I literally started the earlier version of that article back in 2012.)
But then the world did this fun experiment: the rationalist/non-connectivist AI folks got most of the attention and research money, but not all of it—and then various research groups did their thing and tried to best each other on various benchmarks. Eventually Nvidia released CUDA, a few connectivists ported ANN code to their gaming GPUs and started to break ImageNet, and then a little startup founded with the mission of reverse engineering the brain by some folks who met in a neuroscience program adapted that code to play Atari and later crack Go; the rest is history, as you probably know.
Turns out the connectivists and the UL hypothesis were pretty much completely right after all—proven not only by the success of DL in AI, but also by how DL is transforming neuroscience. We now know that the human brain learns complex tasks like vision and language not through kludgy, complex evolved mechanisms, but through the exact same simple approximate Bayesian (self-supervised) learning algorithms that drive modern DL systems.
The sequences and associated materials were designed to “raise the rationality water line” and ultimately funnel promising new minds into AI safety. And there they succeeded, especially in those earlier years. Finding an AI safety researcher today who isn’t familiar with the sequences and LW... well, maybe they exist? But they would be unicorns. ML-safety and even brain-safety approaches are now obviously more popular, but there is still an enormous bias/inertia in AI safety stemming from the circa-2007 beliefs and knowledge crystallized and distilled into the sequences.
It’s also possible to end up in the “brains are highly efficient, but completely intractable” camp, which implies uploading as the most likely path to AI—this is where Hanson is, and close to where my beliefs were circa 2000, before I had studied much systems neuroscience.