I Think We’re Approaching The Bitter Lesson’s Asymptote

I think we’re close to the point where improvement in ai on general scenarios gets more and more difficult to achieve, and from here on out each gain in capability will cost exponentially more. I think the concrete evidence I’m using for the claim matters less than the test, but I’ll cover my beliefs anyway.

In many respects, this is a religious belief I hold, one I’ve got in my gut. You might even say I hold this belief as a negative response to the calls of ai doom. I admit my motivated reasoning here, and that it is going to be extremely difficult to dislodge.

Dislodging this belief will take a long time, and evidence that the Bitter Lesson isn’t as dead as the transistor-density aspect of Moore’s Law needs to be interesting and fun to read, on top of being top-notch.


As an experiment, I’m going to swear a lot in the following bits, and the main thing I want feedback on is, was it fun to read?


Again, please note, I’m deliberately using swearing and derisive tone and violence, 4chan-style. No slurs. There are threats to imaginary animals.


If swearing and derision make you uncomfortable, please skip out.


I’m warning you.


Why the FUCK are we so enamored with fucking ChatGPT? What the fuck does it have that any other fucking model doesn’t? Holy fucking shit, we just keep making “ChatGPT does shiny new thing” headlines, my fucking GOD, there’s literally nothing new there that hasn’t existed for at least three fucking years.

Goddamn people making Yann LeCun seem reasonable with this horseshit. If I read yet-another-goddamn fucking “ChatGPT is cool!” or “GPT-3 is sooo awesooomee” paper on arxiv I’m going to drown a fucking cat. I don’t know whose cat. I don’t even have a cat! Goddamn, cats don’t deserve this.

Don’t be the reason a cat gets drowned.

No, seriously, what-the-actual fuck is going on? If OpenAI actually had anything actually-fucking-impressive, don’t you think they’d call it fucking GPT-4? Of course they fucking would have made a fucking GPT-4, holy fucking shit. But no, they didn’t.

What the actual fuck are they waiting for?

They just hired a bunch of fucking chumps to label some data and act in a nice way, plus a bunch of other inane and totally-ordinary means of data curation. This fucking shit means literally nothing for ai progress, and actually implies that making the model bigger has stopped fucking working.

No seriously, think for a fucking second.

Would OpenAI not have done GPT-4 if they could have? The company already fucking loves their headlines. What company wouldn’t fucking want the headline that ChatGPT is the NEXT GENERATION AI, “you’re playing with some goddamn new fucking ai”? But no.

Where the fuck is it?

Same with literally any other ai group. Where the fuck is our scaling?


Dead! Dead I say.

“But I’m not dead yet!”

Shut the fuck up, you will be shortly.


(Inb4 OpenAI releases GPT-4 within the next fucking month, lol)

(I don’t think it’ll pass my fucking rigged test though)


Data Size, FLOPs, GPU RAM, Data Quality, the quadro-fucking-fecta of the fucking Bitter Lesson. Yeah yeah.

I made a Model with 500 Billion Parameters and Not Even A T-Shirt to Show For It. GPT-3.5 gets beaten by 6B models finetuned on a task.

Memory is still pretty fucking garbage.


“Is it real ai yet!?!”

No, goddamn, and these goddamn rules and this test are fucking rigged against you.

“Oh, so you’re just gonna play Gary Marcus, Part Two”

No, I’m not a fucking pundit.

So, tired of being nickel-and-dimed by “is it real ai yets”, and not wanting to become yet another ai-pundit, I’m rigging this fucking game so I can claim victory regardless of what fucking happens.

Concretely:

WHERE THE FUCK IS THE FUCKING MAP FOR FUCKING OPEN-ENDEDNESS?

MY AIs ALL FUCKING BREAK DOWN WHEN A SINGLE OUNCE OF OUT-OF-DISTRIBUTION SHIT THREATENS TO BREATHE ON THEM.


Here’s A Goddamn TEST, And It’s Fucking Rigged If You’re Trying To Make Money

The Rules:

  1. No nickel-and-diming- especially on marginal improvements. Nickel-and-diming is evidence in my-fucking-favor.

  2. No marginal hax. Marginal hax- database lookups, search engines, elasticsearch, in-memory caches- this garbage-ass horseshit is all points in my-fucking-favor.

  3. Third-party-verifiability: No fucking cheating. The model can’t be fucking hidden behind a company’s fucking HTTPS api; the code gets audited by the third party, and the number of parameters and the entire fucking layout of the ai that’s available and being sold to the public need to be publicly known.

    1. Bonus points if the entire stack for the test runs on a third party’s hardware.

  4. Verifiability outside of headline-grabbing- goddamn this test isn’t for your fucking PR department. If the company or person makes the ai for a single demonstration, it’s verified once, the third party never gets the ability to mess with the ai, and then the ai disappears forever into darkness, that’s a point in my favor.

See what I mean? I’m rigging this whole FUCKING thing!


Cool, that was fun. But I started slipping. I can’t swear a lot for long, goddamn.

Numbers 3 and 4 are the most important ones. If a third party verifies that the neural network is an entire, self-contained ai, and not a clever, manually-coded lookup table or something tied into elasticsearch, then I’ll concede defeat.

If it’s behind a private paywall and no third parties can independently verify that the ai exists, then that leaves room for pulling something like what Waymo does- where the cars require a high-throughput, low-latency internet connection 24/​7 and there’s (extreme probability, like 90% confident) an office of people hired to monitor and drive the cars if there’s even a chance of rain or fog.

If the comparison is self-driving cars: if the car requires an internet connection in order to make it from Point A to Point B, those gains are marginal hax. I should be able to rip out any wifi or cellular connections in my car and know for-fucking-sure that the car will still be able to get me from one end of Canada to the other without any cell signal. In the middle of winter.

If it’s a single test over a weekend, the ability to just defect in a sanitized, preprogrammed scenario is high. But we can’t verify that for Waymo. They own the vehicles, and, again, given the incentives to defect, I wouldn’t consider a single demonstration to be enough.

Again, this test is fuckin’ rigged.


Okay, now you’re probably thinking:

What would make me 100% admit I’m wrong?

If, in 6 years, I can buy/​rent N GPUs/​clusters, download the weights from huggingface to my desktop, run the model, write down instructions on a piece of paper for the model, then keep the model running and have the various instructions performed over time.

Example instructions that I might write down might be:

  1. “In two days, generate an image of banana sushi”

  2. “In an hour, compose a death metal bluegrass remix of We’re Not Flawless”

If the ai can do this, I’m wrong- and to be clear, for “in two days” the task just needs to happen any time on that target day; similarly, for “in an hour”, any time within that hour. The logic which checks the time and then decides whether to do the thing needs to happen in the weights.

No hax.

If this is a simple, clever system/​explicit behavior tree that uses regexp and cleverly swaps between Riffusion and Stable Diffusion, it doesn’t sink my theory- it’s an example of it.
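To make concrete what that kind of disqualifying behavior tree looks like, here’s a minimal sketch- every function and backend name in it is hypothetical, and building this would not pass the test; per the paragraph above, it would exemplify the theory, because all the temporal and dispatch logic lives in ordinary code rather than in the weights:

```python
import re

# Hypothetical "marginal hax" scheduler. The timing check and the swap
# between two specialist models happen in plain code, not in the weights --
# exactly what the test forbids.

UNIT_SECONDS = {"hour": 3600, "hours": 3600, "day": 86400, "days": 86400}
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}

def parse_instruction(text):
    """Extract (delay_seconds, task) from text like
    'In two days, generate an image of banana sushi'."""
    m = re.match(r"[Ii]n (\w+) (hours?|days?),\s*(.+)", text)
    if m is None:
        raise ValueError(f"unparseable instruction: {text!r}")
    count = NUMBER_WORDS.get(m.group(1))
    if count is None:
        count = int(m.group(1))  # fall back to digits, e.g. "In 3 hours, ..."
    return count * UNIT_SECONDS[m.group(2)], m.group(3)

def dispatch(task):
    """Keyword-swap between two hypothetical backends -- more hax."""
    if "image" in task:
        return ("stable-diffusion-backend", task)
    return ("riffusion-backend", task)
```

A scheduler would then sleep for the parsed delay and call `dispatch`- trivially easy to build, and precisely the cleverness that counts as evidence in the author’s favor rather than against it.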

Regardless of all of the above, the test resolves against my favor if, in 6 years:

  1. >=40% of currently-employed humans are out of a job due to automation

  2. the entire market of working humans in the US takes a paycut of >=20%, and this is traceable back to the inclusion of ai.

  3. It also resolves against my favor if we have our first paperclipper-style close-call within 4 years.

If 6 years rolls around, and no one’s really sure or we can’t come to an agreement, it resolves in my favor.


Thank you for participating in this goddamn-fucking-experiment.

Let me know what you think would improve it.