How much time do we have with transformers?
As I understand the current discourse about scaling laws, we have trained the models on the entire (textual) internet and still haven’t found superintelligence. Labs have started to use video and audio data, but these sources are less meaning-dense (encoding a picture of a room takes as much data as two books’ worth of characters), so we could be at the edge of scaling unless a new, more data-efficient model arises.
Let’s assume humans can generate 40 bit/sec of meaningful data. Estimates of information density in natural spoken languages converge around this number cross-culturally, so it could be a natural processing limit of the brain. A quarter to a third of the day is spent sleeping, and not every interaction is captured (yet; dystopian surveillance will increase that), so for now let’s take half of it for an average day. 20 bit/sec * 86400 sec/day ≈ 210 KB/day/human. A population of 8G means 1640 TB/day. A post I found with DuckDuckGo says 402M TB/day is created on the net, which would be a 0.0004% meaning-to-data efficiency, and given that a large amount of it is video, that is believable. GPT-2 was released in 2019, so let’s take roughly 5 years: this would be 2.8 EB of meaning, and given that a token is 16 bits, this would be 1.4E tokens in the past five years.
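The chain of multiplications above can be written down as a small script. This is only a sketch of the same estimate; the constant and function names are made up for illustration, and rounding plus decimal-versus-binary prefixes shift the results by a few percent relative to the figures in the text.

```python
# Back-of-the-napkin estimate of "meaningful" data produced by humanity.
# All inputs are the assumptions stated in the text, not measured values.
BITS_PER_SECOND = 40                    # assumed information rate of speech
CAPTURED_FRACTION = 0.5                 # half is lost to sleep / uncaptured talk
SECONDS_PER_DAY = 86_400
POPULATION = 8e9                        # 8G people
INTERNET_BYTES_PER_DAY = 402e6 * 1e12   # 402M TB/day, the figure quoted in the text
YEARS = 5
BITS_PER_TOKEN = 16


def meaning_estimate():
    """Return the intermediate quantities of the estimate as a dict."""
    bytes_per_person_day = BITS_PER_SECOND * CAPTURED_FRACTION * SECONDS_PER_DAY / 8
    bytes_per_day = bytes_per_person_day * POPULATION
    # Share of all data created per day that would count as "meaning".
    efficiency = bytes_per_day / INTERNET_BYTES_PER_DAY
    total_bytes = bytes_per_day * 365 * YEARS
    tokens = total_bytes * 8 / BITS_PER_TOKEN
    return {
        "bytes_per_person_day": bytes_per_person_day,
        "bytes_per_day": bytes_per_day,
        "efficiency": efficiency,
        "total_bytes": total_bytes,
        "tokens": tokens,
    }


if __name__ == "__main__":
    for name, value in meaning_estimate().items():
        print(f"{name}: {value:.3g}")
```

Changing any single assumption (say, the captured fraction) only rescales the final token count linearly, so the order of magnitude is fairly robust.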
Plugging this into the Chinchilla scaling law gives a contribution of 0.00048 to the loss from the data term, which is insignificant compared to the 1.87 loss floor of the ideal generative model. D/N=20 means 70P parameters, about 1000 times larger than current models. Assuming a 3.7x increase per year, the end of transformers would be reached in 5-6 years. But this is a back-of-the-napkin calculation; I think it’s more like an upper bound.
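A minimal sketch of this step, assuming the usual parametric form of the Chinchilla loss, L(N, D) = E + A / N^alpha + B / D^beta. The constants B and beta below are an assumption: they are recalled from a published refit of the law and were picked because they land near the 0.00048 figure, but the text does not say which fit was actually used. The function names are illustrative.

```python
import math

# ASSUMPTION: data-term constants from a published refit of the Chinchilla
# law; the text does not state which fit it relied on.
B = 2085.43
BETA = 0.3658

TOKENS_PER_PARAM = 20    # D/N = 20, as in the text
GROWTH_PER_YEAR = 3.7    # assumed yearly growth in model size, from the text
SCALE_GAP = 1000         # "about 1000 times larger than current models"


def data_term(tokens):
    """Loss contribution B / D**beta left over after training on `tokens`."""
    return B / tokens ** BETA


def optimal_params(tokens):
    """Parameter count implied by a fixed tokens-per-parameter ratio."""
    return tokens / TOKENS_PER_PARAM


def years_to_close(gap, growth_per_year):
    """Years needed to grow by a factor `gap` at a constant yearly growth."""
    return math.log(gap) / math.log(growth_per_year)


if __name__ == "__main__":
    tokens = 1.4e18  # 1.4E tokens, the estimate from the previous paragraph
    print("data term:", data_term(tokens))
    print("parameters:", optimal_params(tokens))
    print("years:", years_to_close(SCALE_GAP, GROWTH_PER_YEAR))
```

The years figure depends only on the assumed gap and growth rate, so it moves slowly: a gap ten times larger adds less than two years at 3.7x per year.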
Ratio of profits
Marx’s theory of capital focused on the ratio of profits, not on their mass. Attacking it by pointing out how many goods we have under capitalism does not disprove the theory, because that is simply a statement about the mass of goods.
Marx said that under capitalism companies maximise the profit rate (not the profit!). Suppose there are two strategies:
Strategy A gives a worker 200 dollars, and they produce 500 worth of goods.
Strategy B gives a worker 500 dollars, and they produce 1000 worth of goods.
Strategy A will win, because it nets 1.5 dollars for each dollar invested, while strategy B nets only 1 dollar for each dollar. The mass of profit is higher in case B (500 versus 300 dollars), but the focus is on return on investment.
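The comparison of the two strategies can be spelled out in a few lines. This is just the arithmetic of the example, with wages as the only investment; the function names are illustrative.

```python
def profit_mass(wage, output):
    """Absolute profit: value produced minus what was paid to the worker."""
    return output - wage


def profit_rate(wage, output):
    """Profit per dollar invested (here the wage is the whole investment)."""
    return profit_mass(wage, output) / wage


# (wage paid, value of goods produced) for the two strategies in the text
strategies = {"A": (200, 500), "B": (500, 1000)}

# Pick the winner by rate and by mass to show that the two criteria disagree.
best_by_rate = max(strategies, key=lambda s: profit_rate(*strategies[s]))
best_by_mass = max(strategies, key=lambda s: profit_mass(*strategies[s]))

if __name__ == "__main__":
    for name, (wage, output) in strategies.items():
        print(name, profit_mass(wage, output), profit_rate(wage, output))
    print("best by rate:", best_by_rate, "| best by mass:", best_by_mass)
```

Ranking by rate picks A while ranking by mass picks B, which is exactly the distinction the argument rests on.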
Usually, automation and technological improvements raise the profit rate and revenue at the same time, and the life of the average worker improves.
Yet, independent of the total mass of revenue, the question of how it is distributed remains. If the ratio of wages increases, the ratio of profit decreases; therefore the owners should want to lower the wage ratio, while workers should want to increase it. And this is the key moment in the class struggle.