Occasionally think about topics discussed here. Will post if I have any thoughts worth sharing.
Tomás B.
This seems to be as good a place as any to post my unjustified predictions on this topic, the second of which I have a bet outstanding on at even odds.
1. Devin will turn out to be just a bunch of GPT-3.5/4 calls and a pile of prompts/heuristics/scaffolding so disgusting and unprincipled only a team of geniuses could have created it.
2. Someone will create an agent that gets 80%+ on SWE-Bench within six months.
I am not sure whether prediction 1 being true or false is good news. Either way, we should update towards large jumps in coding ability very soon.
Regarding RSI, my intuition has always been that automating AI research will likely be easier than automating the development and maintenance of a large app like, say, Photoshop, so I don't expect fire alarms like "non-gimmicky top 10 app on the App Store was developed entirely autonomously" before doom.
After spending several hours trying to get Gemini, GPT-4 and Claude 3 to make original jokes—I now think I may be wrong about this. Still could be RLHF, but it does seem like an intelligence issue. @janus are the base models capable of making original jokes?
Looks to me like he's training on the test set, tbh. His ambition to get an IQ of 195 is admirable though.
I very much doubt this will work. I am also annoyed you don’t share your methods. If you can provide me with a procedure that raises my IQ by 20 points in a manner that convinces me this is a real increase in g, I will give you one hundred thousand dollars.
@Veedrac suppose this pans out and custom hardware is made for such networks. How much faster/larger/cheaper will this be?
This is applied to training. It’s not a quantization method.
Hsu on China’s huge human capital advantage:
Returning to Summers’ calculation, and boldly extrapolating the normal distribution to the far tail (not necessarily reliable, but let’s follow Larry a bit further), the fraction of NE Asians at +4SD (relative to the OECD avg) is about 1 in 4k, whereas the fraction of Europeans at +4SD is 1 in 33k. So the relative representation is about 8 to 1. (This assumed the same SD=90 for both populations. The Finnish numbers might be similar, although it depends crucially on whether you use the smaller SD=80.) Are these results plausible? Have a look at the pictures here of the last dozen or so US Mathematical Olympiad teams (the US Asian population percentage is about 3 percent; the most recent team seems to be about half Asians). The IMO results from 2007 are here. Of the top 15 countries, half are East Asian (including tiny Hong Kong, which outperformed Germany, India and the UK).
Incidentally, again assuming a normal distribution, there are only about 10k people in the US who perform at +4SD (and a similar number in Europe), so this is quite a select population (roughly, the top few hundred high school seniors each year in the US). If you extrapolate the NE Asian numbers to the 1.3 billion population of China you get something like 300k individuals at this level, which is pretty overwhelming.

If you think AGI will not come in the next 5-10 years then I think there is a very good chance it comes from China. Also, China certainly has the engineering talent to surpass the rest of the world in basically everything, including fabs. Personally, I expect AGI in the next few years though.
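The tail arithmetic in the quote can be sketched with the normal survival function. The quote doesn't state the exact means and SDs Hsu used, so the half-SD mean shift below is my illustrative assumption; it happens to roughly reproduce the quoted "1 in 33k", "1 in 4k", and "about 8 to 1" figures.

```python
from math import erfc, sqrt

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) of a standard normal."""
    return 0.5 * erfc(z / sqrt(2))

# A population centred on the OECD average needs z = 4 to clear a +4SD cutoff:
p_baseline = normal_sf(4.0)
print(f"baseline: 1 in {1 / p_baseline:,.0f}")   # about 1 in 31,600

# A population whose mean sits delta SDs higher clears the same absolute
# threshold at a smaller z. delta = 0.5 is an assumed illustrative shift.
delta = 0.5
p_shifted = normal_sf(4.0 - delta)
print(f"shifted:  1 in {1 / p_shifted:,.0f}")    # about 1 in 4,300
print(f"relative representation: {p_shifted / p_baseline:.1f} to 1")
```

The point the quote leans on is visible in the numbers: a modest shift in the mean produces an outsized change in the far tail, which is also why the "boldly extrapolating" caveat matters, since real score distributions need not stay normal out at +4SD.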
GPT-5 with a context window that can fit entire code bases is going to be very scary. Particularly if you think, as I do, that agency is going to start to work soon. I really do think at least “weak recursive self improvement” of the form of automating AI research/training loops is on the table relatively soon.
I would like to register a prediction. I believe a GPT-4-level model that has been RLHF'd for humour will be superhuman or near-superhuman at humour, at least in the 99th percentile of professional comedians. My intuition is that humour is much easier than people think, and current models fail at it mostly because the forms of RLHF existing models use have pushed them into humourlessness.
Conditional on this being true, he must be very certain we are close to median human performance, like on the order of one to three years. I don’t think this amount of capital can be efficiently expended in the chips industry unless human capital is far less important than it once was. And it will not be profitable, basically, unless he thinks Winning is on the table in the very near term.
I feel 5 trillion must be a misprint. This is like several years' worth of American tax revenues. Conditional on this being true I would take this as significant evidence that what they have internally is unbelievably good. Perhaps even an AI with super-persuasion!
It is such a ridiculous figure, I suspect it must be off by at least an OOM.
My take on self-driving taking forever is that driving is near AGI-complete. Humans drive roughly a million miles between fatal accidents; it would not be particularly surprising if in these million miles (where you are interacting with intelligent agents) you inevitably encounter near-AGI-complete problems. Indeed, as the surviving self-driving companies are all moving to end-to-end approaches, self-driving research is beginning to resemble AGI research more and more.
I bought index funds. I would say this has the advantage of being robust to AGI not happening, but with birth rates as they are I am not so sure that's true! If we survive, Hanson's economic growth calculations predict the economy will start doubling every few months. Provided the stock market can capture some of this, learning to live on very little and putting everything in index funds should be fine even with modest amounts of invested capital. You really want to avoid burning your capital in this future, so you should live as modestly as possible, both to acquire capital and to preserve it until the market prices in such insane growth. However, I doubt property rights will be respected.
Any evidence of it working? I seriously doubt it.
Nope. Sadly. And if there were, your intellect would not be impressive for such tools would reach fixation.
If it’s any consolation, all the brilliant people able to make many multiples of your salary due to being born with a better brain—while almost to a man being incredibly smug about it—will soon be losing intellectual death matches with toaster ovens.
And OpenAI has explicitly said this is what they want to do! Their Superalignment strat looks suspiciously like “gunning for RSI”.
It does seem to me a little silly to give competitors API access to your brain. If one has enough of a lead, one can just capture one's competitors' markets.
I think I may be almost crazy enough to volunteer for such a procedure, ha, should you convince me.
The bet is with a friend and I will let him judge.
I agree that providing an API to God is a completely mad strategy, and we should probably expect less legibility going forward. Still, we have no shortage of ridiculously smart people acting completely mad.