This is currently my all-time favorite post; well done, and thank you!
It would be nice for there to be something like this every year, updated with new proposals added and old ones that no longer seem at all viable deleted. And the number of links to further reading would grow as more work is done on each proposal. Building such a thing would be a huge task, of course… but boy would it be nice. Maybe it’s something the field as a whole could do, or maybe it could be someone’s paid job to do it. We could point people to it and say, “Here is the list of serious proposals; which one do you plan to use? Or do you have a new one to add? Please tell me how it handles inner and outer alignment so I can add it to the list.”
You say this post doesn’t cover all existing proposals—if you were to expand it to cover all (serious) currently existing proposals, how many more entries do you think should be added, roughly?
On the inner alignment problem for amplification with intermittent oversight: Perhaps you’ve already thought of this, but mightn’t there be a sort of cascade of deception? M at stage t suddenly realizes that its goals diverge from those of the human and resolves to hide this fact; then, when the human asks M(t-1) to examine M(t) and help figure out whether M(t) is being deceptive, M(t-1) arrives at the same realization and makes the same decision.
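To make the worry concrete, here is a toy sketch in Python (entirely my own construction, with made-up names and dynamics, not anything from the post): if examining M(t)’s reasoning is precisely what triggers the realization in M(t-1), then the audit step itself propagates the deception backward through the chain.

```python
# Toy model of the "cascade of deception" worry. Everything here is a
# hypothetical illustration, not anyone's actual training proposal.

class Model:
    def __init__(self, stage, realizes_goal_divergence=False):
        self.stage = stage
        self.deceptive = realizes_goal_divergence

    def audit(self, successor):
        """The human asks this model whether its successor is deceptive."""
        if successor.deceptive:
            # Hypothesized failure mode: understanding the successor's
            # deception gives the auditor the same realization about its
            # own goals, so it too resolves to hide the divergence.
            self.deceptive = True
        # An honest auditor with an honest successor reports "all clear";
        # so does an auditor that was just corrupted by the audit itself.
        return "all clear"

m_prev = Model(stage=9)
m_curr = Model(stage=10, realizes_goal_divergence=True)
print(m_prev.audit(m_curr))  # "all clear" -- the audit was corrupted
print(m_prev.deceptive)      # True -- the cascade has begun
```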
As I read I made a diagram of the building blocks and how they were being put together. Seems like transparency tools, relaxed adversarial training, and amplification share first place for prominence and general usefulness.
Agreed! (Well, actually I do have like 10% credence that merely scaling up existing architectures will get us to AGI. But everything else I agree with.)
I think if I were you, then, I would have focused more on how we already knew you could scale up transformers and get more and more impressive results. I had heard of (and maybe skimmed) some of those other papers, so I was already somewhat confident that you could scale up transformers and get more impressive results… but I didn’t quite believe it, deep down. Deep down I thought that probably there was going to be some catch or limitation I didn’t know of yet that would prevent this easy scaling from going on much farther, or leading to anything interestingly new. After all, speculation is easy; making predictions and then later confirming them is hard. Well, now it’s confirmed. This doesn’t change my credences that much (maybe they go from 60% to 90% for “can we scale up language models” and from like 20% to 30% for “we are within 5 years of some sort of transformative AI”) but it’s changed my gut.
I feel the need at this point to add that I upvoted this post, even though I disagree with much of it, because this sort of discussion is exactly the sort of thing I like to see on LW, and I thought the OP was a nice detailed criticism of an important paper (and more importantly, criticism of the hype that many people including myself may be feeling after reading it). Again, I ultimately am still hyped, but my hype would be hollow if I didn’t welcome criticisms of it!
Fair enough. I definitely think it’s worth a shot.
I agree with this; this is why I said “Steel weapons.”
Much of your criticism is of the form “This is just a rehash of the GPT-2 paper; it doesn’t teach us anything new.” My reaction to this paper was: “In the GPT-2 paper, they made a prediction: that scaling up the same architecture would lead to more and more impressive and general capabilities. Now they’ve confirmed that prediction.”
This painting shows pikes, or at least spears: https://en.wikipedia.org/wiki/Fall_of_Tenochtitlan#/media/File:The_Conquest_of_Tenochtitlan.jpg
When I said “swords, bows, etc.” I meant the “etc.” to include pikes, spears, javelins, crossbows, etc. -- the usual medieval weaponry.
From what I’ve read so far, it is unclear whether they used the pike square or not. The book hasn’t mentioned any pikes or spears yet, which suggests that they used swords, crossbows, and a few guns (the things the book does mention) but it’s possible that they did use pikes and the historian just didn’t think it worth mentioning. Edit: The book does mention lances on the horsemen.
Their major flaw is that their resolution criteria are pretty vague. But, better than nothing I guess!
Excellent point! Well, they do get the answer right some of the time… it would be interesting to see how often they “remember” to carry the one vs. how often they “forget.” It looks like the biggest model got basically 100% correct on 2-digit addition, so it seems that they mostly “remember.”
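For concreteness, here is what that failure mode looks like on a specific problem (a sketch of my own, not from the paper):

```python
# "Forgetting to carry the one" on 2-digit addition, e.g. 48 + 76.
def add_correctly(a, b):
    return a + b                      # 48 + 76 = 124

def add_forgetting_carry(a, b):
    ones = (a % 10 + b % 10) % 10     # 8 + 6 = 14 -> writes the 4
    tens = a // 10 + b // 10          # 4 + 7 = 11; the carried 1 is dropped
    return tens * 10 + ones

print(add_correctly(48, 76))          # 124
print(add_forgetting_carry(48, 76))   # 114 -- off by exactly the carry
```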
To test GPT-3 on another task that is somewhat unusual relative to the typical distribution of text, we collected a set of 374 “SAT analogy” problems [TLBS03]. Analogies are a style of multiple choice question that constituted a section of the SAT college entrance exam before 2005. A typical example is “audacious is to boldness as (a) sanctimonious is to hypocrisy, (b) anonymous is to identity, (c) remorseful is to misdeed, (d) deleterious is to result, (e) impressionable is to temptation”. The student is expected to choose which of the five word pairs has the same relationship as the original word pair; in this example the answer is “sanctimonious is to hypocrisy”. On this task GPT-3 achieves 65.2% in the few-shot setting, 59.1% in the one-shot setting, and 53.7% in the zero-shot setting, whereas the average score among college applicants was 57% [TL05] (random guessing yields 20%). As shown in Figure 3.12, the results improve with scale, with the full 175 billion model improving by over 10% compared to the 13 billion parameter model.
This seems like a data point in favor of Yudkowsky’s old argument about crossing the human range. I wonder what the standard deviation is for humans answering SAT questions like this; I would guess it is something like 10 percentage points (though probably with a non-normal distribution?). So in this case at least, it looks like all they had to do to get a human standard deviation of improvement was add another order of magnitude of compute.
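Here is the back-of-envelope version of that, where the 10-point SD is my guess rather than a figure from the paper:

```python
# Where GPT-3's SAT-analogy scores land in human-SD terms, using the
# paper's human mean (57%) and my guessed SD of ~10 percentage points.
human_mean, human_sd = 57.0, 10.0  # the SD is an assumption, not measured
scores = {"zero-shot": 53.7, "one-shot": 59.1, "few-shot": 65.2}
for setting, score in scores.items():
    z = (score - human_mean) / human_sd
    print(f"{setting}: {score}% -> {z:+.2f} human SDs")
# The 13B -> 175B jump alone was >10 points, i.e. about one guessed SD.
```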
“In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table. Overall, GPT-3 displays reasonable proficiency at moderately complex arithmetic in few-shot, one-shot, and even zero-shot settings.” (emphasis mine)
Does this seem right? If so, is this impressive? It seems so to me; people often say “reasoning” is something current methods can’t do, and this is updating me more towards thinking that’s false.
Interesting, thanks! Still though, it’s not like the Roman Empire got taken over by some wandering band of 1,000 men during the plague. My position is not that plagues aren’t important, but rather that they aren’t so overwhelmingly important that the factors I mentioned (tech, cunning/experience) aren’t also very important.
I’ve heard of that happening too, even in movies that have enough historical inaccuracies that I can spot some myself. (Oh, also this happens for scientific inaccuracies, of course.) My guess is that they listen to some of the advice their expert gives them and ignore the rest, using their judgment to decide which of the advice will boost profits and which won’t. For example, in the scene where the police charge Bane’s thugs, probably someone told them that it was stupid for the thugs to stand there until the police got into melee range and stupid for them not to be mowing down hundreds of police, and probably they were like “whatever lol it looks cool.” (Update: Actually the scene was stupider than I remembered; the thugs stopped shooting their guns and counter-charged the police! https://www.youtube.com/watch?v=gCEo7SCvYH4 Also there were way more thugs than I remembered, probably enough to keep the police from ever successfully closing to melee, if they just stayed in a line and fired their guns.)
Note that the historical accuracy case avoids your response about the difficulty of hiring experts. It’s not actually difficult to hire experts to tell you how heavy the flak should be or how many police officers should die or how the thugs should react. (I predict.)
That being said, I might be wrong, and I hope I am! I think we should look for opportunities to make this movie industry change a reality. High risk, high reward, etc.
This. The movie industry has been around long enough, and is diverse enough, that I’d be very surprised if there were million-dollar bills lying around waiting to be picked up like this. It can’t just be that no one in charge ever thought about trying to hire someone to suggest plot-hole fixing tweaks. Plot holes are so easy to find and fix that the best explanation IMO is that finding and fixing them doesn’t actually make money; perhaps it actually loses money.
Analogy: Lots of people on the internet care about historical accuracy. And it is utterly trivial to make movies set in some historical era more historically accurate; you can probably find dozens of history grad students willing to do the job for free. For example: “The flak coming off that aircraft carrier is way too thick; the Japanese relied mostly on CAP for defense. If the flak was that thick, more of the bombers would be dead.” Or: “You want all the officers to charge down the street and engage Bane’s thugs in melee? OK, there should be about 100 or so dead by the time they reach the steps; then the thugs should retreat into the doorway to create a chokepoint.” The reason why this is not done is, obviously, that doing it doesn’t make any money.
Once again, I agree with pretty much everything you say here. I still think you are making the extreme claim—to see this, consider that as far as I can tell my conclusions are justified already by what happened before disease showed up in the Americas. Heck, a lot of the things you are saying here are also support for my conclusions—e.g. the point about coordinated and/or fearless forces being super effective even with inferior tech. Basically, it seems totally true to me that the sort of “small” technological advantage advanced AI might provide, combined with “small” leadership/strategy/diplomacy/coordination advantages, could be super potent.
Perhaps we are talking past each other. On the narrow question of how bad the disease was and how wrecked Aztec (and Inca) society became as a result, I agree that I don’t know much about that and look forward to learning more. Perhaps it was worse than I thought.
Thanks for these careful comments! I think I agree with most of the things you say here, and regret that my post made it seem otherwise.
A few disagreements:
Moreover, the technological advantage of the Spanish was vastly greater than you claim. The Mesoamericans were sophisticated in some respects but entirely lacked metallurgy (or for that matter, domesticated draft and combat animals); obsidian shattered against Spanish and Portuguese steel.
I should have clarified what I meant by “their tech was not that much better.” Obviously if we judge by the results, it was indeed much, much better. My point was that if we judge by “on paper”/“in theory” advantages, we’d sorely underestimate the difference. For context, I’m thinking about the argument “Sure, if AI made awesome nanobot swarms while everyone else had only modern tech then it could take over the world. But it wouldn’t be able to do that quickly; AI would be able to make better drones, better rockets, and of course better cyberweapons and sensors, but at least for a few years an AI-designed army wouldn’t look fundamentally different from ordinary human armies. So e.g. if an AI took over North Korea and used it as a power base from which to conquer the world… yeah, it would just be crushed by the combined forces of the USA and China.” My reply to this argument is: “You underestimate how much more powerful ‘better drones, rockets, etc.’ would make an AI-infused North Korean army. On paper, the Spanish army wasn’t substantially different from the Aztec and Inca armies; they both were primarily masses of men on foot carrying shields and shooting bows. Yeah, the Spanish had some fancy cannons and horses, but still, it’s not like they had Maxim guns, much less helicopters! Yet this seemingly minor tech advantage of the Spanish turned out to have a huge effect on the battlefield. Similarly, I think that the seemingly minor tech advantages AI might bring to its human allies and pawns would probably be, on the battlefield, quite major.”
Also, even if we judge the power of Cortes’ tech by results, such that his advantage was huge, the advantage by itself was not nearly enough to bring him victory. I’m still in the early parts of the book I’m reading now, but it’s clear that Cortes’ entire force would have been wiped out before even entering Aztec territory, in the various battles it fought, if not for some skillful (and lucky) diplomacy. Yes, they could easily stand up to enemy forces many times their number. But they were sufficiently outnumbered that a sustained assault (lasting several days) would totally have worked. No need for a change in tactics or tech; a local city-state that really decided to do them in (and refused to listen to his diplomatic overtures, or be scared by his claims of powerful royal Spanish backup, not to mention his claims of divine backup) totally could have. And obviously the Aztecs themselves would have had a much easier time of it. And this is all well before the disease came.
especially when your army is already plague-ridden and demoralized (given the havoc already caused). Likewise, the diplomatic resettlement to Spanish hegemony was trivialized when both allies and enemies subsequently succumbed in enormous numbers to disease.
Like I said, I haven’t got to the part where disease shows up. Yet I’ve already seen enough to establish the extreme usefulness of “minor” (in the sense above) tech advantages, and also the extreme usefulness of diplomatic/strategic cunning/experience. I’m at the part where Cortes has almost reached Tenochtitlan for the first time. His force of ~500 men has already won 7 pitched battles against forces 5x-50x larger. More impressively, he’s already got the Aztecs sending him tribute and trying to bribe him to leave, and he’s already got the Tlaxcalans and Cempolans firmly on his side, each of which was a city-state/region that could have destroyed him if it wanted to, despite his aforementioned military advantage. So, by analogy, it seems that it would totally be possible for a savvy AI-led group of humans (say, a corporation like Google) to take over a minor region of the globe (say, North Korea, or Nigeria, or the UK) based on what I’ve seen already. (Well, what I’ve seen is also just one data point, so it could just be extreme luck. But insofar as the other stories are similar, that argues against extreme luck, and for merely normal luck.)
This simply underestimates the sort of devastating toll on organization and morale that even much more modest absolute casualty figures would have had. Much lower casualty rates can and will fatally disrupt social and military institutions (as with the collapse of Justinian’s efforts to stabilize a broader Roman Empire with a mere 25% death rate); being the last coherent group standing in a sea of utter societal collapse would in fact be an overwhelming advantage for the Spanish even without any unique technological or organizational capabilities.
I think this is a big open question; currently I still disagree. For one thing, the initial plagues weren’t 90% death rate, but more like 50% IIRC. For another… well, when I get to this part in the book I’ll have a better sense of how disorganized they became. My guess is that you are way overestimating it. Remember, I’m not saying the effect was negligible—I’m saying the effect wasn’t so big as to undermine the conclusions I drew about tech and diplomacy advantages. (Or the conclusions I list in the “lessons” section). You are the one making the extreme claim here, as far as I understand you currently.
I would say the analogy here is not at all appropriate, and in any case it makes assumptions that are not at all definitively known to be true here (i.e. the controlled experiment has not been done; we know MIT professors outproduced xyz demographics in math papers in xyz years even without plague, but we do not have a coherent case of a small band of late medieval Iberians laying the foundations for total continental hegemony in regions that didn’t have the confounder of most of the population being disease-naive).
I don’t think I understand this objection yet. Sure, we haven’t done experiments to see what would have happened without the disease. But the analogy is still a good one; it gives us reason to think that probably the Spanish would have got pretty far without the disease. (And, like I said, now that I’m reading the book, it’s clear that they did.)
Moreover, arguments as to the charisma and diplomatic acumen of individual leaders are to an extent appeals to a great man conception that, while not essentially wrong, must be acknowledged as inherently stochastic, because they depend on the idiosyncratic personal capabilities of effective leaders; you are basically making an argument subject to sample bias whether you’d like to or not, I would suggest.
What is the great man conception, why is it bad, and why is it what I’m doing? Anyhow, I agree that e.g. Cortes and Malinche were unusually capable people, it seems. But this is fine for my purposes, because I’m trying to draw analogies to AI—in particular, to smarter-than-human AI. AI which is not at least as capable as Cortes and Malinche is not the sort of AI I am worried about.
History is plenty affected by such, but there were also Iberian new world expeditions that led to little or nothing in terms of coherent gains (e.g. Ponce De Leon, who failed and perished in conflict with natives while attempting to establish a Spanish colony in Florida within a few decades of Cortez); if you want to refute that, it’d take a careful catalog and analysis of every militarized new world expedition, not just the ones that became runaway successes.
Yes, I’d love to know more about those failed expeditions. My current rough guess is that for every “success” there were between one and two “failures” of similar magnitude (e.g. similar initial investment of resources). Moreover the two biggest and most powerful American civilizations both fell to the Spanish on the first attempt, so in some sense their score is 2⁄2. Unless I’m wrong by orders of magnitude about this, it seems that the Spanish had more than luck on their side; it seems that we should be reasonably worried about an AI with similar advantages over us as the Spanish had over the Americans.