I’m an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. I’m also at: Substack, X/Twitter, Bluesky, RSS, email, and more at this link. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Leave me anonymous feedback here.
Steven Byrnes
Most things in biology are on a spectrum; I would be surprised if psychopathy is not one of those.
One way to think of it is: there’s a spectrum of how Person A cares about Person B, and this spectrum goes from positive (compassion, desire to help) to neutral (callous indifference) to negative (schadenfreude, desire to pick a fight).
So “it’s a spectrum” is not in itself an argument for optimism here. (Or sorry if I’m misunderstanding.)
I maybe should write a general post about “why I don’t believe in most neat psychopathologies”. I do really wish this field of study was higher quality, and maybe I should do a deep dive and form a more consistent opinion on this…
In case it helps, my take on the psychopathy literature is mostly the same as it was 3 years ago when I wrote this comment.
Everyone agrees sunburns are bad, and so if someone is in a situation where the only way they can avoid sunburns is sunscreen, then they should obviously use sunscreen. That’s what I had in mind when I wrote my post, but maybe I’ll tweak the wording to make it clearer.

Update: I have now added a little addendum making that explicit:

ADDENDUM APRIL 16: I should clarify that for some people in some situations (apparently white people in Australia are often in this category), it might be the case that your body is simply incapable of developing enough of a tan to avoid getting a sunburn. If so, then you should obviously wear sunscreen! At the end of the day, if you’re getting sunburns, then whatever you’re doing is the wrong thing to do, and you should do something different. Sunburns are bad.
Thanks for all this great info! Alas, I am not convinced.
Seems like the two biggest cruxes between us are:
(1) you see tanning beds as strong evidence about actual suntans whereas I see them as merely suggestive. I see it as plenty plausible that getting a tan on a tanning bed is just a different thing from getting a tan in real sunlight. E.g. tanning beds have 10× more intense UV but almost no visible light, and even within UV the precise spectra are presumably somewhat different, etc. I acknowledge that I don’t have a detailed mechanistic story here, just an open-mindedness to the possibility that such a story exists. So that might be hard to resolve. So let’s move on to the other big crux for me:
(2) I really do get a lot of my confidence in the “sun-tans are basically fine” theory from the fact that plenty of white people in the USA are outside a ton and simply never wear sunscreen. And when there’s a decent-sized subpopulation that has a much, much higher exposure to an important carcinogen than the median person, then the resulting excess cancers should be just blindingly obvious to everyone. Like it is with smoking. If you look at a group of lung cancer patients, then most of them will be (current or former) chain smokers, even though chain smokers are a pretty small fraction of the population. And I really don’t think melanoma patients are like that. Like, I don’t think it’s the case that most melanoma patients are members of the subpopulation “white people with outdoor jobs who never wear sunscreen”. If that were the case, everyone would have noticed by now; it would be screaming out of the statistics, indeed you wouldn’t even need statistics to see it. We can argue about these little 20% effects or whatever, but we don’t see “white outdoor-working sunscreen-abstainers” screaming out of the statistics, crowding the melanoma wards, and getting skin cancers at 10× or 50× (or whatever) the rate of the white population median.
Byrnes offered the confusing statement that because there are confounders, he is rounding 20% to zero. But confounders go both ways. The study could be vastly underestimating melanoma risk associated with agricultural work, for example because workers are physically healthier or of darker complexion than controls. The data is so noisy it’s close to useless for estimating the risks of tanning.
Oh sorry, I think you misunderstood. I was not rounding 20% to zero because I expect that the 20% is overstated due to confounders; rather, I am rounding it to zero because a factor of 1.2× is very much closer to a factor of 1× than to a factor of 10× or 50× or whatever. Again, my stance is that a real and decision-relevant effect of suntans-without-sunburns would be screaming out of the data, blindingly obvious to everyone, the way that smoking-vs-lung-cancer is, because some people get much, much more sunscreen-free sun exposure than the median.
In general, you should not expect linear dose-response relationships between carcinogen exposure and cancer risk. As an example to build intuition, the excess incidence of lung cancer in smokers is approximately “proportional to the fourth power of smoking duration multiplied by the number of cigarettes smoked per day.” The reasons why are complicated and depend heavily on the etiology of the cancer in question.
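In symbols, the relationship quoted above (just restating it, not an independent claim) is roughly:

$$\text{excess lung cancer incidence} \;\propto\; (\text{cigarettes smoked per day}) \times (\text{years of smoking})^{4}$$

i.e. strongly nonlinear in duration, so a straight-line extrapolation from one exposure level to another can be badly off.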
The fact that people who get 50x more sun exposure than you don’t have 50x higher rates of skin cancer does not mean your current level of sun exposure is not significantly contributing to your cancer risk.
As I understand it, dose-response curves for carcinogens are usually linear or concave-up, as in your smoking example, whereas your suggestion here seems to be that it might be concave-down, which I think would be very strange. After some guy has been working outside every day without sunscreen for 10 years, now it’s 10 years + 1 day, and he goes outside without sunscreen as usual, and … what? The UV light no longer causes as many CPDs (cyclobutane pyrimidine dimers)?? The CPDs no longer cause as much melanoma?? That doesn’t make any sense, right? Or what? Sorry if I’m misunderstanding.
Green et al. (2011) conducted a 10-year study (n=1621) in Nambour, a town in Queensland, Australia. Participants were randomly assigned to daily or discretionary sunscreen application. After 10 years, 11 melanomas were found in the daily group and 22 in the discretionary group.
Cool study, thanks! But we don’t know whether that’s mediated by tans vs burns. (Again, everyone agrees that sunburns are bad.) The paper talks about how many sunburns people had at baseline but not during the study period (unless I missed it).
I know I didn’t respond to everything, I hope to keep reading and commenting when I get a chance (might be a couple weeks though…). Thanks again, these are great resources, and I’m looking forward to doing a second more careful pass through this post! It also sounds like you caught some unambiguous errors I made, so I’ll want to fix those. I figured I’d comment anyway just to share my quick response in the meantime.
There’s further discussion here including the comment section. No one found a smoking gun, but one of the two authors was also lead author on this other paper that is obvious BS.
I think of “brain-like AGI” as a threat model, not a plan. And then a question is whether the threat model is plausible vs far-fetched, and I guess you’re saying that my argument for “it’s plausible” is solid but I’m not communicating it clearly?
Nominally, my argument for “it’s plausible” is back in §1.5 (“What’s the probability that we’ll eventually wind up with brain-like AGI?”), and it sounds like you’re in the “Opinion #4” camp (“Brains are SO complicated—and we understand them SO little after SO much effort—that there’s just no way we’ll get brain-like AGI even in the next 100 years.”). I could flesh out my answer to Opinion #4, e.g. by adding three one-sentence bullet points that follow the three “brain complexity is easy to overstate” slides here. Hmm, I’ll think about it. The post is already quite long.
Sure, but that could still be consistent with “sunburns bad, suntans fine” theory, I figure. Maybe even if our ancestors were outside all the time, they would still sometimes lose their tan during a cloudy week and get sunburned?
I’m definitely open to the possibility that e.g. people of Scandinavian descent living in Nairobi simply cannot accommodate to the UV exposure by tanning, i.e. even if they are as tanned as they can possibly be, they’ll still get burned, there’s just too much UV. If so, then sunscreen (+ clothes, shade, etc.) is their only option to avoid sunburns, and again everyone agrees that sunburns are bad, both immediately and long-term.
I don’t understand your comment. Getting sunburned is obviously a threshold thing, right? If I get 5× more sun than normal on some day, I don’t get a 5× bigger sunburn, instead I go from no sunburn whatsoever to yes sunburn.
A tan being SPF 2–4 sounds reasonable, see §2.3.
Yup
It is also probably worth noting
Well I did note it, see footnote 4 :-)
Oh lol I have good vision so forgot that glasses are a thing. Oops. Just added a footnote, thanks.
My impression (I’m open to correction) is that studies have tried to tease out the influence of inside-vs-outside from time-spent-looking-at-books-and-screens, and they reliably find that the former makes a big difference in myopia, while the latter makes no difference at all (once you control for the former).
There’s some proposed mechanism involving bright light increasing dopamine in the retina, which in turn impacts myopia via [not sure what the pathway is] (example). I don’t know how strong the evidence is for that mechanism.
Anyway, if you or anyone else tries to make sense of this literature, I’d be interested in what you learn. :-)
Nah, my model allows ASI without massive compute at any point in the process, see “Foom & Doom 1: ‘Brain in a box in a basement’” (esp. §1.3), and maybe also “The nature of LLM algorithmic progress” §4.
I’ll reiterate what I wrote before: “No matter how many tokens are appended to the end of the CoT, you still have the issue that, each time you do a new forward pass, the LLM looks at its context window (textbooks + CoT scratchpad) ‘with fresh eyes’, and what it sees is a bunch of unintelligible gobbledygook that it has only the duration of one forward pass to make sense of.”
Probably the linear algebra textbooks in the context window already say that “you can think of a matrix as a linear transformation… <more explanation>”, right?
And this points to a key idea: The CoT-so-far in the context window is not a fundamentally different kind of thing from the textbooks in the context window. It’s just more tokens.
So we can consider the “textbooks + CoT-so-far” as a kind of “extended textbook”. And the LLM has one forward pass to read that “extended textbook” and then output a useful token. And that token will probably not be useful if the LLM does not understand (the relevant part of) linear algebra.
Granted, some textbooks are better than other textbooks. But I don’t think there exists any linear algebra textbook (or “extended textbook”) that gets around the “understanding linear algebra requires more serial steps than there are in a forward pass” problem (i.e., you can’t understand eigenvectors without first understanding matrix multiplication etc.). So CoT doesn’t help. A CoT-in-progress is just a different possible context window. And my claim is that there is no possible context window that can explain eigenvectors within a single forward pass to an LLM that has never seen any linear algebra.
(Again, this is a very different situation from a human writing down notes.)
You’re right, thanks, I have now edited that paragraph to also talk about how Thought Assessors might fit in.
a big reason why human brain algorithms are so efficient is that evolution did a lot of precomputation beforehand, so a large part of the learning is already done, pre-computed in the genome
I disagree for reasons discussed here.
My understanding (you can correct me) is that information can never travel from later layers to earlier layers, e.g. information cannot travel from token location 12 at layer 7 to token location 72 at layer 4. Right? So that means:
layer 1 can use information in the weights + raw token values
layer 2 can use information in the weights + stuff that was figured out in (earlier & current token positions of) layer 1
layer 3 can use information in the weights + stuff that was figured out in (earlier & current token positions of) layers 1 & 2
Etc. Right?
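To make that concrete, here’s a toy sketch (my own illustration, assuming a standard decoder-only transformer with causal attention, and ignoring details like attention heads, MLPs, and layer norm) of which activations can influence which:

```python
# Toy illustration of information flow in a decoder-only transformer with
# causal attention: an activation can only depend on *earlier layers* at
# *current-or-earlier token positions*, never on later layers.
# (Layer 0 = the raw token embeddings.)

def influences(layer: int, pos: int) -> set:
    """Which (layer, position) activations can affect the activation at (layer, pos)?"""
    sources: set = set()
    frontier = {(layer, pos)}
    while frontier:
        l, p = frontier.pop()
        if (l, p) in sources:
            continue
        sources.add((l, p))
        if l > 0:
            # Attention + the residual stream read the previous layer's output
            # at the current position and all earlier positions.
            frontier.update((l - 1, q) for q in range(p + 1))
    sources.discard((layer, pos))  # don't count the activation itself
    return sources

# Nothing at layer 7 (e.g. token position 12) can influence layer 4 at position 72...
assert (7, 12) not in influences(layer=4, pos=72)
# ...but earlier layers at earlier positions can:
assert (3, 12) in influences(layer=4, pos=72)
```

The point is just that the set of things any given activation can depend on only ever includes earlier layers (at current-or-earlier positions), so chains of reasoning that require many serial steps have to fit within the layer count.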
This is the sense in which I was saying that the linear algebra textbook is gobbledygook. Layer 1 starts from scratch, then layer 2 has to build on only layer 1, etc.
It’s true that different token-positions in layer 1 can be figuring out multiple things in parallel. But I claim that some things really need to be understood serially. I don’t expect any part of the architecture to be able to make meaningful progress towards understanding eigenvectors, if it doesn’t ALREADY know something about matrices, and matrix multiplication, etc., from previous layers.
So I claim the number of layers imposes a bottleneck on serial steps, and that this is a meaningful bottleneck on parsing interrelated concepts that are not in the weights, such as linear algebra in this thought-experiment.
How does that relate to what you wrote?
Hmm, good point, I guess I was a bit sloppy in jumping around between a couple different things that I believe, instead of keeping the argument more tight and precise.
One thing I believe is: “LLMs are predominantly powered by imitation learning”. I didn’t argue for that in the post, but my argument would be basically this comment + one more paper along the same lines + “Most Algorithmic Progress is Data Progress” (+ further discussion in The nature of LLM algorithmic progress §1.4). I don’t feel super-duper strongly and am not defining what “predominantly” means here in any case.
Another thing I believe is: “You can’t imitation-learn how to continual-learn”. This is independent of how to think about LLM post-training. I regularly come across people who disagree, on a conceptual level, with this claim, so it seemed worth sharing. Indeed, I now know that there’s a whole little subfield of “algorithm distillation” and “in-context RL”, and my claim (having now read three such papers, see other comments on this page) is that this whole subfield is a dumpster fire where the big idea doesn’t really work but people keep publishing misleading importance-hacked papers anyway.
Another thing I believe is: “You can’t meta-learn how to continual-learn”, which is a more general claim because it includes RLVR. This stronger claim is actually what follows from the boldface sentence in the post: “The only practical way to know what happens after millions of steps of some scaled-up continual learning algorithm is to actually do millions of steps of that same scaled-up continual learning algorithm, with actual weights getting actually changed in specifically-designed ways via PyTorch code. And then that’s the scaled-up learning algorithm you’re running. Which means you’re not doing [some different learning algorithm].”
Another thing I believe is: “If you put lots of interrelated complex concepts, none of which appear anywhere in the pretraining data, into the context window, then LLMs would crash and burn; rather, the only way for an LLM to fluently use a set of concepts is for all (or at least almost all) of those concepts to be in the weights, not the context window, because they were already used properly a bunch of times in the training data.” I allude to that in the post and elaborate on it in this other comment. This implies that context windows and scratchpads cannot substitute for weight-updates, and that a “country of geniuses in a datacenter” (who would presumably be inventing entirely new fields of science etc.) cannot consist of LLMs with very long context windows in a sealed box for the equivalent of 100 years with no human intervention.
Another thing I believe is: there’s no way to close the loop such that a closed system of LLMs can come up with new useful concepts and get those concepts into their own weights, e.g. open-ended self-distillation setups won’t work on LLMs. But that’s definitely off-topic for this OP. Self-distillation setups would be a bona fide continual learning algorithm, by the standards of this OP, e.g. there’s PyTorch code for continual weight updates. Whether that setup would actually work in practice, and how far it would go, are a different question.
So the main points of this OP are basically the second, third, and fourth of those beliefs, which are all pretty related. Plus the stuff about how to think about continual learning in general.
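As an aside, to make the phrase “actual weights getting actually changed in specifically-designed ways via PyTorch code” a bit more concrete, here’s a minimal toy sketch (my own made-up model and data stream, nothing to do with any real system) of what I’d count as a bona fide continual learning loop:

```python
import torch
import torch.nn as nn

# Stand-ins: a tiny model and an open-ended stream of new experience.
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def experience_stream():
    """Hypothetical never-ending stream of fresh data (here: a toy regression task)."""
    while True:
        x = torch.randn(32, 10)
        y = x.sum(dim=1, keepdim=True)
        yield x, y

for step, (x, y) in enumerate(experience_stream()):
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()  # <-- the weights actually change, step after step
    if step >= 10_000:  # the point in the OP concerns *millions* of such steps
        break
```

The defining feature is that `opt.step()` changes the weights on every new piece of experience; an LLM doing “in-context learning” over a growing context window, with frozen weights, is doing something categorically different.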
You’re treating the Michael Druggan quote at the very end as obviously terrible, whereas I see it as obviously sensible. I’m confused. Maybe I’m missing some context? Are you reading in a subtext that Druggan wants the superintelligences to exist, instead of conditioning on the superintelligences existing and talking about what that world would or should be like?
If we assume that the Singularity has happened, and that radical superintelligence exists, and (for the sake of argument) that humans still exist too, … then your position is that humans should still be making consequential decisions about post-Singularity economic policy, legal frameworks, etc.? Really?
Hmm, thinking about it more, I can imagine good-seeming futures in which e.g. there’s a Singleton ASI which enforces hard boundaries (especially against creating other ASIs), but allows lots of human agency within those boundaries (cf. Long Reflection, or Archipelago, or Nanny AI, or Fun Theory Utopia, etc.). But I wouldn’t exactly call that “humans remain in control”. Or at least, it’s not a central example of that. What other options are there, assuming ASI exists at all?
In any multipolar ASI scenario, the economy and world would presumably be changing at ASI speed, and having excruciatingly slow humans “in control” seems unworkable.
For example the IAEA has heavily curtailed research into how to build nuclear weapons more cheaply and efficiently, which seems like it applies pretty straightforwardly to algorithmic progress.
IIUC, it’s legal everywhere on Earth to do basic research that might eventually lead to a new, much more inexpensive and hard-to-monitor method to enrich uranium to weapons grade.
I’m thinking mainly of laser isotope enrichment, which was first explored in the 1970s. No super-inexpensive method has turned up, thankfully. (The best-known approach seems to be in the same ballpark as gas centrifuges in terms of cost, specialty parts etc., or if anything somewhat worse. Definitely not radically simpler and cheaper.) But I think there’s a big space of possible techniques, and meanwhile people in academia keep inventing new types of lasers and new optical excitation and separation paradigms. I don’t think there’s any general impossibility proof that kg-scale uranium enrichment in a random basement with only widely-available parts can’t ever get invented someday by this line of research.
(If it did, it probably wouldn’t be the death of nonproliferation, because you can still try to monitor and control the un-enriched uranium. But it would still make nonproliferation substantially harder. By the way, once you have a lot of weapons-grade uranium, making a nuclear bomb is trivial. The fancy implosion design is only needed for plutonium bombs, not uranium ones.)
AFAICT, if someone is explicitly developing a system for “kg-scale uranium enrichment via laser isotope separation”, then the authorities will definitely go talk to them. But for every step prior to that last stage, where you’re doing “basic R&D”, building new types of lasers, etc., my impression is that people can freely do whatever they want, and publish it, and nobody will ask questions. I mean, it’s possible that there’s someone in some secret agency who is on the ball, five steps ahead of everyone in academia and industry, and they know where problems might arise on the future tech tree and are ready to quietly twist arms if necessary. But I dunno man, that seems pretty over-optimistic, especially when the research can happen in any country.
My former PhD advisor wrote a book in the 1980s with a whole chapter on laser isotope separation techniques, and directions for future research. The chapter treats it as completely unproblematic! Not even one word about why this might be bad. I remember feeling super weirded out by that when I read it (15 years ago), but I figured, maybe I’m the crazy one? So I never asked him about it.
(Low confidence on all this.)
Whoops, the Wikipedia article was deleted a few months ago.
I meant “kinda the same idea” in the sense that, at the end of the day, a similar problem is being solved by the communicative signal. I agree that there’s a sign-flip.
Anyway, I’ll reword, thanks.
Why not Google Docs?