tailcalled
I think it’s not cheating in a practical sense, since applications of AI typically have a team of devs noticing when it’s tripping up and adding special handling to fix that, so it’s reflective of real-world use of AI.
But I think it’s illustrative of how artificial intelligence most likely won’t lead to artificial general agency and alignment x-risk, because the agency will be created through unblocking a bunch of narrow obstacles, which will be goal-specific and thus won’t generalize to misalignment.
Were these key things made by the AI, or by the people making the run?
In retrospect: fewer and fewer people have been working on this over time, and getting back on track probably isn’t feasible: https://www.lesswrong.com/posts/puv8fRDCH9jx5yhbX/johnswentworth-s-shortform?commentId=jZ2KRPoxEWexBoYSc
It’s often not just that they endorse a single belief, but rather that they have a whole psychodrama and opinions on the appropriate areas of investigation. [Joseph Bronski](https://x.com/BronskiJoseph/status/1917573847810990210) exemplifies this when taken to an extreme.
And I wouldn’t say that the real answer is to have their belief not be considered racist. I’d say the real answer is something like: they want to fight back against anti-racists. [Arthur Jensen probably gave the best description of what they’re trying to fight against](https://emilkirkegaard.dk/en/2019/04/a-kind-of-social-paranoia-a-belief-that-mysterious-hostile-forces-are-operating-to-cause-inequalities-in-educational-and-occupational-performance-despite-all-apparent-efforts-to-eliminate-prejudi/).
Sometimes when discussing a controversial issue, people are kind of avoiding the most important points within it, and then it feels relevant to ask them what their interest in discussing it is. Their true interest will often be to control the political discourse away from some dynamic they perceive as pathological, but if they explained that, they would have to argue that the dynamic is pathological, which they often don’t want to do because they risk triggering the dynamic that way. As a diversion, they will sometimes say that their motivation is to seek truth.
I think you see this a lot with racism/sexism-type stuff, where racists/sexists dissociate from the fact that they are racist/sexist in order to square their sense that their perspective ought to be considered with the common norm against sexism/racism. And then they consider enforcement of said norm to be pathological, but they’re too dissociated/afraid to explain how, so they say they just care about the truth, even though their dissociation often prevents them from properly investigating said truth.
(Not sure if by “runtime” you mean “time spent running” or “memory/program state during the running time” (or something else)? I was imagining memory/program state in mine, though that is arguably a simplification since the ultimate goal is probably something to do with the business.)
Potentially challenging example: let’s say there’s a server that’s bottlenecked on some poorly optimized algorithm, and you optimize it to be less redundant, freeing resources that immediately get used for a wide range of unknown but presumably good tasks.
Superficially, this seems like an optimization that increased the description length. I believe the way this is solved in the OP is that the distributions are constructed in order to assign an extra long description length to undesirable states, even if these undesirable states are naturally simpler and more homogenous than the desirable ones.
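To spell the mechanism out as I understand it (my own notation, not necessarily the OP’s): the description length of a state $x$ under a chosen distribution $P$ is

$$L_P(x) = -\log_2 P(x),$$

so if $P$ is constructed to put very little probability mass on undesirable states, those states get long descriptions under $P$ even when they would be short under a more “natural” prior $Q$ (i.e. even when $-\log_2 Q(x)$ is small).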
I’m quite suspicious that this risks ending up with improper probability distributions. Maybe that’s OK.
Writing the part that I didn’t get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It’d be a massive technical challenge of course, because atoms don’t really sit still and let you look at and position them. But with sufficient work, it seems like someone could figure it out.
This doesn’t really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can’t assign them a goal. You might get an Age of Em-adjacent situation from it, though not even quite that.
To reverse-engineer people in order to make AI, you’d instead want to identify separate faculties with interpretable effects and a reconfigurable interface. This can be done for some of the human faculties because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there’s just no reason to suppose that it should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there’s lots of reason to think humans are primarily adapted to those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently-homogenous places where AI can operate. The practical consequence of this is that there will be a direct correspondence between each part of the human work that prepares the AI and each part of the activities the AI engages in, which will (with caveats) eliminate alignment problems because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don’t worry so much about ‘website misalignment’ because generally there’s a direct correspondence between the behavior of the website and the underlying code, templates and database tables. This didn’t have to be true, in the sense that there are many short programs with behavior that’s not straightforwardly attributable to their source code and yet still in principle could be very influential, but we don’t know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI, since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.
(The major caveat is people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won’t lead to the traditional doom scenarios because they are too dependent on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
I’ve grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it’s much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn’t meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution’s information bandwidth due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution; and if you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance. Since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve way faster than normal. And if organisms then pass from the mixture niche out into the specialized niches, they can benefit from the fast evolution too.
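For reference, the sense in which “the speed of evolution is proportional to genetic variance” can be made precise is Fisher’s fundamental theorem of natural selection (roughly stated, under the usual additivity assumptions):

$$\frac{d\bar{w}}{dt} = \frac{\operatorname{Var}_A(w)}{\bar{w}},$$

where $\bar{w}$ is mean fitness and $\operatorname{Var}_A(w)$ is the additive genetic variance in fitness. A mixture niche that pools genetic variance from many specialized niches therefore gets a correspondingly faster response to selection.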
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, … . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there are also often subniches.)
And then obviously beyond these points, individual intelligence and evolution focus on different things—what’s happening recently vs what’s happened deep in the past. Neither is perfect; society has changed a lot, which renders what’s happened deep in the past less relevant than it could have been, but at the same time what’s happening recently (I argue) intrinsically struggles with rare, powerful factors.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
Part of the trouble is that if you just study the organism in isolation, you only get some genetic or phenotypic properties. You don’t have any good way of knowing which of these are the important ones.
You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically sort of aggregate “small-scale” understanding (like an autoregressive convolutional model that predicts the next time step given the previous one) into “large-scale” understanding (being able to understand how a system could act in extreme cases, by learning how it acts normally). But I’ve studied a bunch of different approaches for that, and ultimately it doesn’t really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime it was originally observed within, and also the methods to aggregate small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
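As a toy illustration of that last point (an entirely hypothetical example with made-up numbers, not anything from the posts under discussion): fit a one-step autoregressive model on data dominated by “normal” behavior and then roll it out; the rollout reproduces the typical regime but tells you almost nothing about the rare, powerful shocks.

```python
# Toy sketch (hypothetical): small-scale one-step model vs large-scale behavior.
import numpy as np

rng = np.random.default_rng(0)

# Simulate a process that is mostly smooth mean-reversion, plus rare large shocks.
T = 10_000
x = np.zeros(T)
for t in range(1, T):
    shock = 50.0 if rng.random() < 1e-3 else 0.0  # rare, powerful factor
    x[t] = 0.95 * x[t - 1] + rng.normal() + shock

# Fit a linear one-step ("small-scale") model by least squares on adjacent pairs.
slope, intercept = np.polyfit(x[:-1], x[1:], deg=1)

# Roll the fitted model out to see what "large-scale" behavior it implies.
sim = np.zeros(T)
for t in range(1, T):
    sim[t] = slope * sim[t - 1] + intercept + rng.normal()

print("max of real data:", round(x.max(), 1))  # dominated by the rare shocks
print("max of rollout:  ", round(sim.max(), 1))  # stays near the typical regime
```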
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
First, I want to emphasize that durability and strength are about as far towards the easy end as you can get, because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment because otherwise intelligence couldn’t develop.
Another complication is, you gotta consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency because profit-maximizing companies don’t want money tied up in durability or strength that you’re not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent—and as a consequence, those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of “‘durability and strength will be helpful’ is nevertheless a (higher-level) learnable pattern”. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it’s relatively far from falling naturally out of the methods.
One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.
(I should maybe write more but it’s past midnight and also I guess I wonder how you’d respond to this.)
If there’s some big object, then it’s quite possible for it to decompose into a large number of similar obstacles, and I’d agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.
However, my assertion wasn’t that intelligence cannot handle almost all obstacles, it was that consequentialism can’t convert intelligence into powerful agency. It’s enough for there to be rare powerful obstacles in order for this to fail.
“Stupidly obstinate” is a root-cause analysis of obstinate behavior. An alternative root cause might be conflict, for instance.
At first glance, your linked document seems to match this. The herald who calls the printer “pig-headed” does so in direct connection with calling him “dull”, which at least in modern terms would be considered a way of calling him stupid? Or maybe I’m missing some of the nuances due to not knowing the older terms/not reading your entire document?
“Pigheaded” is not a description of behavior, it’s a proposed root cause analysis. The idea is that pigs are dumb so if someone has a head (brain) like a pig, they might do dumb things.
It seems like you are trying to convince me that intelligence exists, which is obviously true and many of my comments rely on it. My position is simply that consequentialism cannot convert intelligence into powerful agency, it can only use intelligence to bypass common obstacles.
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
We don’t just use intelligence.
On the other hand, words like “durability” imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogenous, you are durable one second and evaporate the next.
???
Vaporization is prevented by outer space which drains away energy.
It’s not clear why you say durability implies intelligence; surely trees are durable without intelligence.
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t.
I think the necessity of intelligence for tradition exists on a much more fundamental level than that. Intelligence allows people to form an extremely rich model of the world with tons of different concepts. If one had no intelligence at all, one wouldn’t even be able to copy the traditions. Like consider a collection of rocks or a forest; it can’t pass any tradition onto itself.
But conversely, just as intelligence cannot be converted into powerful agency, I don’t think it can be used to determine which traditions should be copied and which ones shouldn’t.
There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
It seems to me that you are treating any variable attribute that’s highly correlated across generations as a “tradition”, to the point where not doing something is considered on the same ontological level as doing something. That is the sort of ontology that my LDSL series is opposed to.
I’m probably not the best person to make the case for tradition as (despite my critique of intelligence) I’m still a relatively strong believer in equilibration and reinvention.
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
Whenever there’s any example of this that’s too embarrassing or too big of an obstacle for applying them in a wide range of practical applications, a bunch of people point it out, and they come up with a fix that allows the LLMs to learn it.
The biggest class of relevant examples would all be things that never occur in the training data—e.g. things from my job, innovations like how to build a good fusion reactor, social relationships between the world’s elites, etc. Though I expect you feel like these would be “cheating”, because it doesn’t have a chance to learn them?
The things in question often aren’t things that most humans have a chance to learn, or even would benefit from learning. Often it’s enough if just 1 person realizes and handles them; alternatively, often if nobody handles them, you just lose whatever was dependent on them. Intelligence is a universal way to catch on to common patterns; other things than common patterns matter too, but there’s no corresponding universal solution.
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even if they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)
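(A toy numerical version of the hypothesis-weighing process described above; the hypotheses and every number here are made up purely for illustration.)

```python
# Toy Bayesian update over hypotheses about Joe's claim "X".
# All priors and likelihoods are invented for illustration only.
priors = {
    "correct and honest":   0.60,
    "mistaken but honest":  0.25,
    "trying to deceive me": 0.10,
    "I misheard him":       0.05,
}

# Likelihood of the observation "Joe repeats the same claim when asked"
# under each hypothesis.
likelihoods = {
    "correct and honest":   0.95,
    "mistaken but honest":  0.90,
    "trying to deceive me": 0.90,
    "I misheard him":       0.05,  # repetition would likely have corrected this
}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: p / total for h, p in unnormalized.items()}

for h, p in posteriors.items():
    print(f"{h}: {p:.2f}")
```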
You ran way deeper into the “except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception” point than I meant you to. My main point is that humans have grounding on important factors that we’ve acquired through non-intelligence-based means. I bring up the possibility of copying other’s conclusions because for many of those factors, LLMs still have access to this via copying them.
It might be helpful to imagine what it would look like if LLMs couldn’t copy human insights. For instance, imagine if there was a planet with life much like Earth’s, but with no species that were capable of language. We could imagine setting up a bunch of cameras or other sensors on the planet and training a self-supervised learning algorithm on them. They could surely learn a lot about the world that way—but it also seems like they would struggle with a lot of things. The exact things they would struggle with might depend a lot on how much prior knowledge you build into the algorithm, and how dynamic the sensors are, and whether there are also ways for it to perform interventions upon the planet. But for instance even recognizing the continuity of animal lives as they wander off the screen would either require a lot of prior knowledge built into the algorithm, or a very powerful learning algorithm (e.g. Solomonoff induction can use a simplicity prior to infer that there must be an entire planet full of animals off-screen, but that’s computationally intractable).
(Also, again you still need to distinguish between “Is intelligence a useful tool for bridging lots of common gaps that other methods cannot handle?” vs “Is intelligence sufficient on its own to detect deception?”. My claim is that the answer to the former is yes and the latter is no. To detect deception, you don’t just use intelligence but also other facets of human agency.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
First, some things that might seem like nitpicks but are moderately important to my position:
In many ways, our modern world is much less heterogeneous than the past. For instance thanks to improved hygiene, we are exposed to far fewer diseases, and thanks to improved policing/forensics, we are exposed to much less violent crime. International trade allows us to average away troubles with crop failures. While distribution shifts generically should make it harder for humans to survive, they can (especially if made by humans) make it easier to survive.
Humans do not in fact survive; our average lifespan is less than 100 years. Humanity as a species survives by birthing, nurturing, and teaching children, and by collaborating with each other. My guess would be that aging is driven to a substantial extent by heterogeneity (albeit perhaps endogenous heterogeneity?) that hasn’t been protected against. (I’m aware of John Wentworth’s ‘gears of aging’ series arguing that aging has a common cause, but I’ve come to think that his arguments don’t sufficiently distinguish between ‘is eventually mediated by a common cause’ vs ‘is ultimately caused by a common cause’. By analogy, computer slowdowns may be said to be attributable to a small number of causes like CPU exhaustion, RAM exhaustion, network bandwidth exhaustion, etc., but these are mediators, and the root causes will typically be some particular program that is using up those resources, and there’s a huge number of programs in the world which could be to blame depending on the case.)
We actually sort of are in a precarious situation? The world wars were unprecedentedly bloody. They basically ended because of the invention of nukes, which are so destructive that we avoid using them in war. But I don’t think we actually have a robust way to avoid that?
But more fundamentally, my objection to this question is that I doubt the meaningfulness of a positive theory of how humans survive and thrive. “Intelligence” and “consequentialism” are fruitful explanations of certain things because they can be fairly-straightforwardly constructed, have fairly well-characterizable properties, and even can be fairly well-localized anatomically in humans (e.g. parts of the brain).
Like one can quibble with the details of what counts as intelligence vs understanding vs consequentialism, but under the model where intelligence is about the ability to make use of patterns, you can hand a bunch of data to computer scientists and tell them to get to work untangling the patterns, and then it turns out there are some fairly general algorithms that can work on all sorts of datasets and patterns. (I find it quite plausible that we’ve already “achieved superhuman intelligence” in the sense that if you give both me and a transformer a big dataset that neither of us are pre-familiar with to study through, then (at least for sufficiently much data) eventually the transformer will clearly outperform me at predicting the next token.) And these fairly general algorithms are probably more-or-less the same sort of thing that much of the human brain is doing.
Thus “intelligence” factors out relatively nicely as a concept that can be identified as a major contributor to human success (I think intelligence is the main reason humans outperformed other primates). But this does not mean that the rest of human success can equally well be factored out into a small number of nicely attributable and implementable concepts. (Like, some of it probably can, but there’s not as much reason to presume that all of it can. “Durability” and “strength” are examples of things that fairly well can, and indeed we have definitely achieved far-superhuman strength. These are purely physical though, whereas a lot of the important stuff has a strong cognitive element to it—though I suspect it’s not purely cognitive...)
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns,
I guess to add, I’m not talking about unknown unknowns. Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can’t efficiently be derived from empirical data (except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception).
Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.
I don’t have time to read this study in detail until later today, but if I’m understanding it correctly, the study isn’t claiming that neural networks will learn rare important patterns in the data, but rather that they will learn rare patterns that they were recently trained on. So if you continually train on data, you will see a gradual shift towards new patterns and forgetting old ones.
I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.
Random street names aren’t necessarily important though? Like what would you do with them?
Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.
I didn’t say that intelligence can’t handle different environments, I said it can’t handle heterogenous environments. The moon is nearly a sterile sphere in a vacuum; this is very homogenous, to the point where pretty much all of the relevant patterns can be found or created on Earth. It would have been more impressive if e.g. the USA could’ve landed a rocket with a team of Americans in Moscow than on the moon.
Also people did use durability, strength, healing, intuition and tradition to go to the moon. Like with strength, someone had to build the rockets (or build the machines which built the rockets). And without durability and healing, they would have been damaged too much in the process of doing that. Intuition and tradition are harder to clearly attribute, but they’re part of it too.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.
Learning from strategies that stood the test of time would be tradition more so than intelligence. I think tradition requires intelligence, but it also requires something else that’s less clear (and possibly not simple enough to be assembled manually, idk).
Margins of error and backup systems would be, idk, caution? Which, yes, definitely benefit from intelligence and consequentialism. Like I’m not saying intelligence and consequentialism are useless, in fact I agree that they are some of the most commonly useful things due to the frequent need to bypass common obstacles.
People read more into this shortform than I intended. It is not a cryptic reaction, criticism, or reply to/of another post.
Ah, fair enough! I just thought given the timing, it might be that you had seen my post and thought a bit about the limitations of intelligence.
I don’t know what you mean by intelligent [pejorative] but it sounds sarcastic.
The reason I call it intelligent is: Intelligence is the ability to make use of patterns. If one was to look for patterns in intelligent political forecasting and archaeology, or more generally patterns in the application of intelligence and in the discussion of the limitations of intelligence, then what you’ve written is a sort of convergent outcome.
It’s [pejorative] because it’s bad.
To be clear, the low predictive efficiency is not a dig at archeology. It seems I have triggered something here.
Whether a question/domain has low or high (marginal) predictive efficiency is not a value judgement, just an observation.
I mean, I’m just highlighting it here because I thought it was probably a result of my comments elsewhere, and if so I wanted to ping that it was the opposite of what I was talking about.
If it’s unrelated then… I don’t exactly want to say “carry on” because I still think it’s bad, but I’m not exactly sure where to begin or how you ended up with this line of inquiry, so I don’t exactly have much to comment on.
You can make highly general AIs, they will just lack agency. You then plop a human on top of the AI and the human will provide plenty of agency for basically all legible purposes.