OK, here’s my argument that, if you take {intelligence, understanding, consequentialism} as a unit, it’s sufficient for everything:
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
If reducing heterogeneity is helpful, then {intelligence, understanding, consequentialism} can discover that fact, and figure out how to reduce heterogeneity.
Etc.
Writing the part that I didn’t get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It’d be a massive technical challenge of course, because atoms don’t really sit still and let you look at and position them. But with sufficient work, it seems like someone could figure it out.
This doesn’t really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can’t assign the recreated person a goal. You might get an Age of Em-adjacent situation from it, though not even quite that.
To reverse-engineer people in order to make AI, you’d instead want to identify separate faculties with interpretable effects and reconfigurable interfaces. This can be done for some of the human faculties, because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there’s just no reason to suppose this should be possible for all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there’s lots of reason to think humans are primarily adapted to those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently homogenous places where AI can operate. The practical consequence is that there will be a direct correspondence between each part of the human work done to prepare the AI and each part of the activities the AI engages in, which will (with caveats) eliminate alignment problems, because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don’t worry so much about ‘website misalignment’, because generally there’s a direct correspondence between the behavior of the website and the underlying code, templates, and database tables. This didn’t have to be true, in the sense that there are many short programs whose behavior isn’t straightforwardly attributable to their source code and yet could still, in principle, be very influential; but we don’t know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI: since consequentialism is so limited, people will manually build out apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.
(The major caveat is that people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won’t lead to the traditional doom scenarios, because those depend on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
> After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
I’ve grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it’s much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time/space scale. For instance, an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn’t meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan, and arguably longer than that (like, if you set your children up in an advantageous situation, that continues paying fitness dividends even after you die).
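To make the “short window” point concrete, here’s a minimal toy sketch (my own illustration, with an arbitrary window size and made-up data, not any particular real system) of a next-step autoregressive predictor: everything it learns is a relation between the last few points and the very next one.

```python
# Toy next-step autoregressive predictor with a short context window.
import numpy as np

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=2000))  # an arbitrary 1-D time series

WINDOW = 8  # the model only ever sees this much of the immediate past

# Build (past window -> next value) training pairs.
X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
y = series[WINDOW:]

# Fit a linear autoregressive model by least squares.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Everything the model "knows" is a relation between the last WINDOW points
# and the very next one; longer-range structure only matters to it insofar
# as it shows up inside that short window.
print("predicted:", X[-1] @ coefs, "actual:", y[-1])
```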
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution’s information bandwidth, because they use an easy approximation (independent Bernoulli genotypes; linear, short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes; quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution; and if you then have some sort of mixture niche, it can draw in organisms from each of the other niches and thus massively increase its genetic variance. Since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve much faster than normal. And if organisms then pass from the mixture niche back out into the specialized niches, they can benefit from the fast evolution too.
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power by expanding their family while warlords gain power by sniping the king off an existing kingdom), concubine, bureaucrat, … . Each of these used to be evolving somewhat separately, but genes also flowed between them in various ways. Though I suspect this undercounts the number of niches, because there are often subniches too.)
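As a toy illustration of the variance point above (very much my own simplified construction: truncation selection, the trait treated as fully heritable, arbitrary niche means), the sketch below applies the same selection rule to a narrow single-niche population and to a pooled “mixture” population; the per-generation response roughly tracks the pooled variance.

```python
# Toy model: truncation selection on a quantitative trait, comparing a narrow
# single-niche population with a pooled "mixture" population of higher variance.
import numpy as np

rng = np.random.default_rng(1)

def one_generation(trait_values, selected_fraction=0.2):
    # Keep the top fraction of the population, then draw offspring around the
    # selected parents (treating the trait as fully heritable for simplicity).
    cutoff = np.quantile(trait_values, 1 - selected_fraction)
    parents = trait_values[trait_values >= cutoff]
    return rng.normal(parents.mean(), parents.std() + 1e-9, size=trait_values.size)

# A single narrow niche: trait values clustered around 0.
narrow = rng.normal(0.0, 1.0, size=5000)

# A mixture niche: equal draws from five niches with different means, so the
# pooled variance is much larger even though each niche is individually narrow.
mixture = np.concatenate([rng.normal(m, 1.0, size=1000) for m in (-4, -2, 0, 2, 4)])

for label, pop in (("narrow", narrow), ("mixture", mixture)):
    response = one_generation(pop).mean() - pop.mean()
    print(f"{label}: variance={pop.var():.1f}, response per generation={response:.2f}")
```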
And then obviously, beyond these points, individual intelligence and evolution focus on different things: what’s happening recently vs. what happened deep in the past. Neither is perfect; society has changed a lot, which renders what happened deep in the past less relevant than it could have been, but at the same time, what’s happening recently (I argue) intrinsically struggles with rare, powerful factors.
> If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
> If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
Part of the trouble is, if you just study the organism in isolation, you just get some genetic or phenotypic properties. You don’t have any good way of knowing which of these are the important ones.
You can try developing a model of all the different relevant exogenous factors. But as I keep insisting, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear that people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficulty being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically aggregate “small-scale” understanding (like an autoregressive convolutional model that predicts the next time-step from the previous ones) into “large-scale” understanding (being able to understand how a system could behave in extreme situations by learning how it behaves normally). But I’ve studied a bunch of different approaches for that, and ultimately it doesn’t really seem feasible. (Typically the small-scale understanding learned is only valid near the regime it was originally observed in, and the methods for aggregating small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
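Here’s a small sketch of the failure mode I mean, using a made-up one-dimensional system (the cubic term and the cutoff at |x| < 2 are arbitrary choices of mine): a model fit only on the normal regime describes that regime almost perfectly, yet tells you essentially nothing about the extreme regime.

```python
# Toy example: a model fit in the "normal" regime says little about extremes.
import numpy as np

rng = np.random.default_rng(2)

def true_system(x):
    # Nearly linear for small x, but with a nonlinearity that only matters
    # at extreme values.
    return x + 0.02 * x**3

# All observations come from the normal regime.
x_train = rng.uniform(-2, 2, size=500)
y_train = true_system(x_train) + rng.normal(0, 0.05, size=500)

# A linear fit explains the normal regime almost perfectly...
slope, intercept = np.polyfit(x_train, y_train, deg=1)
print("worst in-regime error:", np.abs(slope * x_train + intercept - true_system(x_train)).max())

# ...but is badly wrong about what happens at an extreme input.
x_extreme = 10.0
print("extrapolated prediction:", slope * x_extreme + intercept)
print("actual extreme behavior:", true_system(x_extreme))
```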
> If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
> Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
First, I want to emphasize that durability and strength are about as far toward the easy end as you can get, because e.g. durability is a common property seen in lots of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment on the grounds that intelligence otherwise couldn’t have developed.
Another complication is that you’ve got to consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency, because profit-maximizing companies don’t want money tied up in durability or strength that isn’t typically being used. (People, meanwhile, might want durability or strength because they find it cool, sexy, or excellent, and as a consequence those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of “‘durability and strength will be helpful’ is nevertheless a (higher-level) learnable pattern”. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it’s relatively far from falling naturally out of the methods.
One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.
(I should maybe write more but it’s past midnight and also I guess I wonder how you’d respond to this.)