The difference is that people have been trying hard to harness nuclear forces for energy, while people have not been trying hard to study humans for alignment in the same way. Even accounting for the alignment field being far smaller, there hasn’t been a real effort as far as I can see. Most people immediately respond with “AGI is different from humans for X, Y, Z reasons” (which are true) and then proceed to throw out the baby with the bathwater by not looking into human value formation at all.
I’m not sure why you would think this. The actual funding that goes into trying to do this is not that large; fusion research funding is maybe like $500 million/yr. The FTX Future Fund alone will probably spend on the order of this much money this year, for instance. Most of these proposals are aimed at one very specific way of trying to exploit the binding energy (turn hydrogen isotopes into helium and other heavier elements) and don’t consider alternatives.
I think this approach is basically correct because I don’t see any plausible alternative that anyone has come up with. Fusion is promising because we know it happens in nature, we can trigger it under extreme conditions, and there’s an obvious mechanical explanation for why it would work. The only challenge is an engineering one, of doing it in a controlled way.
If “humans are an untapped source of evidence for alignment” or any similar claim is going to have teeth, it needs to be coupled with a more concrete strategy for how we should go about extracting this evidence, and I’m not sure where I’m supposed to get that from the post. I would be highly surprised if anyone said “evidence from humans is irrelevant to alignment”; I think the actual reason people don’t go down this path is that they don’t think it’s promising, much like the people who don’t spend billions of dollars exploring the possibility of cold fusion.
Planes don’t fly like birds, but we sure as hell studied birds to make them.
I don’t think this is actually as clear as you might think it is. As far as I can see birds are useful for designing planes in only the most superficial way; that is, “you need to be pushing air downwards so you can fly.” Birds do this in a way that is different from helicopters, and planes do it in a way that’s different from both. You don’t need to have seen birds to know this, though, because conservation of momentum gets you to the same conclusion pretty easily.
I don’t really see how any deeper study of birds would have helped you to better design planes. If anything, bird flight is so complicated that studying it could have made it harder to design planes because you’d try to replicate how birds do it instead of solving the problem from first principles, e.g. by trying to figure out which airfoil shapes would deflect an air current downward.
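To spell out the conservation-of-momentum point above, here is a minimal back-of-the-envelope version. The specific numbers (a 1000 kg craft, a 10 m/s downwash speed) are illustrative assumptions of mine, not anything from the discussion:

```latex
% Hover condition from momentum conservation: the rate of downward momentum
% imparted to the air must balance the craft's weight.
\[
  F_{\text{lift}} = \dot{m}\, v_{\text{down}} = M g
\]
% With the assumed numbers (M = 1000 kg, v_down = 10 m/s), the required
% mass flow of air is
\[
  \dot{m} = \frac{M g}{v_{\text{down}}} = \frac{1000 \times 9.8}{10} = 980 \ \text{kg/s}.
\]
```

Nothing bird-specific enters the calculation; any scheme that throws roughly a tonne of air per second downward satisfies it, whether by fixed wing, rotor, or flapping.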
This is their current research direction: The shard theory of human values, which they’re currently publishing posts on.
Sure, but why do they think it’s a promising direction of research? This is what’s not clear to me and this post hasn’t helped make it any clearer, though the shortcoming there at least partly lies with me due to my inability to understand what’s being said.
There’s apparently some controversy over what the Wright brothers were able to infer from studying birds. From Wikipedia:
On the basis of observation, Wilbur concluded that birds changed the angle of the ends of their wings to make their bodies roll right or left.[34] The brothers decided this would also be a good way for a flying machine to turn – to “bank” or “lean” into the turn just like a bird – and just like a person riding a bicycle, an experience with which they were thoroughly familiar. Equally important, they hoped this method would enable recovery when the wind tilted the machine to one side (lateral balance). They puzzled over how to achieve the same effect with man-made wings and eventually discovered wing-warping when Wilbur idly twisted a long inner-tube box at the bicycle shop.[35]
Other aeronautical investigators regarded flight as if it were not so different from surface locomotion, except the surface would be elevated. They thought in terms of a ship’s rudder for steering, while the flying machine remained essentially level in the air, as did a train or an automobile or a ship at the surface. The idea of deliberately leaning, or rolling, to one side seemed either undesirable or did not enter their thinking...
Wilbur claimed they learned about plausible control mechanisms from studying birds. However, ~40 years after their first flight, Orville would go on to contradict that assertion and claim that they weren’t able to draw any useful ideas from birds.
There are many reasons why I think shard theory is a promising line of research. I’ll just list some of them without defending them in any particular depth (that’s what the shard theory sequence is for):
I expect that there’s more convergence in the space of effective learning algorithms than there is in the space of non-learning systems. This is ultimately due to the simplicity prior, which we can apply to the space of learning systems. Those learning algorithms which are best able to generalize are those which are simple, for the same reason that simple hypotheses are more likely to generalize. I thus expect there to be more convergence between the learning dynamics of artificial and natural intelligences than most alignment researchers seem to assume.
The more I think about them, the less weird values seem. They do not look like a hack or a kludge to me, and it seems increasingly likely that a broad class of agentic learning systems will converge to similar value meta-dynamics. I think evolution put essentially zero effort into giving us “unusual” meta-preferences, and that the meta-preferences we do have are pretty typical in the space of possible learning systems.
To be clear, I’m not saying that AIs will naturally converge to first-order human values. I’m saying they’ll have computational structures whose higher-order dynamics are very similar to those of human values, but which could be oriented towards completely different things.
I think that imperfect value alignment does not lead to certain doom. I reject the notion that there are any “true” values which exist as an ephemeral Platonic ideal, inaccessible to our normal introspective processes.
Rather, I think we have something like a continuous distribution over possible values which we could instantiate in different circumstances. Much of the felt sense of value fragility arises from a type error of trying to represent a continuous distribution with a finite set of samples from that distribution.
The consequence of this is that it’s possible for two distributions to partially overlap (see the toy sketch after this list). In contrast, if you think that humans have some finite set of “true” values (which we don’t know), that AIs have some finite set of “true” values (which we can’t control), and that these need to near-perfectly overlap or the AIs will Goodhart away all the future’s value, then the prospects for value alignment would look grim indeed!
I think that inner values are relatively predictable, conditional on knowing the outer optimization criteria and learning environment. Yudkowsky makes frequent reference to how evolution failed to align humans to maximizing inclusive genetic fitness, and that this implies inner values have no predictable relationship with outer optimization criteria. I think it’s a mistake to anchor our expectations of inner / outer value outcomes in AIs to evolution’s outcome in humans. Evidence from human inner / outer value outcomes seems like the much more relevant comparison to me.
Similarly, I don’t think we’ll get a “sharp left turn” from AI training, so I more strongly expect that work on value aligning current AI systems will extend to superhuman AI systems, and that human-like learning dynamics will not totally go out the window once we reach superintelligence.
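Here is a minimal sketch of the distribution-overlap point from a few paragraphs up. Everything in it is my own toy illustration: the 1-D “value space”, the Gaussian shapes, and their parameters are assumptions chosen purely to make the contrast visible, not anything derived from shard theory.

```python
import numpy as np
from scipy.stats import norm

# Toy model: treat "human values" and "AI values" as two distributions over a
# hypothetical 1-D value space, rather than as finite sets of fixed "true" values.
human_values = norm(loc=0.0, scale=1.0)   # assumed human value distribution
ai_values = norm(loc=0.8, scale=1.2)      # assumed AI value distribution

# Overlap coefficient: integrate the pointwise minimum of the two densities.
xs = np.linspace(-6.0, 6.0, 10_000)
overlap = np.trapz(np.minimum(human_values.pdf(xs), ai_values.pdf(xs)), xs)
print(f"distributional overlap: {overlap:.2f}")  # substantial partial overlap

# Contrast: represent each side by a finite set of sampled "values" and demand
# exact agreement. For continuous draws the intersection is empty, which makes
# alignment look all-or-nothing -- the "type error" described above.
rng = np.random.default_rng(0)
human_samples = rng.normal(0.0, 1.0, size=1_000)
ai_samples = rng.normal(0.8, 1.2, size=1_000)
print(f"exact matches among samples: {np.intersect1d(human_samples, ai_samples).size}")  # 0
```

Under the distributional picture, partial overlap is a perfectly coherent outcome; under the finite-set picture, anything short of perfect overlap collapses to zero.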
As far as I can see birds are useful for designing planes in only the most superficial way
“The Wright Brothers spent a great deal of time observing birds in flight. They noticed that birds soared into the wind and that the air flowing over the curved surface of their wings created lift. Birds change the shape of their wings to turn and maneuver. The Wrights believed that they could use this technique to obtain roll control by warping, or changing the shape, of a portion of the wing.” (from a NASA website for kids, but I’ve seen this claim in lots of other places too)
I’ve seen this claim too, but I’ve also seen claims that, historically, obsession with bird flight slowed down progress on investigating how to achieve flight. On net I don’t think I make any updates on this evidence, unless I get a compelling account of why bird flight provides some particular insight that would have been considerably more difficult to get from another direction.
Edit: I also think the observations the Wrights are said to have made are rather superficial, and could not have been useful for much more than a flash of insight showing them that a particular way of solving the problem was possible.
What would not be superficial is if they had done careful investigations of the shape of bird wings, derived some general model of how much lift an airfoil would generate from that, and then used that model to produce a prototype to start optimizing from. Is there any evidence of this happening?
If birds didn’t exist (& insects etc.), maybe it never would have occurred to people that heavier-than-air flight was possible in the first place. Hard to know the counterfactual though.

This sounds implausible to me. I agree it would have taken longer to occur to people, but the claim that people would never have figured out how to make helicopters or planes seems difficult to believe.
I’m curious why people would think this, though. Why is the possibility of flight and the basic mechanism of “pushing air downward” supposed to be so difficult, either conceptually or as a matter of engineering, that we couldn’t have achieved it without the concrete example of birds and insects?
Why is the possibility of flight and the basic mechanism of “pushing air downward” supposed to be so difficult, either conceptually or as a matter of engineering, that we couldn’t have achieved it without the concrete example of birds and insects?
Because you need evidence to raise a hypothesis (like “heavier-than-air flight”) to consideration, and also social proof / funding to get people to take the ideas seriously. In hindsight the concept is obvious to you, as are the other clues by which other people could obviously have noticed the possibility of flight. That’s not how it feels to be in their place, though, without birds existing to constantly remind them of that possibility.
Out of curiosity, where do you think people got the idea of going to the moon from? By your logic, since we never saw any animal go to the moon, how to do so shouldn’t have been obvious to us and it should have been extremely difficult to secure funding for such a project, no?
I’m not saying that flight wouldn’t have happened at all without birds to look to. I’m saying that I think it would have taken somewhat longer, measured in years to decades.
I think this is plausible, especially if you make the range of “somewhat longer” so big that it encompasses more than an order of magnitude of time, as in years to decades. It’s still not obvious to me, though.
If we didn’t even have the verb “to fly”, and nobody had seen something fly, “going up and travelling sideways while hovering some distance above the ground” would have been a weird niche idea, and people like the Wright Brothers would have probably never even heard of it. It could have easily taken decades longer.
I think people would have noticed feathers, paper, or folded sheets of paper hovering above the ground for long periods of time; people would have been able to flap their arms and feel the upward force, then attach large slabs and test how much the upward force increased; and people would have had time to study the aerodynamics of thrown objects. Maybe it would have taken longer, but I think flight still would have been achieved less than a few decades later than it took the Wright brothers to figure it out.
Reason+capitalism is surprisingly resilient to setbacks like these.
I strongly disagree with this counterfactual and would happily put up large sums of money if only it were possible to bet on the outcome of some experiment on this basis.
Humans designed lots of systems that have no analog whatsoever in nature. We didn’t need to see objects similar to computers to design computers, for instance. We didn’t even need to see animals that do locomotion using wheels to design the wheel!
It’s just so implausible that people would not have had this idea at the start of the 20th century if people hadn’t seen animals flying. I’m surprised that people actually believe this to be the case.
To be fair, we did have animals that served the purpose of computers. We even called them computers—as in, people whose job it was to do calculations (typically Linear Algebra or Calculus or Differential Equations—hard stuff).
This is true, but if this level of similarity is going to count, I think there are natural “examples” of pretty much anything you could ever hope to build. It doesn’t seem helpful to me when thinking about the counterfactual I brought up.