I think the guiding principle behind whether or not scientific work is good should probably look something more like “is this getting me closer to understanding what’s happening”, where “understanding” is something like “my measurements track the thing in one-to-one lock-step with reality because I know the right typings and I’ve isolated the underlying causes well enough.”
AI control doesn’t seem like it’s making progress on that goal, which is certainly not to say it’s not important—it seems good to me to be putting some attention on locally useful things. Whereas the natural abstractions agenda does feel like progress on that front.
As an aside: I dislike basically all words about scientific progress at this point. I don’t feel like they’re precise enough, and it seems easy to get satiated on them and lose track of what’s actually important, which is, imo, absolute progress on the problem of understanding what the fuck is going on with minds. Calling this sort of work “science” risks lumping it in with every activity that happens in, e.g., academia, and that isn’t right. Calling it “pre-paradigmatic” risks people writing it off as “Okay, so people just sit around being confused for years? How could that be good?”
I wish we had better ways of talking about it. I think that more precisely articulating what our goals are with agent foundations/paradigmaticity/etc could be very helpful, not only for people pursuing it, but for others to even have a sense of what it might mean for field-founding science to help in solving alignment. As it is, it seems to often get rounded off to “armchair philosophy” or “just being sort of perpetually confused”, which seems bad.
I think the guiding principle behind whether or not scientific work is good should probably look something more like “is this getting me closer to understanding what’s happening”
One model that I’m currently holding is that Kuhnian paradigms are about how groups of people collectively decide that scientific work is good, which is distinct from how individual scientists do or should decide that scientific work is good. And collective agreement is way more easily reached via external criteria.
Which is to say, problems are what establish a paradigm. It’s way easier to get a group of people to agree that “thing no go” than it is to get them to agree on the inherent nature of thing-ness and go-ness. And when someone finally makes thing go, everyone looks around and kinda has to concede that, whatever their opinion was of that person’s ontology, they sure did make thing go. (And then I think the Wentworth/Greenblatt discussion above is about whether the method used to make thing go will be useful for making other things go, which is indeed required for actually establishing a new paradigm.)
That said, I think that the way that an individual scientist decides what ideas to pursue should usually route through things more like “is this getting me closer to understanding what’s happening”, but that external people are going to track “are problems getting solved”, and so it’s probably a good idea for most of the individual scientists to occasionally reflect on how likely their ideas are to make progress on (paradigm-setting) problems.
(It is possible for the agreed-upon problem to be “everyone is confused”, and possible for a new idea to simultaneously de-confuse everyone, thus inducing a new paradigm. (You could say that this is what happened with the Church-Turing thesis.) But it’s just pretty uncommon, because people’s ontologies can be wildly different.)
When you say, “I think that more precisely articulating what our goals are with agent foundations/paradigmaticity/etc could be very helpful...”, how compatible is that with more precisely articulating problems in agent foundations (whose solutions would be externally verifiable by most agent foundations researchers)?