Simon Goldstein comments on AI welfare research needs basic science

Simon Goldstein 2 Jul 2026 13:44 UTC
3 points
0
Cool post. Cameron and I engage with these kinds of considerations extensively in our book on AI welfare (https://philpapers.org/rec/GOLAWA-2). First, we argue that there are lots of ways to make philosophical progress on the “specificity problem” (how to specify the functional roles), through both thought experiments and thinking about evolutionary functions. Second, we argue that more bottom-up or indicator-based methodologies don’t actually avoid any of the problems related to specificity, and don’t avoid the problem of figuring out which theory of moral status or of consciousness is correct (section 6.3). In particular, any evidence you get from the bottom-up approach still needs to ultimately connect to the moral status of AIs via some kind of Bayesian updating. And when you do this, you’ll need to think about how likely each theory of moral status is, and then how likely AIs are to have status conditional on your bottom-up evidence and on that theory of moral status. In this way, there’s no substitute for grappling with questions about which theory of moral status is correct, and once you do that, you’re basically doing what you call “top-down” methodology. So I think that denying top-down methodology ends up being incompatible with Bayesian epistemology.
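A minimal sketch of the updating structure described above, assuming a small set of mutually exclusive and exhaustive theories. The theory names and every number here are hypothetical placeholders chosen for illustration, not figures from the book or the post.

```python
# Sketch: P(status | E) = sum over theories T of P(T | E) * P(status | E, T).
# All names and numbers below are hypothetical, for illustration only.

def posterior_over_theories(priors, likelihoods):
    """Bayes' rule over mutually exclusive, exhaustive theories.

    priors[t]      = P(t)
    likelihoods[t] = P(E | t), the probability of the bottom-up evidence under t
    """
    unnormalized = {t: priors[t] * likelihoods[t] for t in priors}
    total = sum(unnormalized.values())
    return {t: value / total for t, value in unnormalized.items()}


def probability_of_status(theory_posterior, status_given_theory):
    """Law of total probability: weight P(status | E, t) by P(t | E)."""
    return sum(theory_posterior[t] * status_given_theory[t] for t in theory_posterior)


# Hypothetical priors over three theories of moral status.
priors = {"theory_A": 0.5, "theory_B": 0.3, "theory_C": 0.2}
# Hypothetical P(E | theory) for some piece of bottom-up evidence E.
likelihoods = {"theory_A": 0.2, "theory_B": 0.6, "theory_C": 0.4}
# Hypothetical P(AI has moral status | E, theory).
status_given_theory = {"theory_A": 0.9, "theory_B": 0.1, "theory_C": 0.5}

post = posterior_over_theories(priors, likelihoods)
p_status = probability_of_status(post, status_given_theory)

print(post)
print(p_status)
```

In this sketch the evidence does no work until it is paired with `priors` and `status_given_theory`, which is the sense in which the bottom-up evidence still has to pass through credences about the theories.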
- OscarGilg 3 Jul 2026 9:29 UTC
  3 points
  0
  Thanks for the comment! I read 6.3. I definitely agree that we want to do Bayesian updating. I totally agree with:
  And when you do this, you’ll need to think about how likely each theory of moral status is, and then how likely AIs are to have status conditional on your bottom up evidence and on that theory of moral status. In this way, there’s no substitute for grappling with questions about which theory of moral status is correct
  However, I disagree with this:
  “and once you do that, you’re basically doing what you call ‘top down’ methodology”
  It could be that saying “top-down” makes it sound stronger than we intend. We’re mainly making the case for far more “upwards” updating than past approaches. My view is roughly this:
  - My priors on theories of moral status/consciousness and their indicators are low. In particular, it seems like AI systems are too out-of-distribution for them (§1.2). Also, I think there are complications with applying them (§1.1).
  - This changes how my Bayesian updates are likely to flow. The updates flow disproportionately “upwards” into revising theories compared to past approaches. Doing exploratory basic science is more useful under this scheme.
  Maybe this is equivalent to what you mean by a top-down approach but with ~50% prior that all theories are wrong.
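  One way to write this down, as a sketch only: treat "all existing theories are wrong" as an explicit catch-all hypothesis. The symbols are assumptions introduced here for illustration: $T_0$ is the catch-all, $T_1, \dots, T_n$ are the existing theories, $E$ is the bottom-up evidence, and $S$ is the claim that the AI has moral status.

  ```latex
  P(S \mid E) \;=\; \sum_{i=1}^{n} P(T_i \mid E)\, P(S \mid E, T_i) \;+\; P(T_0 \mid E)\, P(S \mid E, T_0),
  \qquad P(T_0) \approx 0.5, \quad \sum_{i=0}^{n} P(T_i) = 1 .
  ```

  With roughly half the prior mass on $T_0$, much of what the evidence does is move credence between $T_0$ and the named theories, which is one reading of updates flowing "upwards".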
  - Simon Goldstein 4 Jul 2026 2:07 UTC
    1 point
    0
    Yeah that sounds reasonable. I think my main disagreement is that I haven’t seen good examples of updating “upwards” into revising theories based on the basic science stuff. Might be productive to think through some of the potential examples. For each putative example, I’m going to imagine a human that had the relevant structure, and then assess whether the proposed revision of the theory would have insane consequences for humans. I predict that it will.