So, this post only deals with agent counterfactuals (not environmental counterfactuals), but I believe I have solved the technical issue you mention about the construction of logical counterfactuals as it concerns TDT. See: https://www.alignmentforum.org/posts/TnkDtTAqCGetvLsgr/a-possible-resolution-to-spurious-counterfactuals
I have fewer thoughts about environmental counterfactuals but think a similar approach could be used to make statements along those lines, i.e. construct alternate agents receiving a different observation about the world. I’m not sure any very specific technical problem exists with that, though—the TDT paper already talks about world model surgery.
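As a toy illustration only (this is not the construction from the linked post, and the names `agent` and `environmental_counterfactual` are hypothetical), "constructing an alternate agent receiving a different observation" could be sketched roughly like this, assuming the agent is just a function from an observation to an action:

```python
from typing import Callable

Observation = str
Action = str


def agent(observation: Observation) -> Action:
    """Hypothetical policy: take an umbrella only when rain is observed."""
    return "take_umbrella" if observation == "rain" else "leave_umbrella"


def environmental_counterfactual(
    policy: Callable[[Observation], Action],
    alternate_observation: Observation,
) -> Action:
    """Evaluate the same policy on an observation it did not actually receive.

    This stands in for an alternate copy of the agent placed in a world
    that differs only in what the agent observes.
    """
    return policy(alternate_observation)


# What the agent does given the observation it actually received.
actual_action = agent("sun")

# What an alternate agent would do had it observed rain instead.
counterfactual_action = environmental_counterfactual(agent, "rain")
```

Nothing here touches the self-referential case; it only shows that, for environmental counterfactuals, the alternate agent is an ordinary function call on a different input.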
I added a comment on the post directly, but I will add: we seem to roughly agree on counterfactuals existing in the imagination in a broad sense (I highlighted two ways this can go above—with counterfactuals being an intrinsic part of how we interact with the world or a pragmatic response to navigating the world). However, I think that following this through and asking why we care about them if they’re just in our imagination ends up taking us down a path where counterfactuals being circular seems plausible. On the other hand, you seem to think that this path takes us somewhere where there isn’t any circularity. Anyway, that’s the difference in our positions as far as I can tell from having just skimmed your link.
I was attempting to solve a relatively specific technical problem related to self-proofs using counterfactuals. So I suppose I do think counterfactuals (at least non-circular ones) are useful. But I’m not sure I’d commit to any broader philosophical statement about counterfactuals beyond “they can be used in a specific formal way to help functions prove statements about their own output in a way that avoids Löb’s Theorem issues”. That being said, that’s a pretty good use, if that’s the type of thing you want to do. It’s also not totally clear whether you’re imagining counterfactuals the same way I am. I am using the English term because it matches the specific thing I’m describing decently well, but the term has a broad meaning, and without an extremely specific picture of what is meant, it’s hard to make any further statements about what can be done with them.