I am also very concerned about how we are treating AIs; hopefully they are happy with their situation, but it seems like a live possibility that they are not or will not be, and that this is a brewing moral catastrophe.
Do you think that there’s a way we could test for this in principle, given unlimited compute?
I don’t think compute is the bottleneck for testing this. The bottlenecks are sorting out some of our own philosophical confusions, plus maybe a lot of engineering effort to construct test environments (in which, e.g., the AIs can be asked how they feel in a more rigorous way), plus a lot of interpretability work.