Responding in order:
1) Yeah, I wasn’t saying that’s what her post is about. But I think you can get at more interesting, cruxy stuff by interpreting it that way.
2) Yep, it’s just a caveat I mentioned for completeness.
3) Your spontaneous reasoning doesn’t show that we (or it) get good enough at getting it to output things humans approve of before it kills us. Also, I think we’re already at the point of “we can’t tell if the model is aligned or not”, and that won’t stop deployment. I think the default situation isn’t one where we can tell that things are going wrong but people just aren’t careful enough anyway, so maybe it’s just a difference of perspective or something… hmm.