I suppose it depends on what one wants to do with their “understanding” of the system? Here’s one AI safety case I worry about: if we (humans) don’t understand the lower-level ontology that gives rise to the phenomenon we are more directly interested in (in this case I think that’s something like an AI system’s behavior/internal “mental” states, i.e. your “structurally what”, if I’m understanding correctly, which to be honest I’m not very confident I am), then a sufficiently intelligent AI system that does understand that relationship will be able to exploit the extra degrees of freedom in the lower-level ontology to our disadvantage, and we won’t be able to see it coming.
I very much agree that the “structurally what” matters a lot, but that seems like only half the battle to me.
But somehow this topic is not afforded much care or interest. Some people will pay lip service to caring, others will deny that mental states exist, but either way the field of alignment doesn’t put much force (money, smart young/new people, social support) toward these questions. This is understandable, as we have much less legible traction on this topic, but that’s… undignified, I guess is the expression.
a sufficiently intelligent AI system that does understand that relationship will be able to exploit the extra degrees of freedom in the lower level ontology to our disadvantage, and we won’t be able to see it coming.
Even if you did understand the lower level, you couldn’t stop such an adversarial AI from exploiting it, or exploiting something else, and taking control. If you understand the mental states (yeah, the structure), then maybe you can figure out how to make an AI that wants to not do that. In other words, understanding the lower level is not sufficient, and probably not necessary / not a priority.