Safety engineering for physical systems relies on the known properties of those systems. We may not know what steps the Jeff Bezos AGI (JBAGI) will take to create his new business empire, but we can probably know something about the properties of the physical resources that would be used to do so, or at minimum, the physical resources required to run JBAGI in the first place.
This suggests that high-reliability engineering for safe AGI ought to focus on those known properties. For example, we could limit the amount of physical compute, electricity, or other concretely measurable resources consumed by any single instance of a model, or by all instances of that model in total, in ways to which reliability engineering principles can be applied.
As a simplified example, while we may not be able to predict how JBAGI will behave in all situations, because of its very large action space and lack of interpretability, we can engineer an airgapped killswitch that would cut power to a datacenter housing JBAGI, even if we do not know exactly when we might want to use that switch.
The switch is far less complex than JBAGI, and so much simpler to understand and implement from a reliability engineering perspective. This would extend the reliability of the control mechanism to the system we seek to control, since we would know with high likelihood when we can turn it off, as well as when we are approaching the conditions under which we no longer could.
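To make the idea concrete, here is a minimal sketch of the decision logic such a switch might use. It assumes we can meter the power drawn by each instance. Every name in it (`Budget`, `Action`, `evaluate`, the kilowatt caps, and the 80% warning margin) is hypothetical and chosen only for illustration. The point is that the logic is small enough to reason about exhaustively, which is what reliability engineering requires.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OK = "ok"      # usage is comfortably inside every cap
    WARN = "warn"  # usage is approaching a cap; prepare to act
    TRIP = "trip"  # a cap has been exceeded; cut power


@dataclass(frozen=True)
class Budget:
    # All values are illustrative assumptions, not recommendations.
    per_instance_kw: float      # cap on power drawn by any single instance
    total_kw: float             # cap on power drawn by all instances together
    warn_fraction: float = 0.8  # warn once usage reaches this share of a cap


def evaluate(readings_kw: list[float], budget: Budget) -> Action:
    """Map metered per-instance power draw to a killswitch decision."""
    peak = max(readings_kw, default=0.0)
    total = sum(readings_kw)
    # Trip if either the per-instance cap or the total cap is exceeded.
    if peak > budget.per_instance_kw or total > budget.total_kw:
        return Action.TRIP
    # Warn if either measurement has reached the warning margin.
    if (peak >= budget.warn_fraction * budget.per_instance_kw
            or total >= budget.warn_fraction * budget.total_kw):
        return Action.WARN
    return Action.OK
```

With a budget of 100 kW per instance and 250 kW in total, readings of 50, 60, and 70 kW return `Action.OK`, while three instances at 90 kW each return `Action.TRIP` because the total exceeds the cap. The `WARN` state corresponds to knowing when we are approaching the conditions under which the switch should be used.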