TL;DR: “Shit Happens.” Ensuring favorable human outcomes from ASI typically involves control frameworks, containment protocols, or alignment mechanisms. Of course, if one stops applying the framework, protocol, or mechanism, one stops obtaining the protections it affords. And history shows that no matter how carefully one arranges the application of such systems, lapses will occur. Thus, valid approaches must be robust to the occasional lapse in their execution for their guarantees to be trusted. At the end, we discuss exceptions to this rule.
Methods for ensuring favorable human outcomes from ASI typically involve control frameworks, containment protocols, or alignment mechanisms that enforce human-centered objectives.
It goes without saying that if one stops applying the framework, protocol, or mechanism, one stops obtaining the protections it affords.
History shows that no matter how carefully one arranges the application of such systems, lapses will occur.
Thus, valid approaches must be robust to the occasional lapse in their execution for their guarantees to have any meaning.
Generally, this excludes “clockwork”-style approaches, in which even a single lapse in execution can invalidate the entire system.
This is an unfortunate limitation, as it severely degrades the safety afforded by a wide range of approaches that have been proposed for AI safety.
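To make the fragility concrete, here is a back-of-the-envelope model (the notation and the independence assumption are ours): suppose each application of a safeguard lapses independently with probability p > 0. A clockwork system must survive every application, so:

```latex
% Per-application lapse probability p > 0, lapses assumed independent.
P(\text{survive } n \text{ applications}) = (1-p)^n \longrightarrow 0
  \quad \text{as } n \to \infty,
\qquad
\mathbb{E}[\text{applications until first lapse}] = \frac{1}{p}.
```

Even at p = 10^-6 per application, a safeguard exercised once per second is expected to lapse for the first time within roughly twelve days.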
ASSUMED FAILURE MODES
- Execution will sometimes occur without the control software applied
- Containment protocols will be violated in some cases
- Alignment mechanisms will fail (a) to be applied, (b) to work as intended, or (c) to work in humanity’s interest
Robust systems must be resistant to these expected failure modes.
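One common way to resist the first failure mode is to make execution impossible without the control layer, rather than merely discouraged. The sketch below is a hypothetical illustration of this “fail-closed” pattern; all class and method names (ExecutionGate, AllowlistMonitor, attach_monitor) are invented for this example, not a real API.

```python
# Hypothetical fail-closed execution gate. A lapse in applying the
# control software here denies service instead of denying safety:
# unmonitored execution raises an error rather than silently proceeding.

class ControlLayerMissing(Exception):
    """Raised when execution is attempted with no monitor attached."""

class ExecutionGate:
    def __init__(self):
        self._monitor = None  # no control software attached yet

    def attach_monitor(self, monitor):
        self._monitor = monitor

    def run(self, action):
        if self._monitor is None:
            # Fail closed: refuse outright rather than run unmonitored.
            raise ControlLayerMissing("refusing to execute unmonitored action")
        if not self._monitor.approves(action):
            return None  # action vetoed by the control layer
        return action()

class AllowlistMonitor:
    """Toy monitor: approves only pre-vetted action names."""
    def __init__(self, allowed_names):
        self._allowed = set(allowed_names)

    def approves(self, action):
        return action.__name__ in self._allowed
```

The design choice worth noting: forgetting to call attach_monitor (a lapse of the first kind) produces a loud failure, not unprotected execution.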
EXCEPTIONS
Often, AI safety comes down to ensuring that an unbounded sequence of events all conform to some desired constraint. This is problematic: sooner or later, one event will not go as planned, and if that event involves an ASI, a single such failure may be all it takes. There are exceptions, however:
- The events are large in time or space, so a lapse can be detected and corrected before its consequences become irreversible
- The system is robust up to some maximum failure rate, rather than failing on any single lapse (see the simulation sketch below)
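A toy simulation of the contrast (assumptions ours: independent lapses at a fixed per-event probability): a clockwork system dies on its first lapse, while a rate-robust system survives as long as its overall lapse rate stays within budget.

```python
import random

def survives(n_events, p_lapse, max_rate=None, seed=0):
    """Simulate n_events independent events that each lapse with
    probability p_lapse. Clockwork mode (max_rate=None): any lapse
    is fatal. Rate-robust mode: only the overall lapse rate matters."""
    rng = random.Random(seed)
    lapses = 0
    for _ in range(n_events):
        if rng.random() < p_lapse:
            if max_rate is None:
                return False  # clockwork: first lapse is fatal
            lapses += 1
    return lapses <= max_rate * n_events

# With p_lapse = 1e-3 over a million events, the clockwork system
# fails almost surely, while a system budgeted for a 1% lapse rate
# survives almost surely (~1,000 expected lapses vs. a 10,000 budget).
print(survives(1_000_000, 1e-3))                 # clockwork -> False
print(survives(1_000_000, 1e-3, max_rate=0.01))  # rate-robust -> True
```

The point: a robustness claim needs to be stated as a tolerated failure rate, not as an assumption of zero lapses.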
KNOWLEDGE SPREADS