Provably Safe Artificial General Intelligence via Interactive Proofs

Kristen Carlson

doi:10.3390/philosophies6040083

Philosophies (Oct 2021)

Affiliations

Kristen Carlson: Beth Israel Deaconess Medical Center and Harvard Medical School, Harvard University, Boston, MA 02115, USA

DOI: https://doi.org/10.3390/philosophies6040083
Journal volume & issue: Vol. 6, no. 4, p. 83

Abstract

Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first generation AGI₁ rapidly triggers a succession of more powerful AGIₙ that differ dramatically in their computational capabilities (AGIₙ ≪ AGIₙ₊₁). No proof exists that AGI will benefit humans or of a sound value-alignment method. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2⁻¹⁰⁰). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGIₙ ↔ AGIₙ₊₁ interaction hazards to an acceptably low level.
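As an illustration of how a weak Verifier can drive a stronger Prover's chance of deception down to a figure such as 2^-100, the sketch below implements the classic textbook interactive proof for quadratic non-residuosity. It is not taken from the article and is not the author's method; the function names, the toy modulus 21, and the brute-force Prover are all assumptions chosen for illustration. The idea is that if the claim "x is not a square mod n" is false, both kinds of challenge are identically distributed, so any Prover passes each round with probability at most 1/2, and k independent rounds give a bound of 2^-k.

```python
import math
import random
from fractions import Fraction


def is_quadratic_residue(y, n):
    """Brute-force test; stands in for the Prover's greater computing power."""
    return any((z * z) % n == y % n for z in range(1, n))


def random_unit(n, rng):
    """Pick a random element of {1..n-1} that is coprime to n."""
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            return r


def honest_prover(challenge, n):
    """Report 0 if the challenge is a square mod n, otherwise 1."""
    return 0 if is_quadratic_residue(challenge, n) else 1


def verify_non_residue(x, n, prover, rounds, rng):
    """Verifier side: accept the claim 'x is a non-residue mod n' only if
    the prover recovers the Verifier's hidden coin in every round."""
    for _ in range(rounds):
        r = random_unit(n, rng)
        b = rng.randrange(2)  # hidden coin flip
        # b = 0: send a random square; b = 1: send x times a random square.
        challenge = (r * r) % n if b == 0 else (x * r * r) % n
        if prover(challenge, n) != b:
            return False  # prover failed a round: reject the claim
    return True


def cheating_bound(rounds):
    """Upper bound on the chance that a false claim survives all rounds."""
    return Fraction(1, 2 ** rounds)
```

Under these assumptions, with n = 21 the squares among the units are {1, 4, 16}, so the true claim for x = 2 is always accepted, whereas the false claim for x = 4 survives 100 rounds with probability at most cheating_bound(100) = 2^-100. The Verifier only multiplies and flips coins; all the expensive computation sits with the Prover, which is the asymmetry the abstract describes.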

Published in Philosophies

ISSN: 2409-9287 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Philosophy. Psychology. Religion: Logic; Philosophy. Psychology. Religion: Philosophy (General)
Website: http://www.mdpi.com/journal/philosophies
