EAI Endorsed Transactions on Creative Technologies (Dec 2021)
On Trusting a Cyber Librarian: How rethinking underlying data storage infrastructure can mitigate risks of automation
Abstract
INTRODUCTION: The increased ability of Artificial Intelligence (AI) technologies to generate and parse texts will inevitably lead to more proposals for AI’s use in the semantic sentiment analysis (SSA) of textual sources. We argue that instead of focusing solely on debating the merits of automated versus manual processing and analysis of texts, it is critical to also rethink our underlying storage and representation formats. Specifically, we argue that accommodating multivariate metadata is an example of how underlying data storage infrastructure can reshape the ethical debate surrounding the use of such algorithms. In other words, a system that employs automated analysis may typically require manual intervention to assess the quality of its output, or demand that we select between multiple competing NLP algorithms. Settling on whichever algorithm or ensemble can produce the best results, this is a decision that need not be made a priori at all.OBJECTIVES: An underlying storage and representation system that allows for the existence and evaluation of multiple variants of the same source data, while maintaining attribution to the individual sources of each variant, would be an example of a much-needed enhancement to existing storage technologies, especially in anticipation of the proliferation of AI semantic analysis technologies.METHODS: To this end, we take the view of AI in SSA as a sociotechnical system, and describe a possible novel solution that would allow for safer cyber curation. This can be done by allowing multiple different annotations to coexist within a single publishing ecosystem (whether those different annotations are the result of competing algorithmic models, or varying degrees of human intervention).RESULTS: We discuss the feasibility of such a scheme, using our own infrastructure model (MultiVerse) as an illustrative model for such a system, and analyse the ethical implications.CONCLUSION: Considering an underlying storage and representation system that allows for the existence and evaluation of multiple variants of the same source data, while maintaining attribution to the individual sources of each variant within a single publishing ecosystem helps mitigate risks of automation and enhances AI (semantic) explainability.
Keywords