Harnessing Context for Vandalism Detection in Wikipedia

Lakshmish Ramaswamy; Raga Sowmya Tummalapenta; Deepika Sethi; Kang Li; Calton Pu

doi:10.4108/cc.1.1.e7

EAI Endorsed Transactions on Collaborative Computing (May 2014)

Affiliations

Lakshmish Ramaswamy: Computer Science Department, The University of Georgia, Athens, GA 30602, USA
Raga Sowmya Tummalapenta: Computer Science Department, The University of Georgia, Athens, GA 30602, USA
Deepika Sethi: Computer Science Department, The University of Georgia, Athens, GA 30602, USA
Kang Li: Computer Science Department, The University of Georgia, Athens, GA 30602, USA
Calton Pu: College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA

DOI: https://doi.org/10.4108/cc.1.1.e7
Journal volume & issue: Vol. 1, no. 1
pp. 1 – 14

Abstract

The importance of collaborative social media (CSM) applications such as Wikipedia to modern free societies can hardly be overemphasized. By allowing end users to freely create and edit content, Wikipedia has greatly facilitated the democratization of information. However, over the past several years, Wikipedia has also become susceptible to vandalism, which has degraded its information quality. Traditional vandalism detection techniques that rely on simple textual features, such as spammy or abusive words, have not been very effective against sophisticated attacks that lack common vandalism markers. In this paper, we propose a context-based vandalism detection framework for Wikipedia. We first propose a context-enhanced finite state model for representing the context evolution of Wikipedia articles. We then identify two distinct types of context that are potentially valuable for vandalism detection, namely content-context and contributor-context, and present empirical results on the distinguishing power of each. We design two novel metrics that measure how well the content-context of an incoming edit fits the topic and the existing content of a Wikipedia article, and we outline machine learning-based vandalism identification schemes that use these metrics. Our experiments indicate that utilizing context can substantially improve vandalism detection accuracy.
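The abstract does not specify how the two content-context metrics are computed. Purely as an illustration of the idea of scoring how well an incoming edit fits an article's existing content, the sketch here uses cosine similarity between bag-of-words term-frequency vectors. This is a generic, swapped-in technique and not the authors' metrics; the tokenizer and the function names `term_vector`, `cosine_similarity`, and `content_fit` are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-cased bag of alphabetic tokens (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Counter returns 0 for missing keys, so terms absent from b add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no overlap with anything
    return dot / (norm_a * norm_b)


def content_fit(edit_text: str, article_text: str) -> float:
    """Hypothetical stand-in for a content-context score: edit vs. article similarity."""
    return cosine_similarity(term_vector(edit_text), term_vector(article_text))


article = "The University of Georgia is a public research university in Athens, Georgia."
on_topic = "The university campus is located in Athens."
off_topic = "buy cheap watches now"

print(content_fit(on_topic, article))   # positive: shares several terms with the article
print(content_fit(off_topic, article))  # 0.0: no terms in common
```

In a learning-based scheme of the kind the abstract outlines, a score like this would be only one feature among others, and any decision threshold would have to be learned from labeled edits rather than fixed by hand.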

Published in EAI Endorsed Transactions on Collaborative Computing

ISSN: 2312-8623 (Online)
Publisher: European Alliance for Innovation (EAI)
Country of publisher: Belgium
LCC subjects: Technology
Website: https://eudl.eu/journal/cc
