Journal of Engineering Science and Technology (Oct 2016)
PLAGIARISM DETECTION IN TEXT DOCUMENTS USING SENTENCE BOUNDED STOP WORD N-GRAMS
Abstract
With the evolution of technologies such as internet search engines and improved text editors, plagiarism has become a critical issue. Much work already exists on detecting verbatim plagiarism, that is, simple copy-and-paste plagiarism, but intelligent plagiarism is more complex. Intelligent plagiarism includes plagiarism through idea adoption, translation, and text manipulation, all of which are more challenging to detect. This paper attempts to detect intelligent plagiarism using the structural information within a document. It does so by extracting stop words, in contrast to other methods, which usually rely on content words. The proposed method enhances this existing idea by including rough sentence boundaries along with the stop word profiles. The method is further extended using part-of-speech tags, and the system is finally evaluated on sample documents from the PAN-2010 data set. The results are compared with the baseline approach, and performance is evaluated using the standard PAN measures.
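The idea of sentence-bounded stop word n-grams can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not the implementation evaluated in the paper: the stop word list is a small hypothetical sample, sentences are split roughly on terminal punctuation, the function names are invented for this example, and Jaccard overlap is used as a stand-in similarity score rather than the PAN measures.

```python
import re

# Assumption: a small illustrative stop word list, not the list used in the paper.
STOP_WORDS = {
    "the", "of", "and", "a", "in", "to", "is", "was", "it", "for",
    "with", "he", "be", "on", "i", "that", "by", "at", "you", "as",
}


def split_sentences(text):
    """Rough sentence boundaries: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stop_word_profile(sentence):
    """Keep only the stop words of a sentence, in their original order."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w in STOP_WORDS]


def sentence_bounded_ngrams(text, n=3):
    """Collect stop word n-grams that never cross a sentence boundary."""
    ngrams = set()
    for sentence in split_sentences(text):
        profile = stop_word_profile(sentence)
        for i in range(len(profile) - n + 1):
            ngrams.add(tuple(profile[i:i + n]))
    return ngrams


def ngram_overlap(text_a, text_b, n=3):
    """Jaccard overlap of the two n-gram sets (an assumed, illustrative score)."""
    grams_a = sentence_bounded_ngrams(text_a, n)
    grams_b = sentence_bounded_ngrams(text_b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


source = "The cat sat on the mat in the hall. It was a gift for the king."
suspect = "The dog lay on the rug in the room. It was a present for the queen."
print(ngram_overlap(source, suspect))
```

In this toy pair every content word differs while the stop word sequence within each sentence is preserved, so the two n-gram sets coincide. This is the structural signal that survives word substitution and that content-word methods would miss.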