IEEE Access (Jan 2024)
PatentGrapher: A PLM-GNNs Hybrid Model for Comprehensive Patent Plagiarism Detection Across Full Claim Texts
Abstract
The China National Intellectual Property Administration (CNIPA) receives a tremendous volume of patent applications every year, and detecting plagiarized patents among them is costly. To overcome the limitations of existing models in handling the long texts of patent claim documents, this research presents the PatentGrapher model, which integrates a Pretrained Language Model (PLM) with Graph Neural Networks (GNNs). The model is specifically designed to perform graph modeling based on the hierarchical citation structure of patent claims. It introduces a local-to-global strategy that extracts full-text features in order to understand the claim text comprehensively. Using a Siamese Neural Network, PatentGrapher calculates the similarity between the two patents in a pair and determines whether the pair constitutes plagiarism. We collected 10,317 authorized Chinese patents from a public database and constructed plagiarism sample sets by using state-of-the-art AI tools and by randomly rearranging the sentence order of similar authorized patents. The experimental results indicate that PatentGrapher surpasses the baseline models across all evaluation metrics, improving accuracy and F1 score by 8.0% and 8.3%, respectively. The proposed PatentGrapher offers a new perspective on automating the detection of patent plagiarism and indicates the potential for greater efficiency and precision in the patent examination process.
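The pipeline summarized above (claim citation graph, local-to-global feature extraction, shared-encoder similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, a bag-of-words count vector stands in for the PLM, a single round of neighbour summation stands in for the GNN, nothing is trained, and claims are assumed to cite each other with the phrase "claim N".

```python
import math
import re
from collections import Counter

# Assumption: dependent claims cite their parents as "claim N".
CLAIM_REF = re.compile(r"claim\s+(\d+)", re.IGNORECASE)


def build_claim_graph(claims):
    """Return (child, parent) edges from each claim to the claims it cites."""
    edges = []
    for idx, text in enumerate(claims, start=1):
        for ref in CLAIM_REF.findall(text):
            parent = int(ref)
            if parent != idx and 1 <= parent <= len(claims):
                edges.append((idx, parent))
    return edges


def embed_claim(text):
    """Stand-in for a PLM encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def encode_patent(claims):
    """Local-to-global: embed each claim, mix in neighbours once, then pool."""
    local = {i: embed_claim(t) for i, t in enumerate(claims, start=1)}
    neighbours = {i: [] for i in local}
    for child, parent in build_claim_graph(claims):
        neighbours[child].append(parent)
        neighbours[parent].append(child)
    pooled = Counter()
    for i, vec in local.items():
        mixed = Counter(vec)
        for j in neighbours[i]:
            mixed.update(local[j])  # one round of sum aggregation (GNN stand-in)
        pooled.update(mixed)  # sum pooling over all claim nodes
    return pooled


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(claims_a, claims_b):
    """Siamese-style comparison: both patents pass through the same encoder."""
    return cosine(encode_patent(claims_a), encode_patent(claims_b))
```

Calling `similarity(claims_a, claims_b)` on two lists of claim strings yields a score in [0, 1]; a decision threshold on that score would be a further assumption, since the abstract does not state how the final plagiarism label is produced.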
Keywords