International Journal of Information Science and Management (Apr 2024)

Using the Citation-Content-Based Approach to Patent Clustering

  • Narges Neshat
  • Anahita Kermani

DOI
https://doi.org/10.22034/ijism.2024.1977932.0
Journal volume & issue
Vol. 22, no. 2
pp. 139 – 150

Abstract

Patents are a significant competitive strategy for categorizing commercial value based on the source information of technology, and researchers use patent analysis as a practical tool to infer various types of information. This makes retrieving and accessing patents important. Clustering is a method used in different fields to group items of a similar nature. Citations are commonly used to cluster documents, and two methods are widely used for this purpose. The first method uses bibliographic coupling, and the second method identifies the words in the citation titles, also called co-citation. However, it remains necessary to investigate which method provides better patent clustering and retrieval results. This study examines citation contents instead of citations in building relevant groups of patents. Experimental research was conducted on a set of US patents. The analysis was divided into three phases. In phase I, appropriate databases were chosen for conducting patent searches according to the subject and objective of this study, and the basic inventions and the experimental set were selected. In phase II, to develop a patent clustering system based on patent similarities and to assist the relationships among categories, we used fuzzy c-means (FCM) clustering, which is similar to k-means but can handle overlapping clusters. Because fuzzy clustering is a kind of overlapping clustering, extended BCubed precision and recall, which are measures for evaluating overlapping clusterings, were used. Since patents can belong to multiple technology domains, in phase III a Perl program was written to manage the matching process. The study created two sets of patent clusters, one using bibliographic coupling and one using citation title words. The results indicated that the bibliographic coupling method produced better clustering performance than the citation title words method. Moreover, its cluster structure was more extensive in terms of exhaustivity than that of the citation title words method.
Notably, using cited patent title words reduced the number of attributes by nearly 40%. In addition, at high exhaustivity the cited title words method achieved recall nearly equal to that of clustering by cited patents (bibliographic coupling). Using cited title words may therefore be preferable when a high-exhaustivity approach is chosen for patent clustering and retrieval.
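
As an illustration of the evaluation measures named in the abstract, the following is a minimal Python sketch of extended BCubed precision and recall for overlapping clusterings. It is not the authors' Perl program. The function name `extended_bcubed`, the toy patent identifiers, and the cluster and category labels are all hypothetical, and the sketch assumes that fuzzy memberships have already been converted into a set of cluster labels per patent (for example, by thresholding).

```python
from statistics import mean


def extended_bcubed(clusters, categories):
    """Extended BCubed precision and recall for overlapping clusterings.

    clusters:   dict mapping each item to the set of cluster labels assigned to it
    categories: dict mapping each item to the set of reference (gold) categories
    Returns a (precision, recall) tuple.
    """
    items = list(clusters)
    precisions, recalls = [], []
    for item in items:
        p_scores, r_scores = [], []
        for other in items:  # the pair (item, item) is included by definition
            shared_clusters = len(clusters[item] & clusters[other])
            shared_categories = len(categories[item] & categories[other])
            overlap = min(shared_clusters, shared_categories)
            if shared_clusters:  # precision is defined for pairs sharing a cluster
                p_scores.append(overlap / shared_clusters)
            if shared_categories:  # recall is defined for pairs sharing a category
                r_scores.append(overlap / shared_categories)
        if p_scores:
            precisions.append(mean(p_scores))
        if r_scores:
            recalls.append(mean(r_scores))
    precision = mean(precisions) if precisions else 0.0
    recall = mean(recalls) if recalls else 0.0
    return precision, recall


# Hypothetical toy data: four patents, two reference technology domains,
# and an overlapping clustering in which patent p2 belongs to two clusters.
gold = {"p1": {"A"}, "p2": {"A", "B"}, "p3": {"B"}, "p4": {"B"}}
found = {"p1": {1}, "p2": {1, 2}, "p3": {2}, "p4": {1}}

if __name__ == "__main__":
    precision, recall = extended_bcubed(found, gold)
    print(f"precision={precision:.3f} recall={recall:.3f}")
```

The same function can be applied to each of the two clusterings (bibliographic coupling and citation title words) against a common reference grouping, so that their precision and recall can be compared on equal terms.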

Keywords