IEEE Access (Jan 2019)
A Graph Based Clustering Approach for Relation Extraction From Crime Data
Abstract
Applying natural language processing techniques to crime data can benefit several processes in the criminal justice system. The availability of large volumes of crime reports helps law enforcement agencies when a criminal investigation is launched. During an investigation, questions keep arising: what type of crime was committed, who committed it, what happened, where and when it happened, and what actions were taken. However, it is not feasible for law enforcement agencies to examine these massive crime reports in detail to find the answers. To address these problems in the criminal justice domain, the proposed work considers a textual corpus containing information about crimes against women in India and extracts significant relations between the named entities in the corpus using a hierarchical graph-based clustering technique. To extract the relations, different types of entity pairs are chosen, and the similarities between them are measured based on the intermediate context words, i.e., the words occurring between the two entities. Based on the similarity scores, a weighted graph is formed, and a similarity threshold is set to partition the graph according to its edge weights. Through iterative application of the clustering algorithm, all the named entity pairs are grouped into clusters, each of which represents a different aspect of crime. Each cluster is characterized by the most frequent context word it contains. The proposed relation extraction scheme supports crime pattern analysis, which can aid various criminal investigation requirements. The results, which achieve optimal cluster validation indices, demonstrate the effectiveness of this method.
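The pipeline summarized in the abstract (context-word similarity between entity pairs, a weighted similarity graph, threshold-based partitioning, and labeling each cluster by its most frequent context word) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity over bag-of-words context vectors, connected components as the partitioning rule, and the entity pairs in `PAIRS` are all hypothetical choices made for illustration. The sketch shows a single thresholding pass; the paper's hierarchical, iterative scheme and its threshold schedule are not specified in the abstract and are not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical input: each named-entity pair maps to the context words found
# between the two entities in the corpus (illustrative data, not from the paper).
PAIRS = {
    ("PER1", "LOC1"): ["arrested", "in"],
    ("PER2", "LOC2"): ["arrested", "near"],
    ("PER3", "PER4"): ["assaulted"],
    ("PER5", "PER6"): ["assaulted", "allegedly"],
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words context vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pairs(pairs, threshold):
    """Build a similarity graph over entity pairs, keep only edges whose weight
    is >= threshold, and return the connected components as clusters."""
    vectors = {p: Counter(ws) for p, ws in pairs.items()}
    adjacency = {p: set() for p in pairs}
    for p, q in combinations(pairs, 2):
        if cosine(vectors[p], vectors[q]) >= threshold:
            adjacency[p].add(q)
            adjacency[q].add(p)
    clusters, seen = [], set()
    for start in pairs:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        clusters.append(component)
    return clusters


def label(cluster, pairs):
    """Characterize a cluster by the most frequent context word among its pairs."""
    counts = Counter(w for p in cluster for w in pairs[p])
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for c in cluster_pairs(PAIRS, threshold=0.4):
        print(label(c, PAIRS), c)
```

With a threshold of 0.4, the toy data splits into an "arrested" cluster and an "assaulted" cluster; raising the threshold to 0.9 leaves every pair in its own cluster. Re-running `cluster_pairs` with different thresholds is one simple way to explore a hierarchy of partitions, though the exact procedure used in the paper may differ.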
Keywords