Knowledge-Based Approach to Detect Potentially Risky Websites

Juan Carlos Prieto; Alberto Fernandez-Isabel; Isaac Martin De Diego; Felipe Ortega; Javier M. Moguerza

doi:10.1109/ACCESS.2021.3051374

IEEE Access (Jan 2021)

Affiliations

Juan Carlos Prieto: Data Science Laboratory, Rey Juan Carlos University, Móstoles, Spain
Alberto Fernandez-Isabel: Data Science Laboratory, Rey Juan Carlos University, Móstoles, Spain
Isaac Martin De Diego: Data Science Laboratory, Rey Juan Carlos University, Móstoles, Spain
Felipe Ortega: Data Science Laboratory, Rey Juan Carlos University, Móstoles, Spain
Javier M. Moguerza: Data Science Laboratory, Rey Juan Carlos University, Móstoles, Spain

DOI: https://doi.org/10.1109/ACCESS.2021.3051374
Journal volume & issue: Vol. 9
pp. 11633 – 11643

Abstract

Fraudulent and malicious websites have become a common and harmful problem on the Internet. They cause huge financial losses and irreparable damage to both companies and individuals. In response, governments have approved multiple bills. As a result, legality on the Internet is being enforced, and sanctions are being imposed on offenders who carry out illegal or malicious activities. However, governments still need a way to simplify the classification of websites as risky or non-risky, since most of this work is manual. This paper presents the DOmains Classifier based on RIsky Websites (DOCRIW) framework, which detects domains that may contain fraudulent or malicious content. It is based on two main components. The first is a previously built knowledge base containing information from risky websites. The second complements the system with a binary classifier able to label a website as risky or not considering only its domain. The system makes use of web information sources and includes host-based variables. It also applies similarity measures, supervised learning algorithms and optimization methods to enhance its performance. The presented work is experimental and yields promising results.
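
The two-component idea described in the abstract (a knowledge base of risky sites consulted through a similarity measure, complemented by a binary classifier that looks only at the domain) can be illustrated with a minimal Python sketch. This is not the DOCRIW implementation: the knowledge-base entries, the use of difflib's ratio as the similarity measure, the lexical features, the hand-set weights and both thresholds are all assumptions chosen for illustration, and the host-based variables and trained supervised models mentioned in the abstract are not reproduced here.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base of domains already known to be risky.
KNOWN_RISKY = {"free-prizes-now.example", "secure-login-bank.example"}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib's ratio is an assumed stand-in."""
    return SequenceMatcher(None, a, b).ratio()


def domain_features(domain: str) -> dict:
    """Toy lexical features computed from the domain string alone."""
    name = domain.lower()
    return {
        "length": len(name),
        "digits": sum(ch.isdigit() for ch in name),
        "hyphens": name.count("-"),
        "labels": name.count(".") + 1,
    }


def fallback_score(features: dict) -> float:
    """Hand-set linear score standing in for a trained binary classifier."""
    raw = (
        0.02 * features["length"]
        + 0.15 * features["digits"]
        + 0.20 * features["hyphens"]
    )
    return min(raw, 1.0)


def classify(domain: str, sim_threshold: float = 0.85,
             score_threshold: float = 0.5) -> str:
    """Label a domain as 'risky' or 'non-risky'."""
    domain = domain.lower()
    # Component 1: compare against the knowledge base of risky domains.
    best = max((similarity(domain, known) for known in KNOWN_RISKY), default=0.0)
    if best >= sim_threshold:
        return "risky"
    # Component 2: fall back to the domain-only scoring function.
    score = fallback_score(domain_features(domain))
    return "risky" if score >= score_threshold else "non-risky"
```

For example, `classify("free-prizes-n0w.example")` is flagged through the knowledge-base comparison because it differs from a known entry by a single character, while a short, plain domain such as `example.org` falls through to the scoring function. In a real system the scoring function would be replaced by a supervised model trained on labelled domains.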

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
