Two Stage Job Title Identification System for Online Job Advertisements

Ibrahim Rahhal; Kathleen M. Carley; Ismail Kassou; Mounir Ghogho

doi:10.1109/ACCESS.2023.3247866

IEEE Access (Jan 2023)

Affiliations

Ibrahim Rahhal: ENSIAS, Mohammed V University, Rabat, Morocco
Kathleen M. Carley: Institute for Software Research, Carnegie Mellon University, Pittsburgh, PA, USA
Ismail Kassou: ENSIAS, Mohammed V University, Rabat, Morocco
Mounir Ghogho: Research Laboratory (TICLab), College of Engineering and Architecture, International University of Rabat, Sale, Morocco

DOI: https://doi.org/10.1109/ACCESS.2023.3247866
Journal volume & issue: Vol. 11
pp. 19073 – 19092

Abstract

Data science techniques are powerful tools for extracting knowledge from large datasets. Analyzing the job market by classifying online job advertisements (ads) has recently received much attention. Various approaches for multi-label classification (e.g., self-supervised learning and clustering) have been developed to identify the occupation from a job advertisement and have achieved satisfactory performance. However, these approaches require labeled datasets with hundreds of thousands of examples and focus on specific databases such as the Occupational Information Network (O*NET) that are better suited to the US job market. In this paper, we present a two-stage job title identification methodology to address the case of small datasets. We use Bidirectional Encoder Representations from Transformers (BERT) to first classify the job ads according to their corresponding sector (e.g., Information Technology, Agriculture). Then, we use unsupervised machine learning algorithms and some similarity measures to find the closest matching job title from the list of occupations within the predicted sector. We also propose a novel document embedding strategy to address the issues of processing and classifying job ads. Our experimental results show that the proposed two-stage approach improves the job title identification accuracy by 14% to achieve more than 85% in some sectors. Moreover, we found that incorporating document embedding-based approaches such as weighting strategies and noise removal improves the classification accuracy by 23.5% compared to approaches based on the bag-of-words model. Further evaluations verify that the proposed methodology either outperforms or performs at least as well as the state-of-the-art methods. Applying the proposed methodology to Moroccan job market data has helped identify emerging and high-demand occupations in Morocco.
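
The two-stage flow described in the abstract (predict a sector, then pick the closest job title within that sector) can be illustrated with a minimal sketch. This is not the authors' implementation: the BERT sector classifier is replaced here by a simple keyword-overlap stand-in, the document embedding by raw token counts, and the taxonomy, keyword lists, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy taxonomy (sector -> job titles); not taken from the paper.
OCCUPATIONS = {
    "Information Technology": ["software developer", "data engineer", "network administrator"],
    "Agriculture": ["farm manager", "agricultural technician"],
}

# Hypothetical keyword sets standing in for the BERT sector classifier.
SECTOR_KEYWORDS = {
    "Information Technology": {"software", "python", "data", "network", "developer"},
    "Agriculture": {"farm", "crop", "harvest", "agricultural", "livestock"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def predict_sector(ad_text):
    """Stage 1 (stand-in): choose the sector whose keywords overlap the ad most."""
    tokens = set(tokenize(ad_text))
    return max(SECTOR_KEYWORDS, key=lambda s: len(tokens & SECTOR_KEYWORDS[s]))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def identify_job_title(ad_text):
    """Stage 2: return the most similar title within the predicted sector."""
    sector = predict_sector(ad_text)
    ad_vec = Counter(tokenize(ad_text))
    title = max(OCCUPATIONS[sector], key=lambda t: cosine(ad_vec, Counter(tokenize(t))))
    return sector, title


if __name__ == "__main__":
    print(identify_job_title("We are hiring a software developer with Python experience"))
```

Restricting the similarity search to the predicted sector keeps the candidate list short, which is the intuition behind the two-stage design; in this sketch a wrong sector prediction in stage 1 cannot be recovered in stage 2.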

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
