PLoS ONE (Jan 2020)
Text mining of online job advertisements to identify direct discrimination during job hunting process: A case study in Indonesia.
Abstract
Discrimination in the workplace is illegal, yet discriminatory practices remain a persistent global problem. Previous studies have analyzed job advertisements to identify such practices. However, most of those studies relied on content analysis with manual coding of a limited number of samples, because working with large volumes of job advertisements consisting of unstructured text is very challenging. To address those limitations, the present study applies text mining techniques to identify multiple types of direct discrimination in a large collection of online job advertisements, using a method called Direct Discrimination Detection (DDD). DDD combines N-grams and regular expressions (regex) with the exact-match principle of a Boolean retrieval model. A total of 8,969 online job advertisements in English and Bahasa Indonesia, published from May 2005 to December 2017, were collected from bursakerja-jateng.com as the data. The results reveal that direct discrimination still exists during the job-hunting process, including discrimination based on gender, marital status, physical appearance, and religion. The most frequent type of discrimination in job advertisements is based on age (66.27%), followed by gender (38.76%) and physical appearance (18.42%). Additionally, female job seekers are found to be the party most vulnerable to direct discrimination during recruitment. The results show that female job seekers face complex jeopardy in particular job positions compared with their male counterparts. Not only are they excluded because of their gender, but they also have to fulfil more requirements to be eligible to apply, such as being single, being young, and meeting specific physical appearance and religious requirements.
This study illustrates the power and potential of applying computational methods to large-scale unstructured text data to analyze social phenomena.
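The exact-match idea behind DDD can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pattern lists, the category labels, and the function names `detect` and `prevalence` are hypothetical and chosen only for illustration, and the multi-word phrases stand in for the N-gram terms the abstract mentions. Each advertisement is flagged per category with a Boolean (a pattern either matches or it does not), and prevalence is the share of advertisements flagged.

```python
import re

# Hypothetical term patterns per category (English and Bahasa Indonesia).
# These are illustrative assumptions, not the lexicon used in the study.
PATTERNS = {
    "age": re.compile(
        r"\b(?:usia|umur|age)\b[^.\n]{0,20}?\d{2}|\b\d{2}\s*(?:tahun|years? old)\b",
        re.IGNORECASE,
    ),
    "gender": re.compile(
        r"\b(?:pria|wanita|laki-laki|perempuan|male|female)\b", re.IGNORECASE
    ),
    "marital_status": re.compile(
        r"\b(?:belum menikah|lajang|single|unmarried)\b", re.IGNORECASE
    ),
    "physical_appearance": re.compile(
        r"\b(?:berpenampilan menarik|good[- ]looking|attractive)\b", re.IGNORECASE
    ),
    "religion": re.compile(
        r"\b(?:muslim|muslimah|kristen|christian|katolik|catholic)\b", re.IGNORECASE
    ),
}


def detect(ad_text):
    """Boolean retrieval: True for a category if any of its patterns matches."""
    return {label: bool(p.search(ad_text)) for label, p in PATTERNS.items()}


def prevalence(ads):
    """Percentage of advertisements flagged in each category."""
    counts = {label: 0 for label in PATTERNS}
    for ad in ads:
        for label, hit in detect(ad).items():
            counts[label] += int(hit)
    total = len(ads)
    if total == 0:
        return {label: 0.0 for label in PATTERNS}
    return {label: 100.0 * c / total for label, c in counts.items()}


if __name__ == "__main__":
    # Invented sample advertisements, for demonstration only.
    sample_ads = [
        "Dibutuhkan admin, wanita, usia maksimal 25 tahun, belum menikah, "
        "berpenampilan menarik.",
        "Hiring a software engineer with 3 years of Python experience.",
    ]
    for label, pct in prevalence(sample_ads).items():
        print(f"{label}: {pct:.2f}%")
```

Because one advertisement can be flagged in several categories at once, the per-category percentages need not sum to 100, which is consistent with the figures reported above. Exact matching only records that a term is present; a word such as "single" can appear in a non-discriminatory sense, so a real lexicon would need more careful patterns and manual validation.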