Transactions of the Association for Computational Linguistics (Jan 2021)
MasakhaNER: Named Entity Recognition for African Languages
- David Ifeoluwa Adelani,
- Jade Abbott,
- Graham Neubig,
- Daniel D’souza,
- Julia Kreutzer,
- Constantine Lignos,
- Chester Palen-Michel,
- Happy Buzaaba,
- Shruti Rijhwani,
- Sebastian Ruder,
- Stephen Mayhew,
- Israel Abebe Azime,
- Shamsuddeen H. Muhammad,
- Chris Chinenye Emezue,
- Joyce Nakatumba-Nabende,
- Perez Ogayo,
- Aremu Anuoluwapo,
- Catherine Gitau,
- Derguene Mbaye,
- Jesujoba Alabi,
- Seid Muhie Yimam,
- Tajuddeen Rabiu Gwadabe,
- Ignatius Ezeani,
- Rubungo Andre Niyongabo,
- Jonathan Mukiibi,
- Verrah Otiende,
- Iroro Orife,
- Davis David,
- Samba Ngom,
- Tosin Adewumi,
- Paul Rayson,
- Mofetoluwa Adeyemi,
- Gerald Muriuki,
- Emmanuel Anebi,
- Chiamaka Chukwuneke,
- Nkiruka Odu,
- Eric Peter Wairagala,
- Samuel Oyerinde,
- Clemencia Siro,
- Tobius Saul Bateesa,
- Temilola Oloyede,
- Yvonne Wambui,
- Victor Akinode,
- Deborah Nabagereka,
- Maurice Katusiime,
- Ayodele Awokoya,
- Mouhamadane MBOUP,
- Dibora Gebreyohannes,
- Henok Tilaye,
- Kelechi Nwaike,
- Degaga Wolde,
- Abdoulaye Faye,
- Blessing Sibanda,
- Orevaoghene Ahia,
- Bonaventure F. P. Dossou,
- Kelechi Ogueji,
- Thierno Ibrahima DIOP,
- Abdoulaye Diallo,
- Adewale Akinfaderin,
- Tendai Marengereke,
- Salomey Osei
Affiliations
- David Ifeoluwa Adelani
- Spoken Language Systems Group (LSV), Saarland University, Germany
- Jade Abbott
- Retro Rabbit, South Africa
- Graham Neubig
- Language Technologies Institute, Carnegie Mellon University, United States
- Daniel D’souza
- ProQuest, United States
- Julia Kreutzer
- Google Research, Canada
- Constantine Lignos
- Brandeis University, United States
- Chester Palen-Michel
- Brandeis University, United States
- Happy Buzaaba
- Graduate School of Systems and Information Engineering, University of Tsukuba, Japan
- Shruti Rijhwani
- Language Technologies Institute, Carnegie Mellon University, United States
- Sebastian Ruder
- DeepMind, United Kingdom
- Stephen Mayhew
- Duolingo, United States
- Israel Abebe Azime
- African Institute for Mathematical Sciences (AIMS-AMMI), Ethiopia
- Shamsuddeen H. Muhammad
- University of Porto, Portugal
- Chris Chinenye Emezue
- Technical University of Munich, Germany
- Joyce Nakatumba-Nabende
- Makerere University, Kampala, Uganda
- Perez Ogayo
- African Leadership University, Rwanda
- Aremu Anuoluwapo
- University of Lagos, Nigeria
- Catherine Gitau
- Masakhane NLP
- Derguene Mbaye
- Masakhane NLP
- Jesujoba Alabi
- Max Planck Institute for Informatics, Germany
- Seid Muhie Yimam
- LT Group, Universität Hamburg, Germany
- Tajuddeen Rabiu Gwadabe
- University of Chinese Academy of Sciences, China
- Ignatius Ezeani
- Lancaster University, United Kingdom
- Rubungo Andre Niyongabo
- University of Electronic Science and Technology of China, China
- Jonathan Mukiibi
- Makerere University, Kampala, Uganda
- Verrah Otiende
- United States International University - Africa (USIU-A), Kenya
- Iroro Orife
- Niger-Volta LTI
- Davis David
- Masakhane NLP
- Samba Ngom
- Masakhane NLP
- Tosin Adewumi
- Luleå University of Technology, Sweden
- Paul Rayson
- Lancaster University, United Kingdom
- Mofetoluwa Adeyemi
- Masakhane NLP
- Gerald Muriuki
- Makerere University, Kampala, Uganda
- Emmanuel Anebi
- Masakhane NLP
- Chiamaka Chukwuneke
- Lancaster University, United Kingdom
- Nkiruka Odu
- African University of Science and Technology, Abuja, Nigeria
- Eric Peter Wairagala
- Makerere University, Kampala, Uganda
- Samuel Oyerinde
- Masakhane NLP
- Clemencia Siro
- Masakhane NLP
- Tobius Saul Bateesa
- Makerere University, Kampala, Uganda
- Temilola Oloyede
- Masakhane NLP
- Yvonne Wambui
- Masakhane NLP
- Victor Akinode
- Masakhane NLP
- Deborah Nabagereka
- Makerere University, Kampala, Uganda
- Maurice Katusiime
- Makerere University, Kampala, Uganda
- Ayodele Awokoya
- University of Ibadan, Nigeria
- Mouhamadane MBOUP
- Masakhane NLP
- Dibora Gebreyohannes
- Masakhane NLP
- Henok Tilaye
- Masakhane NLP
- Kelechi Nwaike
- Masakhane NLP
- Degaga Wolde
- Masakhane NLP
- Abdoulaye Faye
- Masakhane NLP
- Blessing Sibanda
- Namibia University of Science and Technology, Namibia
- Orevaoghene Ahia
- Instadeep, Nigeria
- Bonaventure F. P. Dossou
- Jacobs University Bremen, Germany
- Kelechi Ogueji
- University of Waterloo, Canada
- Thierno Ibrahima DIOP
- Masakhane NLP
- Abdoulaye Diallo
- Masakhane NLP
- Adewale Akinfaderin
- Masakhane NLP
- Tendai Marengereke
- Masakhane NLP
- Salomey Osei
- African Institute for Mathematical Sciences (AIMS-AMMI), Ethiopia
DOI
- https://doi.org/10.1162/tacl_a_00416
Journal volume & issue
- Vol. 9, pp. 1116–1131
Abstract
We take a step towards addressing the underrepresentation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.