Research Associate, University of Pretoria | Co-Founder, HausaNLP & ArewaDS | Member, MasaKhaneNLP
Introducing our work:

CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
arXiv: https://lnkd.in/dBgW6b6P

Some months ago, I invited the community to participate in a hackathon to annotate a language identification dataset, with authorship on the resulting dataset description paper as an incentive. We are deeply grateful to everyone who participated, as well as to the Common Crawl team led by Pedro Ortiz Suarez, my boss Vukosi Marivate, and our collaborators Shamsuddeen H. Muhammad, PhD, and Atnafu Lambebo Tonja.

We hope this resource will inspire a wide range of NLP research and applications, and contribute meaningfully to advancing African NLP.