Research Associate, University of Pretoria | Co-Founder, HausaNLP & ArewaDS | Member, MasaKhaneNLP
African languages just hit their highest-ever representation on the web!

The latest Common Crawl (Jan 2026) shows African languages at 0.057% of all crawled pages, an all-time record. That’s 18.5% higher than the previous peak and 343,000+ more pages than the last crawl.

Some standout growth in a single month:
→ Igbo: +124%
→ Sango: +259%
→ Tswana: +279%
→ Swahili: +45% (now at 294K pages)
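
To put these percentages in perspective, a growth figure can be inverted to estimate the previous crawl's page count. This is a minimal sketch: the function name is made up for illustration, and the 294K Swahili figure is the rounded number quoted above, so the result is only an approximation.

```python
def previous_count(current_pages: float, growth_pct: float) -> float:
    """Invert a percentage increase: current = previous * (1 + growth / 100)."""
    return current_pages / (1 + growth_pct / 100)


# Swahili: +45% growth, now at roughly 294K pages (rounded figure from the post).
swahili_prev = previous_count(294_000, 45)
print(f"Estimated Swahili pages in the previous crawl: ~{swahili_prev:,.0f}")
```

In other words, a +45% jump to about 294K pages implies a previous count of a little over 200K pages.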

For context, English sits at 42%, roughly 728 times as many pages as all 29 detected African languages combined. There’s still a massive gap, but the direction is right. We believe projects like AfriCC are contributing to this shift by actively increasing the volume and diversity of African language content available for web crawlers.
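
As a rough sanity check on that gap, the ratio can be approximated from the two rounded percentages quoted above. This is a sketch, not the calculation behind the 728x figure, which presumably comes from unrounded page counts; the variable names are illustrative.

```python
# Rounded shares of all crawled pages, as quoted in the post.
english_share_pct = 42.0
african_share_pct = 0.057  # all 29 detected African languages combined

# Ratio of English pages to African-language pages, from rounded shares only.
ratio = english_share_pct / african_share_pct
print(f"English has roughly {ratio:.0f}x as many pages")
```

The rounded inputs give a ratio in the 700s, in the same ballpark as the roughly 728x quoted above.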

The full data is open — Common Crawl publishes language stats for every monthly crawl: https://lnkd.in/dE-WAUeZ

What’s your take — what else can we do to close this gap?

#AfriCC #AfricanLanguages #NLP #CommonCrawl #DigitalInclusion #OpenData