The earthLings project is building a global, multi-lingual computational linguistic atlas. A linguistic atlas maps both languages (who uses which language where) and dialects (who uses which variants where). A computational linguistic atlas does this through automated, reproducible analysis of very large digital corpora (~423 billion words from the web and ~14 billion words from social media). Where traditional approaches to language and dialect mapping rely on asking people about their language behaviours, a computational approach observes and models those behaviours on a large scale. The atlas is global because it covers most of the world's countries across 10k cities. It is multi-lingual because it includes 464 languages. It is comprehensive because it models variation across entire grammars, whereas previous atlases were restricted to a few dozen pre-selected features.
Dunn, J. (2019). "Modeling Global Syntactic Variation in English Using Dialect Classification." In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (NAACL 19). doi: 10.18653/v1/W19-1405