About earthLings

The earthLings project is building a global, multi-lingual computational linguistic atlas. A linguistic atlas maps both languages (who uses which language where) and dialects (who uses which variants where). A computational linguistic atlas does this through an automated and reproducible analysis of very large digital corpora (~423 billion words from the web and ~14 billion words from social media). Traditional approaches to language and dialect mapping rely on asking people about their language behaviours. A computational approach instead observes and models those behaviours at a large scale.

This is a global atlas because it covers most of the world's countries, across 10k cities. It is a multi-lingual atlas because it includes 464 different languages. It is a comprehensive atlas because it models variation across entire grammars, whereas previous atlases were restricted to a few dozen pre-selected features.

Intro Presentation (PDF)


