Download the Corpus of Global Language Use (CGLU v4.2)

Organized By Region and Country


Region             Link to Data          Size in Words
Africa, North      Download Location     1,223,532,842
Africa, Southern   Download Location        26,868,810
Africa, Sub        Download Location     5,938,870,966
America, Brazil    Download Location     2,265,386,107
America, Central   Download Location     8,877,634,300
America, North     Download Location    51,921,657,887
America, South     Download Location    22,441,384,853
Asia, Central      Download Location    17,069,517,255
Asia, East         Download Location    49,521,933,987
Asia, South        Download Location    15,147,872,671
Asia, Southeast    Download Location    21,386,781,131
Europe, East       Download Location    65,413,609,201
Europe, Russia     Download Location    15,363,644,903
Europe, West       Download Location   143,748,386,801
Middle East        Download Location     1,721,856,657
Oceania            Download Location     1,743,571,262
TOTAL              933 GB              443.06 billion words

* v4.2 includes character segmentation for Chinese and Japanese.
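
When deciding which regions to download first, it can help to see how the word counts are distributed. The following is a minimal sketch, not part of the CGLU distribution: it copies the per-region figures from the table above into a dictionary and computes each region's share. The names `REGION_WORDS` and `region_shares` are hypothetical, and the shares are taken relative to the sum of the listed rows rather than the rounded figure in the TOTAL row.

```python
# Word counts copied by hand from the region table above (region -> size in words).
# REGION_WORDS and region_shares are illustrative names, not part of CGLU.
REGION_WORDS = {
    "Africa, North": 1_223_532_842,
    "Africa, Southern": 26_868_810,
    "Africa, Sub": 5_938_870_966,
    "America, Brazil": 2_265_386_107,
    "America, Central": 8_877_634_300,
    "America, North": 51_921_657_887,
    "America, South": 22_441_384_853,
    "Asia, Central": 17_069_517_255,
    "Asia, East": 49_521_933_987,
    "Asia, South": 15_147_872_671,
    "Asia, Southeast": 21_386_781_131,
    "Europe, East": 65_413_609_201,
    "Europe, Russia": 15_363_644_903,
    "Europe, West": 143_748_386_801,
    "Middle East": 1_721_856_657,
    "Oceania": 1_743_571_262,
}


def region_shares(counts):
    """Return each region's fraction of the summed word counts."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


if __name__ == "__main__":
    shares = region_shares(REGION_WORDS)
    # Print regions from largest to smallest share of the listed rows.
    for region, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{region:<18} {share:6.2%}")
```

Because the shares are normalized by the sum of the rows, they always add up to 1 regardless of how the TOTAL row was rounded.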