Download the Corpus of Global Language Use (CGLU v4.2)
Organized By Region and Country
Region | Link to Data | Size in Words |
Africa, North | Download Location | 1,223,532,842 |
Africa, Southern | Download Location | 26,868,810 |
Africa, Sub | Download Location | 5,938,870,966 |
America, Brazil | Download Location | 2,265,386,107 |
America, Central | Download Location | 8,877,634,300 |
America, North | Download Location | 51,921,657,887 |
America, South | Download Location | 22,441,384,853 |
Asia, Central | Download Location | 17,069,517,255 |
Asia, East | Download Location | 49,521,933,987 |
Asia, South | Download Location | 15,147,872,671 |
Asia, Southeast | Download Location | 21,386,781,131 |
Europe, East | Download Location | 65,413,609,201 |
Europe, Russia | Download Location | 15,363,644,903 |
Europe, West | Download Location | 143,748,386,801 |
Middle East | Download Location | 1,721,856,657 |
Oceania | Download Location | 1,743,571,262 |
TOTAL | 933 GB | 443.06 billion words |
* v4.2 includes character segmentation for Chinese and Japanese.