Resources

Language Identification (LID) Models

pacificLID models: https://www.dropbox.com/sh/od1cfflcbqx5gfw/AACNhqvGmUqo6f4h9ujMYmVua?dl=0

idNet models: https://www.dropbox.com/sh/tr2xmusyp2u47yy/AABOkOXFKVfW77HG0-H-vVHAa?dl=0


Corpora

CGLU v4.2: The Corpus of Global Language Use (423 billion words)

http://www.earthlings.io/download_cglu.html

GeoWAC v1: Geographically-balanced Gigaword Corpora (45 billion words)

http://www.earthlings.io/download_geowac.html

CGLU v3: The Corpus of Global Language Use (16 billion words)

https://publicdata.canterbury.ac.nz/Research/NZILBB/jonathandunn/CGLU_v3/