
Linguistics Collection: Linguistic Corpora

Linguistic Corpora

BYU Corpus Data

English language corpora from BYU

UC Berkeley has licensed access to the full-text corpus data for the following BYU English-language collections. These corpora can also be searched online without using the full-text data.

Full-text corpus data

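Full-text corpus data is available for each of COCA (Corpus of Contemporary American English), COHA (Corpus of Historical American English), and GloWbE (Corpus of Global Web-Based English).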


Note that each dataset is available in three different formats: Database, Word/lemma/PoS, and Linear text.
For more information about the data formats, see
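As a rough illustration of working with the Word/lemma/PoS format, the sketch below reads tokens from such a file and counts lemmas. It assumes each line holds five tab-separated columns (text ID, token position, word, lemma, part-of-speech tag). That column layout, the sample rows, and the names `Token` and `read_wlp` are hypothetical; check the documentation distributed with the licensed data for the actual layout before relying on it.

```python
import csv
import io
from collections import Counter
from typing import Iterator, NamedTuple, TextIO


class Token(NamedTuple):
    """One token row (hypothetical structure for illustration)."""
    text_id: str
    position: int
    word: str
    lemma: str
    pos: str


def read_wlp(stream: TextIO) -> Iterator[Token]:
    """Yield tokens from a word/lemma/PoS file.

    ASSUMPTION: five tab-separated columns per line
    (text ID, position, word, lemma, PoS). Rows with a different
    number of columns are skipped rather than raising an error.
    """
    # QUOTE_NONE keeps quotation marks in the corpus text as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue
        text_id, position, word, lemma, pos = row
        yield Token(text_id, int(position), word, lemma, pos)


# Hypothetical sample rows, not real corpus data.
sample = (
    "1001\t1\tThe\tthe\tat\n"
    "1001\t2\tdogs\tdog\tnn2\n"
    "1001\t3\tbarked\tbark\tvvd\n"
)
tokens = list(read_wlp(io.StringIO(sample)))

# Count lemmas, e.g. to build a simple frequency list.
lemma_counts = Counter(t.lemma for t in tokens)
print(lemma_counts.most_common(3))
```

For a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) instead of the `io.StringIO` sample; the file encoding is also an assumption to verify against the data documentation.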

NOW: Corpus of News on the Web