Linguistics Collection: Linguistic Corpora

English-Corpora.org Corpus Data

English language corpora from English-Corpora.org

UC Berkeley has licensed access to the full-text corpus data for the following English-Corpora.org English language collections. You can search these corpora online without accessing the full-text data:

Full-text corpus data

Full-text corpus data are available for each of COCA, COHA, and GloWbE.

Note that each dataset is available in three formats: Database, Word/lemma/PoS, and Linear text.
For more information about the data formats, see www.english-corpora.org/corpora.asp

NOW: Corpus of News on the Web