Text Mining & Computational Text Analysis

Linguistic Corpora

BYU Corpus Data

English language corpora from BYU

UC Berkeley has licensed access to the full-text corpus data for the BYU English-language collections listed below. You can also search these corpora online without accessing the full-text data.

Full-text corpus data

Full-text corpus data is available for each of COCA, COHA, and GloWbE.

Note that each dataset is available in three different formats: Database, Word/lemma/PoS, and Linear text.
For more information about the data formats, see corpus.byu.edu.
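
As a rough illustration of working with the Word/lemma/PoS format, the Python sketch below counts lemmas in tab-separated rows. It assumes each row holds a word, its lemma, and a part-of-speech tag, in that order; the actual column layout may differ between corpora, so check the format documentation before adapting it. The function name `count_lemmas` and the sample rows are hypothetical and are not taken from any corpus file.

```python
import csv
import io
from collections import Counter


def count_lemmas(lines, min_columns=3):
    """Count lemmas in tab-separated word/lemma/PoS rows.

    Assumption: each row is `word<TAB>lemma<TAB>PoS`. Rows with fewer
    columns (blank lines, markers) are skipped.
    """
    counts = Counter()
    # QUOTE_NONE keeps quotation-mark tokens from being read as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < min_columns:
            continue
        lemma = row[1]
        counts[lemma.lower()] += 1
    return counts


# Made-up sample rows for illustration only.
sample = io.StringIO("The\tthe\tat\ncats\tcat\tnn2\nsat\tsit\tvvd\ncat\tcat\tnn1\n")
lemma_counts = count_lemmas(sample)
print(lemma_counts.most_common(1))
```

To run this against a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) in place of the `io.StringIO` sample.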

NOW: Corpus of News on the Web