BYU Corpus Data: Home

English language corpora from BYU

UC Berkeley has licensed access to the full-text corpus data for the following BYU English language collections. You can search these corpora online without accessing the full-text data:

Full-text corpus data

The full-text corpus data for COCA, COHA and GloWbE are each available through a Library/D-Lab partnership:

 

Note that each dataset is available in three different formats: Database, Word/lemma/PoS, and Linear text.
For more information about the data formats see corpus.byu.edu.

For help

tdm-access@berkeley.edu
Send questions about text and data mining access to library resources to the shared email address above. It reaches librarians and campus partners with subject, copyright, technical, and licensing expertise.

  • For help with text mining tools and software, check out the D-Lab.

Copyright © 2014-2016 The Regents of the University of California. All rights reserved. Except where otherwise noted, this work is subject to a Creative Commons Attribution-Noncommercial 4.0 License.