BYU Corpus Data

English language corpora from BYU

UC Berkeley has licensed access to the full-text corpus data for the following BYU English language collections. You can search these corpora online without accessing the full-text data:

COCA: Corpus of Contemporary American English
COCA contains more than 520 million words of text (20 million words for each year from 1990 to 2015), divided equally among spoken, fiction, popular magazine, newspaper, and academic texts.
COHA: Corpus of Historical American English
COHA contains more than 400 million words of text from the 1810s to the 2000s and is balanced by genre, decade by decade.
GloWbE: Global Web-based English
GloWbE contains about 1.9 billion words of text from twenty different countries.

Full-text corpus data

Full-text corpus data are available for COCA, COHA, and GloWbE.

COCA: Corpus of Contemporary American English - Apply for Access

COHA: Corpus of Historical American English - Apply for Access

GloWbE: Global Web-based English - Apply for Access

Note that each dataset is available in three different formats: Database, Word/lemma/PoS, and Linear text.
For more information about the data formats, see corpus.byu.edu.

NOW: Corpus of News on the Web

NOW contains 3.7 billion words of data from web-based newspapers and magazines from 2010 to the present. The corpus grows by about 4-5 million words each day (from about 10,000 new articles), or about 130 million words each month. Note: full-text data are not available for this corpus.

For help

tdm-access@berkeley.edu
Send questions about text and data mining access to library resources to the shared email address above, which reaches librarians and campus partners with subject, copyright, technical, and licensing expertise.

For help with text mining tools and software, check out the D-Lab.
Questions and suggestions related to this guide can go to the Library Data Services Program.
