Text Mining & Computational Text Analysis: Sources

Books

Newspapers & Magazines

Available by request: ProQuest Historical Newspapers

Researchers may request OCR full text from any of the following newspapers for a specified time period. Requests require significant processing time:

  • Chicago Defender (1910-1975)
  • Chicago Tribune (1849-1930*)
  • Los Angeles Times (1881-1930*)
  • The New York Times (1851-1933*)
  • The Wall Street Journal (1889-1932*)
  • The Washington Post (1877-1932*)
  • The Baltimore Afro-American (1893-1998)
  • The Times of India (1838-2005)
  • The Guardian (1821-1906)
  • The Observer (1791-1906) 

Scholarly journals

Citations & Metadata

Government Documents

Linguistic Corpora

Literature

Social Media

Historical & Archival Collections

Available by request:

Adam Matthew Digital 
Contact History Librarian Jennifer Dorner (dorner@berkeley.edu) to request access to OCR text and full metadata from any of Adam Matthew Digital's primary source databases.

Gale Digital Collections
Request access to Gale content for text analysis, including OCR text from databases such as Eighteenth Century Collections Online and Nineteenth Century Collections Online, as well as content from Gale's newspaper archives. See Gale's FAQ (pdf) or brief description for more information. To request access, email tdm-access@berkeley.edu.

Data Repositories

Please scrape responsibly

For help

tdm-access@berkeley.edu
Send questions about text and data mining access to library resources to the shared email address above, which reaches librarians and campus partners with subject, copyright, technical, and licensing expertise.

  • For help with text mining tools and software, check out the D-Lab.
  • Questions and suggestions related to this guide can go to Cody Hennesy.

What is text mining?

"Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources... The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts."

- from What is Text Mining? by Marti Hearst (2003)
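
As a rough illustration of extracting patterns from natural language text rather than from a structured database, the sketch below counts word frequencies in a short passage. It is a minimal example only: the function name `term_frequencies`, the `min_length` parameter, and the sample sentence are all hypothetical and chosen for illustration, and real projects typically add steps such as stop-word removal and OCR cleanup.

```python
import re
from collections import Counter


def term_frequencies(text, min_length=3):
    """Count lowercase alphabetic tokens with at least min_length characters.

    Hypothetical helper for illustration; not part of any library.
    """
    # Lowercase the text and pull out runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop very short tokens, then tally the rest.
    return Counter(t for t in tokens if len(t) >= min_length)


# Made-up sample text, standing in for OCR output from a requested source.
sample = (
    "Text mining extracts patterns from text. "
    "Data mining extracts patterns from databases."
)
freqs = term_frequencies(sample)

# Show the most frequent terms with their counts.
for term, count in freqs.most_common(5):
    print(term, count)
```

The same counting approach can be applied to any plain-text file obtained through the request processes described in this guide, one document at a time or across a whole collection.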

Copyright © 2014-2016 The Regents of the University of California. All rights reserved. Except where otherwise noted, this work is subject to a Creative Commons Attribution-Noncommercial 4.0 License.