Research Methods--Quantitative, Qualitative, and More: Text Mining and Computational Text Analysis

About Text Mining and Computational Text Analysis

Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources... The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts.

(From What is Text Mining? by Marti Hearst)
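
To make the contrast with structured data concrete, here is a minimal sketch of one of the simplest text mining steps: counting word frequencies directly from raw sentences. The sample sentences and the helper names (`tokenize`, `term_frequencies`) are hypothetical and were invented for this illustration; real projects typically use dedicated tools and far larger corpora.

```python
import re
from collections import Counter

# Hypothetical sample documents, invented for illustration only.
documents = [
    "The library offers workshops on text mining.",
    "Text mining extracts patterns from natural language text.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


frequencies = term_frequencies(documents)

# Show the three most frequent tokens across the sample documents.
print(frequencies.most_common(3))
```

The pattern here (which words occur most often) comes from unstructured sentences, not from rows in a database, which is the distinction the definition above draws.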

Getting Started at Berkeley

Start with the Library's research guide, Text Mining & Computational Text Analysis. There you will find pages on each of the following topics:

  • where to find data and texts
  • copyright, software and tools
  • TDM platforms
  • newspapers and magazines
  • linguistic corpora
  • scholarly journals
  • data repositories
  • citations and metadata
  • government documents
  • historical and archival sources
  • social media and web sources
  • general media, literature
  • HathiTrust Research Center

For more on what you can and cannot do with text data mining (TDM), check out the text data mining section of the Library's Office of Scholarly Communication Services Copyright Page.

You may also want to take a look at the Library's Spring 2023 Digital Publishing Series, with workshop offerings on TDM-related topics!

The hub for digital humanities (DH) on campus is Digital Humanities at Berkeley, which includes information on the DH working group, a listserv, and training sessions.

What is "Digital Humanities"?

Digital humanities are an interdisciplinary set of fields that are primarily concerned with using digital technologies, sources and methods as part of research in the humanities. These fields are heavily involved with using electronic information and computational methods to investigate, analyse, synthesise and present research. Digital humanities also aim to explore how electronic media affects research in the discipline and likewise how humanities research contributes to computer studies. This is a rapidly emerging area of research, encompassing a wide range of methods and practices.

(From the University of Sydney)