Library Guides: Research Methods--Quantitative, Qualitative, and More: Text Mining and Computational Text Analysis

About Text Mining and Computational Text Analysis

Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources... The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts.

(From What is Text Mining? by Marti Hearst)
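
To make the definition above concrete, here is a minimal sketch of one of the simplest text mining steps: counting word frequencies in unstructured text. The sample sentences and the function names `tokenize` and `term_frequencies` are invented for illustration and do not come from Hearst's article or from any Library resource. Real projects usually add steps such as stop-word removal and work with much larger corpora.

```python
import re
from collections import Counter

# Hypothetical sample documents, invented for illustration only.
documents = [
    "The library offers workshops on text mining.",
    "Text mining extracts patterns from natural language text.",
    "Researchers mine newspapers, journals, and archives.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequencies(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

frequencies = term_frequencies(documents)

# Show the two most frequent tokens in the sample.
print(frequencies.most_common(2))
```

Unlike a query against a structured database, nothing here depends on predefined fields: the pattern (which words dominate) is derived from the raw text itself.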

Getting Started at Berkeley

Start with the Library's Research Guide titled Text Mining & Computational Text Analysis! Here you will find pages on each of the following topics:

where to find data and texts
copyright, software and tools
TDM platforms
newspapers and magazines
linguistic corpora
scholarly journals
data repositories
citations and metadata
government documents
historical and archival sources
social media and web sources
general media, literature
HathiTrust Research Center

For more on what you can and cannot do with text data mining (TDM), check out the TDM section of the Library's Office of Scholarly Communication Services Copyright Page.

You may also want to take a look at the Library's Spring 2023 Digital Publishing Series, with workshop offerings on TDM-related topics!

The site for all things digital humanities (DH) on campus is Digital Humanities at Berkeley, which includes information on the DH working group, a listserv, and training sessions.

What is "Digital Humanities"?

Digital humanities are an interdisciplinary set of fields that are primarily concerned with using digital technologies, sources and methods as part of research in the humanities. These fields are heavily involved with using electronic information and computational methods to investigate, analyse, synthesise and present research. Digital humanities also aim to explore how electronic media affects research in the discipline and likewise how humanities research contributes to computer studies. This is a rapidly emerging area of research, encompassing a wide range of methods and practices.

(From the University of Sydney)
