Send questions about text and data mining access to library resources to the shared email address above, which reaches librarians and campus partners with subject, copyright, technical, and licensing expertise.
Right here! The Library offers a wealth of texts and data for your TDM research. We have also included datasets available freely on the web. Use the navigation menu to browse by type of data. Data include:
To suggest or request data not listed on this guide, please email tdm-access at berkeley.edu.
Can I Web Scrape?
Have more questions? Contact tdm-access at berkeley.edu for help.
Can I Do TDM on Material under Copyright?
Our Copyright and Text Mining Guide explains how to do TDM on material that is under copyright or subject to licensing restrictions, both when you are running your analyses and when you are publishing your results.
Can I Do TDM on eBooks & DVDs Protected by DRM?
Some materials have an added technological protection layer known as “digital rights management” (DRM). In some situations you are permitted to “break” eBook and DVD DRM to conduct TDM, but there are very specific rules you must follow. Check the DRM parameters in our TDM law and policy guide. If you have any questions, please get in touch at tdm-access at berkeley.edu.
OCR: Tools for Making PDFs and Images of Text Usable
What Is Text Data Mining (TDM)?
"Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources... The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts."
- from What is Text Mining? by Marti Hearst
Guide Books and Online Tutorials
Workshops and Training On Campus
DH on Campus