The Chronicling America API provides access to information about historic U.S. newspapers and to millions of digitized newspaper pages; their OCR data is also available for bulk download. See the full list of digitized newspaper titles (1836-1922) for more information.
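As a rough illustration of querying the API, the Python sketch below builds a page-search request that asks for JSON results. It assumes the classic `chroniclingamerica.loc.gov/search/pages/results/` endpoint with its `andtext`, `format`, `rows`, and `page` parameters. The Library of Congress has been moving Chronicling America onto loc.gov, so check the current API documentation before relying on this URL. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the classic Chronicling America page-search endpoint.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(phrase: str, rows: int = 20, page: int = 1) -> str:
    """Return a search URL that requests JSON results for `phrase`."""
    params = {
        "andtext": phrase,  # pages must contain all of these words
        "format": "json",   # machine-readable results instead of HTML
        "rows": rows,       # number of results per page
        "page": page,       # 1-based page of results
    }
    return SEARCH_URL + "?" + urlencode(params)


def fetch_results(phrase: str, rows: int = 20, page: int = 1) -> dict:
    """Download one page of search results and decode the JSON body."""
    with urlopen(build_search_url(phrase, rows, page), timeout=30) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes it easy to inspect or log the exact query before sending it.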
Build textual content sets from Gale primary source collections for data visualization and text data mining. Users will need to click Sign In and log in with their @berkeley.edu account.
Primary source collections include: 17th and 18th Century Burney Collection, American Civil Liberties Union Papers, 1912-1990, American Fiction, Archives Unbound, Archives of Sexuality & Gender, British Library Newspapers, The Economist Historical Archive, Eighteenth Century Collections Online, Indigenous Peoples: North America, The Making of Modern Law, The Making of the Modern World, Nineteenth Century Collections Online, Nineteenth Century U.S. Newspapers, Sabin Americana, 1500-1926, The Times Digital Archive, The Times Literary Supplement Historical Archive, U.S. Declassified Documents Online.
Multiple collections of digitized primary sources related to southern history, literature, and culture. Some collections offer plain-text downloads in their entirety: The Church in the Southern Black Community, First-Person Narratives of the American South, Library of Southern Literature, North American Slave Narratives.
19 MB zip file containing an XML document for every full-text article from Godey's Lady's Book, Parts I-III (Accessible Archives). The magazine was intended to entertain, inform, and educate the women of America; it covers fashion, biographical sketches, articles about mineralogy, handcrafts, female costume, the dance, equestrienne procedures, health and hygiene, recipes, remedies, and the like.
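The XML schema of these Accessible Archives files isn't described here, so a schema-agnostic first pass is a reasonable way to get plain text out. The Python sketch below walks the zip archive and flattens each `.xml` member to a single whitespace-normalized string. It assumes every member is well-formed XML; the function name `iter_article_texts` is hypothetical. The same approach should work for the Pennsylvania Gazette download.

```python
import xml.etree.ElementTree as ET
import zipfile


def iter_article_texts(zip_path):
    """Yield (member_name, plain_text) for every .xml file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip directories, readme files, and other extras
            with archive.open(name) as fh:
                root = ET.parse(fh).getroot()
            # Join every text node with spaces, then collapse runs of whitespace.
            text = " ".join(" ".join(root.itertext()).split())
            yield name, text
```

For example, `for name, text in iter_article_texts("godey.zip"): ...` (the archive name is a placeholder) streams one article at a time, so the whole collection never has to sit in memory. Once you know the real element names, you can replace `itertext()` with targeted lookups for titles, dates, and body text.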
Nearly 14 million books from the HathiTrust Digital Library are currently available for analysis, with varying levels of access. Check the HTRC (HathiTrust Research Center) tab on this guide for help getting started.
The Proceedings of the Old Bailey (1674-1913) and the Ordinary of Newgate's Accounts (1676-1772) contain records of 197,745 criminal trials held at London's central criminal court, along with biographical details of approximately 2,500 men and women executed at Tyburn. Use the site API or download the XML files.
105 MB zip file containing an XML document for every full-text article from the Pennsylvania Gazette (Accessible Archives). This paper provides a first-hand view of colonial America, the American Revolution, and the New Republic, offering important social, political, and cultural perspectives on each of these periods.
Project Gutenberg hosts over 50,000 ebooks, most of them older books in the public domain. If you want to download more than about 100 books per day, use one of the mirror sites listed on the Project Gutenberg site.
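If you script downloads from the main Project Gutenberg site, one simple way to stay under the roughly 100 books/day guideline is a counter that resets each calendar day. The class below is a hypothetical sketch, not part of any Project Gutenberg tool. It only tracks the budget and leaves the actual HTTP requests to you.

```python
import datetime


class DailyDownloadBudget:
    """Hypothetical helper: count downloads per calendar day and refuse past a cap."""

    def __init__(self, limit: int = 100):
        self.limit = limit  # maximum downloads allowed per day
        self.day = None     # calendar day the current count belongs to
        self.count = 0      # downloads recorded so far on that day

    def try_acquire(self, today=None) -> bool:
        """Record one download and return True, or return False if today's cap is reached."""
        today = today or datetime.date.today()
        if today != self.day:  # a new day has started, so reset the counter
            self.day = today
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

Call `try_acquire()` before each request and stop (or switch to a mirror) once it returns `False`.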
Request access through the D-Lab to ProQuest Historical Newspapers data for the San Francisco Chronicle (1865-1922). Note that the quality of the OCR (automated Optical Character Recognition) output is quite low and varies from paper to paper.