The Chronicling America API provides access to information about historic U.S. newspapers and to millions of digitized newspaper pages. The OCR data for those pages is also available for bulk download. See the full list of digitized newspaper titles (1836-1922) for more information.
Full-text, structured XML files of OCRed text from the Guardian and the Observer newspapers, covering 1791-1909 (205,357 pages). From ProQuest Historical Newspapers. To access, fill out the form linked from the catalog record.
Request access from the D-Lab to ProQuest Historical Newspapers data for the San Francisco Chronicle (1865-1922). Note that the quality of the OCR (the output of automated optical character recognition) is quite low and varies from paper to paper.
The Vogue Archive contains the entire run of Vogue magazine (US edition), from the first issue in 1892 to the current month, reproduced in high-resolution color page images. Every page, advertisement, cover and fold-out has been included, with rich indexing enabling you to find images by garment type, designer and brand names. XML and JPEG files.
LDC's TIPSTER corpus was compiled to advance the state of the art in effective document detection (information retrieval) and data extraction from large, real-world data collections. Among other sources, it includes portions of the Wall Street Journal, the San Jose Mercury News, and the AP Newswire from the late 1980s and early 1990s. (Read more about TIPSTER)
Available by request: ProQuest Historical Newspapers
Researchers may request OCR full text from any of the following specific newspapers for a specific time period, though requests will require significant processing time. The following sets are already available for TDM use: