New York Times Annotated Corpus (1987-2007)
The Linguistic Data Consortium's New York Times Annotated Corpus contains over 1.8 million articles, excluding wire service articles, published by the New York Times between January 1, 1987 and June 19, 2007. The corpus also includes: over 650,000 article summaries; human- and algorithm-assigned tags drawn from a normalized indexing vocabulary of people, organizations, locations, and topic descriptors; and Java tools for parsing corpus documents from XML into memory-resident objects.