Information Studies: APIs for scholarly resources

A guide to information studies at UC Berkeley.

APIs for scholarly resources

Below is a list of scholarly databases and collections that offer some form of API access, online tools to access data, or raw data downloads.

arXiv API
API access to Cornell's open-access arXiv e-print repository, used primarily by physics, mathematics and computer science communities to share cutting-edge research.
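
As a rough illustration of what querying the arXiv API can look like, the sketch below builds a query URL and pulls titles out of an Atom response. It assumes the public query endpoint at export.arxiv.org and its search_query, start and max_results parameters; the function names and the sample feed are made up for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint for the public arXiv query interface.
ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def build_arxiv_url(search_query, start=0, max_results=5):
    """Return a query URL; search_query uses arXiv's field:term syntax."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_ENDPOINT + "?" + urlencode(params)


def parse_titles(atom_xml):
    """Extract entry titles from an Atom feed, collapsing line breaks."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(ATOM + "title", default="").split())
        for entry in root.iter(ATOM + "entry")
    ]


# Invented placeholder feed, not real arXiv output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>An Illustrative
    Title</title></entry>
  <entry><title>Another Example</title></entry>
</feed>"""

if __name__ == "__main__":
    print(build_arxiv_url("all:electron"))
    print(parse_titles(SAMPLE_FEED))
```

To run this against the live service, the URL could be fetched with urllib.request.urlopen and the response body passed to parse_titles.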

BioMed Central API
A variety of access points to BioMed Central's corpus of 150,000 peer-reviewed articles, including a guide to text-mining the collection.

Caselaw Access Project
Includes all US court decisions published in book form: 40,000 volumes, with cases from 1658 to 2018, scanned from the Harvard Law School Library collection.

Chronicling America API
API access to information about historic (pre-1923) English-language American newspapers and select digitized newspaper pages from the Library of Congress.

Elsevier Scopus APIs
"Scopus APIs expose curated abstracts and citation data from all scholarly journals indexed by Scopus, Elsevier's citation database."

Europe PubMed Central
A RESTful Web Service giving you access to all of the publications and related information in the Europe PubMed Central database.
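
A minimal sketch of a search against the Europe PMC REST service might look like the following. It assumes the search endpoint under www.ebi.ac.uk/europepmc/webservices/rest and a JSON response that nests records under resultList and result; the helper names and the sample payload are illustrative only, and nothing is fetched over the network.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC search endpoint.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Return a search URL that asks for JSON output."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EUROPE_PMC_SEARCH + "?" + urlencode(params)


def extract_titles(payload):
    """Collect titles from a decoded response, tolerating missing keys."""
    results = payload.get("resultList", {}).get("result", [])
    return [record.get("title", "") for record in results]


# Invented placeholder response, not real Europe PMC output.
SAMPLE_RESPONSE = json.dumps({
    "hitCount": 2,
    "resultList": {
        "result": [
            {"id": "PLACEHOLDER-1", "title": "First example title"},
            {"id": "PLACEHOLDER-2", "title": "Second example title"},
        ]
    },
})

if __name__ == "__main__":
    print(build_search_url("malaria vaccine"))
    print(extract_titles(json.loads(SAMPLE_RESPONSE)))
```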

HathiTrust Extracted Features Dataset
Page-level features from 4.8 million public domain volumes. The dataset includes over 734 billion words, dozens of languages, and spans multiple centuries. Features include the token (unigram) count and header and footer identification, on a per-page basis, as well as volume-level metadata and much more.

IEEE Xplore Search Gateway
Query the Institute of Electrical and Electronics Engineers content repository and retrieve results for manipulation and presentation on local web interfaces. Contact to receive API user guide. (UCB access)

JSTOR Data for Research
Not an API per se, but you can use DFR to select and interact with data and metadata from JSTOR's archive of scholarly journal literature (more than 7 million journal articles) and primary resources (26,000 19th Century British Pamphlets).

Microsoft Academic Search API
Microsoft Academic Search indexes millions of academic publications, and displays relationships between and among subjects, content, and authors, highlighting the critical links that help define scientific research. API access by request.

National Library of Medicine (NLM) APIs
A directory of medical resource APIs including PubChem, TOXNET and AIDSinfo.

Nature OpenSearch API
Open, bibliographic search service for content hosted on nature.com, comprising around half a million news and research articles and citations (see also Blogs API).

NCBI E-utilities API
A set of eight server-side programs that provide an interface to the Entrez query and database system at the National Center for Biotechnology Information (NCBI).
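
For a sense of how one of these programs is called, here is a small sketch of an ESearch request. It assumes the esearch.fcgi endpoint under eutils.ncbi.nlm.nih.gov and its db, term, retmode and retmax parameters; the function names and the sample response are invented for illustration, and the IDs shown are placeholders rather than real PubMed records.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the ESearch utility.
ESEARCH_ENDPOINT = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Return an ESearch URL that requests JSON output."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_ENDPOINT + "?" + urlencode(params)


def parse_ids(response_text):
    """Return the list of record IDs from an ESearch JSON response."""
    data = json.loads(response_text)
    return data.get("esearchresult", {}).get("idlist", [])


# Invented placeholder response; the IDs are not real records.
SAMPLE_RESPONSE = json.dumps({
    "esearchresult": {"count": "2", "idlist": ["00000001", "00000002"]}
})

if __name__ == "__main__":
    print(build_esearch_url("open access[title]"))
    print(parse_ids(SAMPLE_RESPONSE))
```

The returned IDs would typically be passed on to another utility, such as ESummary or EFetch, to retrieve the records themselves.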

ORCID API
Query the ORCID researcher identifier system (including individual researchers, universities, national laboratories, commercial research organizations, research funders, publishers, national science agencies, data repositories, and international professional societies) to obtain researcher profile data.

Open Academic Graph
Downloadable datasets for citations drawn from two large academic graphs: Microsoft Academic Graph (MAG) and AMiner.

PLOS Article-Level Metrics API
Comprehensive information about the usage and reach of articles published by the Public Library of Science (including usage statistics, citation counts, and social networking activity).

PLOS Search API
Query content from the seven open-access peer-reviewed journals from the Public Library of Science using any of the twenty-three terms in the PLOS Search.

PubMed Central OAI-PMH service
Provides access to metadata of all items in the PubMed Central (PMC) archive, as well as to the full text of a subset of these items.
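
OAI-PMH is a generic harvesting protocol, so a sketch of a ListRecords request applies to any compliant repository. The example below assumes standard OAI-PMH 2.0 arguments (verb, metadataPrefix, from, set, resumptionToken); the base URL is a hypothetical placeholder that would need to be replaced with the address given in the PMC documentation, and the sample XML is invented.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None,
                           resumption_token=None):
    """Return a ListRecords URL; a resumption token replaces other arguments."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # Per the protocol, resumptionToken is an exclusive argument.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if from_date:
            params["from"] = from_date
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [h.findtext(OAI + "identifier") for h in root.iter(OAI + "header")]
    token_el = root.find(".//" + OAI + "resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


# Invented placeholder response from a hypothetical repository.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    base = "https://example.org/oai"  # hypothetical base URL
    ids, token = parse_page(SAMPLE_PAGE)
    print(ids)
    print(build_list_records_url(base, resumption_token=token))
```

A full harvest would repeat the request with each returned token until a page arrives without one.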

PubMed Citation Files (XML)
Open access to the full set of PubMed citation records in XML format from NCBI, including incremental update files. See the README.txt file for more information.

rOpenSci
"Open source R packages that provide programmatic access to a variety of scientific data, full-text of journal articles, and repositories that provide real-time metrics of scholarly impact."

Springer API Portal
Robust set of APIs for metadata, images and articles from this scientific publisher of books and journals, including close to 500 academic and professional society journals.

UN Comtrade Web Services
Access data from the United Nations Commodity Trade Statistics database, including International Merchandise Trade Statistics (IMTS) and the work of the International Merchandise Trade Statistics Section (IMTSS) of the United Nations Statistics Division.

Web of Science Web Services
Query over 8,000 of the leading journals in the arts, humanities, sciences and social sciences, indexed by Web of Science, to return limited article information including article title, authors, source data, and author-supplied keywords. UCB access from on-campus computers: see documentation for Web Services Expanded or documentation for Web Services Lite.

World Bank APIs
Three APIs to provide access to different datasets: one for Indicators (or time series data), one for Projects (or data on the World Bank’s operations), and one for the World Bank financial data (World Bank Finances API).
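
As an illustration of the Indicators API, the sketch below builds a request for one indicator in one country and reads the observations out of the response. It assumes the v2 endpoint at api.worldbank.org, the two-element JSON array (paging metadata, then rows) that it returns, and the indicator code SP.POP.TOTL for total population; the helper names are hypothetical and the sample values are placeholders, not real statistics.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the World Bank Indicators API (version 2).
WORLD_BANK_BASE = "https://api.worldbank.org/v2"


def build_indicator_url(country, indicator, date_range=None, per_page=50):
    """Return an Indicators API URL; date_range looks like '2000:2005'."""
    params = {"format": "json", "per_page": per_page}
    if date_range:
        params["date"] = date_range
    path = f"{WORLD_BANK_BASE}/country/{country}/indicator/{indicator}"
    # Keep the colon in the date range readable instead of percent-encoding it.
    return path + "?" + urlencode(params, safe=":")


def parse_observations(payload):
    """Map year to value, skipping years with no reported value."""
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return {}  # error messages arrive as a shorter array
    return {row["date"]: row["value"] for row in payload[1] if row["value"] is not None}


# Invented placeholder response; the values are not real statistics.
SAMPLE_RESPONSE = json.dumps([
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [{"date": "2001", "value": 200}, {"date": "2000", "value": None}],
])

if __name__ == "__main__":
    print(build_indicator_url("BR", "SP.POP.TOTL", "2000:2001"))
    print(parse_observations(json.loads(SAMPLE_RESPONSE)))
```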

WorldCat Identities
WorldCat is a combined library catalog for participating libraries around the world. The Identities API provides "personal, corporate and subject-based identities (writers, authors, characters, corporations, horses, ships, etc.) based on information in WorldCat."

WorldCat xISBN, xISSN and xOCLCnum
Submit book and periodical identifiers and return related identifiers and metadata.

[Much of the above was adapted from Mark Clemente's excellent guide to APIs for Scholarly Resources at MIT.]


Digital Public Library of America (DPLA) API Codex
Access data from the DPLA repository of cultural and scientific knowledge, including partner data from Harvard, NY Public Library, ARTstor, the David Rumsey Historical Map collection and more. Zipped JSON files of partner data and of the entire repository are available for bulk download.

Google Books Ngram Viewer datasets
The Google Books Ngram Viewer provides a frontend to explore word counts from the entire Google Books corpus of digitized books. Google also provides access to the thousands of raw datasets on which the Viewer operates. More information: TED Talk on Google Ngrams.

Tools for the digital humanities