Library Licensed Datasets Catalog

-A-

Acta Sanctorum 

An electronic version of the complete printed text of Acta Sanctorum, from the edition published in sixty-eight volumes in Antwerp and Brussels. It is a collection of documents examining the lives of saints, organized according to each saint's feast day, and runs from the two January volumes published in 1643 to the Propylaeum to December published in 1940. 

- Lives of Saints - 

AggData Retail Datasets

AggData is short for "aggregate data"; the site provides usable, portable databases of information extracted from across the web. It specializes in location-based information, such as complete geocoded lists of retail chain locations, and assures data quality by extracting from original, first-hand online sources rather than re-tooling third-party databases.

-Business Information Services-

Airbnb Data

Market Summary reports of Airbnb listing and rental performance for six metropolitan areas in the United States, including revenues and property features.

  • Chicago-Naperville-Elgin, IL-IN-WI MSA: daily, August 2014-September 2016; monthly, August 2014-September 2016; annual, year ending October 2016
  • Houston-The Woodlands-Sugar Land, TX MSA: daily, August 2014-September 2016; monthly, August 2014-September 2016; annual, year ending October 2016
  • Los Angeles-Long Beach-Anaheim, CA MSA: daily, August 2014-September 2016; monthly, August 2014-September 2016; annual, year ending October 2016
  • New York-Newark-Jersey City, NY-NJ-PA MSA: annual, year ending October 2016
  • Portland-Vancouver-Hillsboro, OR-WA MSA: daily, August 2014-September 2016; monthly, August 2014-September 2016; annual, year ending October 2016
  • San Francisco-Oakland-Hayward, CA MSA: daily, August 2014-September 2016; monthly, August 2014-September 2016; annual, year ending October 2016

-Hospitality - Industry - Rental Housing - Airbnb-

Annual Survey of Industries 

Statistical information related to the growth, composition, and structure of India's organized manufacturing sector comprising activities related to manufacturing processes, repair services, gas, and water supply, and cold storage, with supporting documentation. 

-Industrial Statistics-

-B-

BioCyc

BioCyc is a collection of 19,495 Pathway/Genome Databases (PGDBs) for model eukaryotes and for thousands of microbes, plus software tools for exploring them. BioCyc is an encyclopedic reference that contains curated data from 130,000 publications.

-C-

California Foreclosure Custom Report

CEIC Global 

Macroeconomic, industrial, and financial data for countries worldwide. Click Guest Access for immediate access. Registration is free for UC Berkeley students, staff, and faculty and offers enhanced features.

Provides detailed macroeconomic data with more than one million time series for 121 countries, including National Accounts, Balance of Payments, Production, Sales & Inventory, Banking, Construction, Interest and Foreign Exchange Rates, Government Finance, Investment, Demographics and Labor Markets, Economic & Household Surveys, Tourism, Inflation, Transport and Telecommunication, and Domestic & International Trade. Sources include International Organizations and National Statistics Offices.

China Data Online

Economic statistics of China, arranged by regions and categories. Includes monthly and yearly reports on China's macroeconomic development, statistical databases about China's population and economy at the county and city level, and financial indicators of industrial branches. (All China Data Center)

Also includes statistical yearbooks, census data, industrial and marketing surveys, and an atlas of China. Includes link to China Data Center at the University of Michigan. 

China 2013 Economic Census Data with Province Maps

The 2013 Economic Census is the 3rd economic census conducted in China. The 2013 Economic Census data provide extensive information on various industries by provinces in China. This data product has integrated the 2013 Economic Census data and the provincial GIS maps of all of mainland China.

-Chinese Provinces - Economic Conditions - Census 2013-

China Geo-Explorer II

China Geo-Explorer II provides access to demographic and business data, maps, and reports for Mainland China at various administrative levels, and access to efficient data integration for spatial and non-spatial data (custom radii, administrative units, spatial boundaries). It is a tool for quick and accurate location analysis and spatial assessment, and for identifying spatial patterns and trends. The database can generate easy-to-use and customized reports, dynamic charts, tables, and maps. Data can be exported to PDF, Excel, Word, or shapefile format.

-China Population-

Chinese boundary map data

Administrative boundary map data for China from 1949 to 2016. Available in shapefile format for various administrative levels.

Congressional Record Text Corpus 

The Congressional data here were derived from the following publications:

  • Annals of Congress (1789 - 1824)
  • Register of Debates (1824 - 1837)
  • Congressional Globe (1833 - 1873)
  • Congressional Record (1873 - 2005)

XML files are organized by date in the following format: YYYYMMDD.
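
As a rough illustration of working with the date-based naming described above, the following Python sketch turns a YYYYMMDD file name into a date and lists which of the four source publications could cover it. The ".xml" extension, the sample file name, and the helper names parse_corpus_date and candidate_publications are assumptions made for illustration; they are not taken from the corpus documentation. Year ranges are copied from the list above, and because those ranges overlap, more than one publication may match.

```python
from datetime import date, datetime
from pathlib import Path

# Year ranges as listed in the catalog entry; note that they overlap.
PUBLICATIONS = [
    ("Annals of Congress", 1789, 1824),
    ("Register of Debates", 1824, 1837),
    ("Congressional Globe", 1833, 1873),
    ("Congressional Record", 1873, 2005),
]


def parse_corpus_date(filename: str) -> date:
    """Parse a YYYYMMDD-style file name (hypothetical example: '18730304.xml')."""
    stem = Path(filename).stem  # drop any extension such as '.xml'
    return datetime.strptime(stem, "%Y%m%d").date()


def candidate_publications(day: date) -> list[str]:
    """Return every publication whose listed year range includes the date's year."""
    return [name for name, start, end in PUBLICATIONS if start <= day.year <= end]


if __name__ == "__main__":
    day = parse_corpus_date("18350112.xml")  # hypothetical file name
    print(day, candidate_publications(day))
```

A file name that is not a valid calendar date raises ValueError, which can help flag stray files when scanning a directory.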

CPS Utilities. Child Support/Alimony

CPS Utilities combines the demographic, labor force, and income items from the March supplement with the child support and alimony items from the April supplement. For this reason, only 6 of the 8 rotation groups are found in this sample (the remaining 2 groups do not have both March and April data).

-Population - Demographics Surveys - Child Support - Alimony - Demographic Aspect - Divorced Women-

CPS Utilities. Monthly Basic

CPS Utilities provide compressed data, documentation, and extraction software for the Current Population Surveys. These files are the responses to the basic set of questions that each rotation group is asked each month. They comprise basic and earner study demographic information.

-Population - Labor Supply-

CPS Utilities. Worker Issues I 

CPS Utilities provide compressed data, documentation, and extraction software for the Current Population Surveys. This series includes the basic data as well as several employment-related supplements. The Displaced Worker Supplements provide data for all persons 20 years of age or older who lost a job involuntarily within the last five years because of operating decisions of a firm or business in which the worker was employed. Information includes reasons for job displacement, industry and occupation of the former job, group health insurance coverage, job tenure, and weekly earnings. The Occupational Mobility and Job Tenure Supplements provide data on the length of time doing the current kinds of work and the length of time working continuously for the present employer. Occupation changers are asked their reasons for changing, what educational retraining programs were undertaken, and pay comparisons between current and former jobs. An Unemployment Compensation Supplement was included in the three 1989 files and in February 2000.

-Population - Labor Supply - Unemployment - Employees - Occupational Mobility-

CPS Utilities. Worker Issues II

CPS Utilities provide compressed data, documentation, and extraction software for the Current Population Surveys. Topics in the May/September series include Multiple job holdings, Union membership, Income and work experience, Pension and retirement plans, Adult education, Work schedules, Work benefits, Premium pay, Unemployment compensation, Tobacco usage, and Race and ethnicity.

-United States - Labor Supply - Employees-

Cross-National Time-Series Data Archive

The Cross-National Time-Series Data Archive was launched by Arthur S. Banks in the fall of 1968 at the State University of New York at Binghamton. The archive was, in part, the outcome of an effort initiated some years earlier to assemble, in machine-readable, longitudinal format, certain of the aggregate data resources of The Statesman's Yearbook, first published in 1864. The archive has almost 200 variables and contains data for over 200 country units, with provision for entries from 1815 (excluding the two modern wartime periods, 1914-1918 and 1940-1945).

-Comparative Government - Economic History - Economic Indicators - Political Indicators - Political Violence - Social History - Social Indicators - World Politics-

-D-

DataPlanet

Provides easy access to a wide variety of economic, social, political, and marketing indicators. (DataPlanet - Data-Planet - Statistical Datasets)

Offers easy access to over 5,500 data sets from over 65 source providers and 16 subject categories, including banking, criminal justice, education, energy, food and agriculture, government, health, housing and construction, industry and commerce, labor and employment, natural resources and environment, income, cost of living, stocks, transportation, and more. Sources of data include federal, local, state and international governments and organizations. Allows for customization of the data by selecting subjects, and the ability to view your data in side-by-side tables, charts and maps. 

Dave Leip's Atlas of the U.S. Presidential Elections

Provides election results for the major and minor candidates for U.S. President from 1789 through 2012. Individual year pages include candidates, parties, popular and electoral vote totals, maps, charts, and voter turnout (1932 - 2012). Includes menus and hyperlinks to individual state results pages for the Presidential Elections from 1892 through 2012. County-level maps and data are available for the elections from 1960 through 2008 (and prior to 1960 for members). The 2000 through 2008 pages also include the results maps for President by U.S. Congressional District.

-Politics and Government (20th & 21st)-

Death Master File (Full File)

Contains records of deaths from 1936 and later that have been reported to the Social Security Administration. Includes the following information on each decedent, if the data are available to the SSA: Social Security number, name, date of birth, date of death, state or country of residence (Feb. 1998 and prior), ZIP code of the last residence, and ZIP code of lump-sum payment.

-Registers of Birth & Social Security Records-

Digital Topographic Maps

Digital topographic map sets -- many of them georeferenced -- covering many different countries around the world.

-E-

EPFR Global Fund Flows and Allocations Data.

Flows and assets of funds aggregated up to the fund level, indexed by share class and month; and self-reported country allocations of funds, indexed by fund and month.

-Mutual Funds-

EPS China Data = EPS shu ju ping tai 

EPS China Data presents a collection of China statistical data and census data. It provides access to 41 China statistical databases (and more sub-databases) sourced from industrial, regional, and national organizations, covering a wide range of subjects and fields. EPS China Data contains over 1.2 million time series of basic and combined statistical indicators with a yearly increment of more than 55 million numeric data.

-Economic Forecasting - Electronic Journals-

Esri data & maps

Map data at many scales of geography for the world, Mexico, Canada, and European countries, as well as general and detailed data for the United States.

Euratlas Georeferenced Historical Vector Data

Georeferenced historical vector data of Europe. Contains 21 maps, one for each century from year 1 CE to year 2000 CE. These maps depict the political makeup of Europe at the last year of each century. Each map is composed of shapefiles representing physical features layers (rivers, seas, and elevation data) and political features layers (sovereign nations, dependent territories, autonomous peoples, cities and towns, uncertain boundaries). The data can be analyzed and organized spatially with GIS software. Covers Europe as well as the Mediterranean region, including large parts of northern Africa and the Middle East.

-Spatial vector data-

-F-

FTSE Russell Indices 

FTSE Russell Indices is a dataset with closing positions for all constituents of the 21 U.S. Russell Indexes at month end back to December 1997. Username and password are required. Please email librarydataservices@berkeley.edu for more information.  

-G-

Gale Digital Scholar Lab

Gale Digital Scholar Lab is a Digital Humanities platform that lets users analyze and interrogate data with the text analysis and visualization tools built into the platform. This guide gives an overview of how best to use this platform.

-Digital Humanities - Humanities and Social Sciences-

Global Financial Data

Academic Users can log in one of two ways. If you log in anonymously, you can search the database, access individual data series, and graph individual series. If you log in with your username and password, you can also access your personalized home page, which helps you to customize your access and download multiple series into a single worksheet. To gain personal access to Global Financial Data:

1.  Please create your own account. You must use your university email address in creating the account.

2. You must be logged in through your university's proxy server using your university email address both to create your account and to log into the database. Any account created outside of the university will not be activated. 

3. Logins outside of the university's server will not receive access. 

-H-

Historical Business Files

Consists of annual snapshots of historic company data from Infogroup's U.S. Business Database. Includes company name, mailing address, SIC and NAICS codes, employee size, sales volume, latitude/longitude, and many more variables about each company.

-Business Enterprises-

Historical Records

This service features access to Dun & Bradstreet's historical U.S. files. There is one file per year from 1969 through 2015 with the exception of 1981 and 1984, which are unavailable. 

-Corporations - Finance (Dun and Bradstreet, Inc)-

Atlas of the U.S. House Elections 

This dataset provides United States election data for House elections by county and congressional districts from 1992. 

-Elections - Presidents - Politics and Governments (20th & 21st century)-

Atlas of the U.S. Presidential Elections, Dave Leip

-I-

Inter-university Consortium for Political and Social Research (ICPSR)

ICPSR receives, processes, and distributes data on social phenomena in countries across the world. ICPSR maintains a data archive on topics in the social and behavioral sciences, including specialized collections in education, aging, criminal justice, substance abuse, terrorism, and other fields. Includes survey data, census records, election returns, economic data, and legislative records.

Direct download access to data sets requires the creation of a personal account. In addition, analysis of ICPSR data sets requires the use of specialized software. For more information on this process, please consult the ICPSR Get Help page or schedule an appointment with the D-Lab.

IMD World Competitiveness Online

"IMD World Competitiveness Online is a unique and comprehensive database on the competitiveness of nations. It includes time series from the IMD World Competitiveness Yearbook, the leading annual report published by IMD since 1989, the IMD World Talent Ranking, and from the IMD World Digital Competitiveness Ranking."

-Time Series from the IMD World Competitiveness Yearbook-

Imports and Exports by Related Parties

Annual summary of import and export transactions involving trade between a USPPI and an ultimate consignee where either party owns directly or indirectly 10 percent or more of the other party. 

-International Business Enterprises - Foreign Subsidiaries - Imports - Exports-

India Stat

Socio-economic statistical information about India. (Password-based data access)

Ingenuity Pathway Analysis (IPA)

Database and gene expression pathway analysis software for data derived from 'omics experiments, such as RNA-seq, small RNA-seq, microarrays including miRNA and SNP, metabolomics, proteomics, and small-scale experiments.

Functional annotations relating to biological and chemical interactions, cellular phenotypes, and disease processes are referenced from the scientific literature.

-L-

Latinobarometro Opinion Publica Latinoamericana (Latino Data Bank)

-Public Opinion (Latin America) - Latin America Politics and Government - Economic Policy - Foreign Relations-

Linguistic Data Consortium (LDC)

The UC Berkeley Library has access to several Linguistic Data Consortium datasets. Please start by checking the LDC catalog (linked above) for dataset numbers and descriptors. Check in UC Library Search or email librarydataservices@berkeley.edu to check access.

Los Angeles Sentinel Archive

Creation Date: 2017. ProQuest Historical Newspaper data for the Los Angeles Sentinel 1934 - 2005, OCR'ed content (result from automated Optical Character Recognition - quality varies).

-Los Angeles Sentinel 1934 - 2005-

-M-

Media Intelligence Center

Provides audited circulation reports, circulation data, and publishers' statements for newspapers, consumer magazines, business publications, and farm publications from the US, Canada, and other countries.

-Periodicals - Circulation - Newspapers-

Mergent Fixed Income Securities Database

Wharton Research Data Services

Comprehensive database of publicly offered U.S. bonds with over 140,000 corporate, corporate MTN (medium-term note), supranational, U.S. Agency, and U.S. Treasury debt securities.

Mobility in Cities

The Mobility in Cities Database contains the main results of a major research project of UITP on the economics of urban mobility. Data used in this report were collected for 60 metropolitan areas worldwide for the year 2012. The database covers demography, the economy, urban structure, the number and use of private vehicles (including taxis), the road network, public transport networks (infrastructure and rolling stock, supply and demand, farebox revenue), and mobility patterns.

-Urban Transportation-

Municipal Form of Government 

This survey is conducted every 5 years and examines the structure (form) of cities/municipalities; election systems; provisions for referendum or recall; term limits; and the powers and authority of the chief official and governing board. 

-Municipal Government - Social Surveys-

-N-

National Sample Survey

Statistical data on consumer expenditures in India from the National sample survey, with supporting documentation.

-Consumption (Economics) - Consumers - Household Survey- 

Neighborhood Change Database 1970 - 2010

Contains nationwide tract-level data from the 1970, 1980, 1990, 2000, and 2010 decennial censuses. Combines U.S. Census Bureau data into one product with variables and tract boundaries that are consistently defined across census years. Contains long-form census data with details such as population, household, and housing characteristics, income, poverty status, education level, employment, housing costs, immigration, and other variables. The software creates customized, exportable data and map reports. 

-Housing - Population - Census Data (1970-2010)-

Nielsen Datasets

UC Berkeley subscribes to the Nielsen marketing data from the Marketing Data Center at Chicago Booth. The James M. Kilts Center for Marketing at Chicago Booth and the Nielsen Company have partnered to make two consumer marketing datasets available to US-based academic researchers.

  1. To request access to the Nielsen data at the Kilts Center for Marketing, please complete the online code request form below.
  2. After the administrator approves your request for a code, you will receive an email with your registration code and instructions on how to complete the process. 
  3. The Kilts Center will review your project(s) and send you a final activation email along with steps for how to access the data. If you are a Ph.D. student your advisor will be required to take action before you can be activated. 

NYSE

Trading data from the New York Stock Exchange for every business day from 2007 through 2015, as follows:

  • NYSE ProTrac EOD Replay (in folder "TRADE_PROGRAM") contains a summary of all program trading activity for each stock identifying the total amount of shares that were executed for each stock through index arbitrage program trading vs. all other program trading strategies.
  • NYSE ProTrac EOD Summary (in folder "SUM_PROGRAM") contains all program trading activity for each stock identifying the total amount of shares that were executed for each stock through index arbitrage program trading vs. all other program trading strategies.
  • NYSE ReTrac EOD Replay (in folder "TRADE_RETAIL") is a record-by-record replay of the ReTrac datafeed, containing the Symbol, Volume, and Time of each retail execution.
  • NYSE ReTrac EOD Summary (in folder "SUM_RETAIL") provides an end-of-day summary of all retail trading activity for each stock identifying the total amount of shares that were bought vs. sold for each stock that was traded by retail customers; shares are totaled for each symbol in separate buy and sell quantities.

-New York Stock Exchange - Stocks-

-O-

Orbis

Orbis contains information on over 200 million companies worldwide, both private and public. Data is standardized for easy cross-border comparisons. 

Orbis includes companies' financial accounts, credit scores from a number of independent providers, directorships, ownership structures, PEPs and sanctions information, and details of mergers and acquisition activity. 

-P-

Passport

PROTEOME

Protein information for species including human, mouse, rat, yeast, worm, and plant, including links to relevant scientific literature. Human entries include associated disease information.

ProQuest Congressional Record Permanent Digital Collection

The Congressional Record is the official record of the proceedings, debates, and activities of the U.S. Congress. Although the Record contains a substantially verbatim account of the proceedings and debates, it also contains extensive inserted materials, communications from the President and executive agencies, memorials, and petitions. UC Berkeley access only.

ProQuest Historical Newspapers. San Francisco Chronicle 

Downloadable full-text corpus of the San Francisco Chronicle and its predecessor titles, the Daily dramatic chronicle and the Daily morning chronicle, covering 1865 to 1922. Provided to facilitate text data mining. Each article is contained in a separate XML-encoded text file; the quality of OCR text varies.

-San Francisco (California)-

PRS CountryData

Data on political risk and macroeconomics at a country level.

Includes International Country Risk Guide (ICRG) political risk data for 140 countries, monthly from 1980, as well as macroeconomic data and Political Risk Services (PRS) forecasting. The ICRG risk data include measures of bureaucracy, corruption, civil disorders, ethnic tensions, poverty, inflation, terrorism, and other metrics. The political risk forecasting section (PRS) provides country ratings on debt, investment risk, exchange controls, tariff barriers, trade restrictions, domestic economic problems, and other risk factors. Information on the methodology used for calculating ratings and forecasts is available.

-R-

Regional Economic Data Toolkit

The Regional Economic Data Toolkit brings together four searchable and downloadable U.S. datasets: Cost of living index, state incentives, state expenditures, and economic diversity index. 

-Economic Development - Cost and Standard of Living - Business Enterprises- 

ROPER

US and international polling and public opinion survey data.

Contains domestic and international survey data. The Center's Public Opinion Location Library (iPOLL) gives online access to a database including poll questions asked in the US from 1936 to present.

-S-

SF City Map

This San Francisco, California map set from Sanborn includes GIS data, 3D data, and digital orthophoto imagery data from the early 2000s.

SimplyAnalytics

A web-based data analysis and mapping application that allows users to create custom thematic maps, tables, and reports using demographic, business, and marketing data for the United States. Note: Access limited to 10 simultaneous users. To increase the limit temporarily for use in classes or workshops, please send your request to eart@library.berkeley.edu a minimum of 5 business days in advance.

Includes over 70,000 data variables related to demographics, employment, real estate & housing, crime, businesses, consumer spending, and points of interest data from the US Census, historical US Census data (2000), SimmonsLOCAL data from Experian, and Nielsen Claritas PRIZM data.

Social Explorer 

Data and interactive thematic maps from the U.S. Census from 1790-present.

Provides access to current and historical United States census data, including all historic decennial censuses and American Community Surveys, as well as other demographic information, such as religious organizations. Census data is current to 2010 and historical back to 1790. In addition to being a data resource, the web interface lets users create maps and reports to better illustrate, analyze and understand demography and social change.

StateMap of India: Digital Map of State Boundaries of India 

State boundaries of India are linked to decennial census demographic and socio-economic data. 

-India Population - India Administrative and Political Divisions-

Statista: The Portal for Statistics

A statistics portal that integrates data from reliable sources on thousands of topics.

Categorized into market sectors, Statista provides access to quantitative facts on media, business, politics, and other areas. Sources of information include market research reports, trade publications, scientific journals, and government sources. Data may be downloaded into spreadsheets and presentations. Also includes industry reports.

-T-

TDM Studio 

One of the thorniest aspects of embarking on a text data project is collecting the materials for your corpus. If you are interested in text mining newspapers, scholarly journals, dissertations, or primary sources, ProQuest's TDM Studio may help you address this challenge.

The Arabidopsis Information Resource (TAIR)

Genetic and molecular biology data for the model higher plant Arabidopsis thaliana.

- Genome - Gene Structure - Gene Product Information - DNA - Seed Stocks - Genome Maps - Arabidopsis Research Literature-

The Qualitative Data Repository

Sharing data and its documentation for secondary analysis. Empowering qualitative and multi-method inquiry through guidance and consultation. Providing data and materials to enrich and teach. Developing innovative approaches for enriching publications with data and analysis. Users of the Qualitative Data Repository need to register for an account using their @berkeley.edu email address. 

Thomson Reuters Tick History Onboarding Guide

-U-

UN Comtrade

UC Berkeley subscribes to Comtrade Premium, which provides a more powerful search interface and bulk downloads. To activate this feature, please register for an account using your Berkeley email. After account activation, log in at the upper right corner using your UC Berkeley email and the password you created during registration.

For many countries, data go back to 1962. (Please note: Some users report problems using this database with Firefox.) Commodities are classified according to SITC (Rev.1 from 1962, Rev.2 from 1976 and Rev.3 from 1988) and the Harmonized System (HS) (from 1988 with revisions in 1996 and 2002). Currently most data are reported according to HS, version 1996. Data are in US dollars and quantities in metric units.
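
The classification timeline above can be expressed as a small lookup, sketched here in Python. The start years come from the description above; the label strings and the function name classifications_introduced_by are illustrative assumptions, not identifiers used by UN Comtrade. The sketch only reports which classifications had been introduced by a given year and makes no claim about which ones a particular country actually reported in.

```python
# Start years as stated in the catalog description; labels are illustrative only.
CLASSIFICATION_START_YEARS = {
    "SITC Rev.1": 1962,
    "SITC Rev.2": 1976,
    "SITC Rev.3": 1988,
    "HS (original)": 1988,
    "HS (1996 revision)": 1996,
    "HS (2002 revision)": 2002,
}


def classifications_introduced_by(year: int) -> list[str]:
    """Return the classifications whose listed start year is on or before `year`."""
    return [
        label
        for label, start in CLASSIFICATION_START_YEARS.items()
        if start <= year
    ]


if __name__ == "__main__":
    for y in (1970, 1990, 2005):
        print(y, classifications_introduced_by(y))
```

Keeping the start years in a single dictionary makes it straightforward to extend the table if later revisions need to be tracked.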

UNIDO Statistical Data 

Works best in Internet Explorer.
International statistical data on industries and mining from the United Nations Industrial Development Organization. (United Nations Industrial Development Organization International Industrial and Mining Statistics)

Includes INDSTAT 4, with data for the number of establishments, employees, wages, output, and female employees by detailed industry (4-digit level of the International Standard Industrial Classification of All Economic Activities (ISIC)) for more than 100 countries from 1990; and INDSTAT 2, with the variables at the 2-digit level of ISIC for countries back to 1963. Also includes the IDSB database (with Import, Export, Output, and Apparent Consumption) at the 4-digit ISIC level from 1990, and MINSTAT, a database of industrial statistics with similar variables for the mining sectors.

-V-

Vector map data for Cuba

Vector map data, including a variety of physical and political point, line, and polygon features. Full-country coverage.

Vector map data for India

Vector map data for India containing shapefiles representing railways, airports, and administrative boundaries, as well as census data from 2011.

Vector map data for Iran

Vector map data for selected areas of Iran including a variety of physical and political point, line, and polygon features.

Vector map data for Port-au-Prince, Haiti

Vector map data for the city of Port-au-Prince, Haiti from 2010 (pre-earthquake). Includes coastline, 3 classes of roads, city quarters, other built-up areas, airports, and buildings.

Vector map data for Syria

Vector map data (consisting of point, line, and polygon shapefiles) for Syria containing a variety of data, including symbols and text from the original paper maps, road structures, survey information, relief, vegetation, landcover, boundary posts, buildings, economy and social objects, pipes and powerlines, hydrology, and hydrotechnical features.