Finding Health Statistics & Data: A D-Lab Training: Welcome!

Today's agenda:

  1. Welcome;
  2. Health statistics: what and why;
  3. Exploring some sources together;
  4. Further reading and concluding remarks.

Link to this guide:

Finding Health Statistics & Data.

Presented by Michael Sholinbeck,
UC Berkeley D-Lab (via Zoom), March 17, 2022.

Want More ..?

This guide has only a small number of sources for health statistics and data. Many more, as well as tips for using data, may be found on the Bioscience, Natural Resources & Public Health Library's Health Statistics & Data guide.

Here is a handy table (pdf) of where to find what, from the US National Network of Libraries of Medicine. (The rows list types of statistics, eg binge drinking, disability status, seat belt use, STD prevalence, etc., and the columns list places to find this information, eg BRFSS, US Census, etc.)

Still can't find what you need? Ask!

Money spent on beef in 2020, households

Source: SimplyAnalytics

Caution: Survey Ahead!

Lots of health data comes from surveys. Here are some issues to consider when looking at survey or estimated data:

  • Look at sample sizes and survey response rates. Is the sample representative of your population? Are there enough responses for the results to be valid?
  • Who was surveyed? Are they representative of the population you are comparing them to? Do they include the group you are interested in?
  • Were the survey respondents from heterogeneous groups? Do the survey questions have a similar meaning to members of different groups?
  • How was the survey conducted? Via telephone? (Many people only have cell phones.) Was selection random, or was a specific group targeted?
  • What assumptions and methods were used for extrapolating the data?
  • Look at the definitions of characteristics. Do they match your own definitions?
  • When was the data collected?

(Adapted from information formerly on the UCSF Family Health Outcomes Project website)
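To make the sample-size question concrete, here is a minimal sketch of how the approximate margin of error for a survey percentage shrinks as the number of responses grows. It assumes a simple random sample and a 95% confidence level; the 20% result and the sample sizes are hypothetical. Surveys with complex, weighted designs (as many health surveys have) will generally report different margins than this simple formula gives.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    z = 1.96 corresponds to a 95% confidence level (normal approximation).
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey result: 20% of respondents report a condition.
for n in (50, 400, 5000):
    moe = margin_of_error(0.20, n)
    print(f"n={n:>5}: 20% plus or minus {moe * 100:.1f} percentage points")
```

An estimate for a small subgroup (say, 50 respondents in one county) can carry a margin several times wider than the statewide figure from the same survey.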

Reliability and Validity

Reliable data collection: relatively free from "measurement error."

  • Is the survey written at a reading level too high for the people completing it?
  • Is the device used to measure elapsed time in an experiment accurate?

Validity: how well a measure assesses what it claims to measure.

  • If the survey is supposed to measure quality of life, how is that concept defined?
  • How accurately can this animal study of drug metabolism be extrapolated to humans?

(Adapted from Chapter 3 of Conducting Research Literature Reviews: From the Internet to Paper, by Arlene Fink; Sage, 2010.)

A Data Biography

The idea of a data biography comes from the We All Count Project for Equity in Data Science. For any datasets you use, ask these questions:

  • Who:
    • Who collected the data?
    • Who owns the data?
  • How:
    • What methods were behind the data collection design and process?
  • Where:
    • In what locations was the data collected?
    • Where is the data stored?
  • Why:
    • For what purpose was the data collected?
  • When:
    • When was the data collected?

Is "Cause of Death" a Count or an Estimate?

"Before COVID-19, many people seemed to have believed that every death in the United States - indeed in the world - was accurately registered in some universally accessible system that would serve as an eternal record of who died from what and when. Perhaps one of the silver linings of the pandemic has been that it has exposed that notion as fantasy."

Towards a “post p < 0.05 era”

Here's a post from one of my favorite blogs, AEA365: A Tip-a-Day by and for Evaluators:

Towards a “post p < 0.05 era” by Tamara Young addresses the decades-old and highly contentious debate about null hypothesis significance testing. The post includes some “Rad Resources” as well as some tips for evaluators.
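One reason the p < 0.05 threshold is contested: the p-value depends on sample size as well as on the size of the difference. The sketch below is an illustration that is not taken from the blog post. It applies a pooled two-proportion z-test (normal approximation) to made-up data in which the observed difference is always 5 percentage points.

```python
import math

def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: the same 30% vs 25% difference at three sample sizes.
for n in (100, 500, 2000):
    p = two_proportion_p_value(round(0.30 * n), n, round(0.25 * n), n)
    print(f"n per group = {n:>4}: difference = 5 points, p = {p:.4f}")
```

The identical 5-point difference falls above 0.05 with 100 or 500 people per group and well below it with 2,000, which is why reporting the effect size and its uncertainty tells readers more than the p-value alone.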

Two is always two. Except when it’s not.

So you think math is an objective science? Think again.

This blog post explains, in the most elementary language possible, how even simple statistics vary depending on who you ask, ie, where you put the locus of power in your analysis.

Context is Key

Q: What is "Health"?

A: Everything!

Statistics and data are available for many things that may not be directly "health" but are very relevant to public health. Here are a few to pique your interest:

Data and Statistics, California Department of Education
Data on school enrollment, non-English language learners, free lunch numbers, teacher data, class size, and much more.

Calif. Dept of Alcoholic Beverage Control: License Lookup
Find liquor stores, bars, etc. by address, census tract, city, etc. Can also search by business name, licensee name, license number.

Traffic Operations (CalTrans)
Traffic volumes, truck traffic, and ramp volume for California state highways. View tables, or download data as Excel files.

Asthma Diagnosis in Bay Area Kids, 2015-16

Major Depression in US Youth

Had at Least One Major Depressive Episode in the Past Year among Youths Aged 12 to 17, by State: 2012-2013 (from NSDUH).

But what if we changed the legend..?
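A map legend's class breaks are a choice, and different choices put the same places in different colors. The sketch below compares two common schemes, equal-interval and quantile breaks, on made-up rates. The values are hypothetical and are not NSDUH figures, and the actual maps may use a different scheme.

```python
import statistics

# Hypothetical state-level rates (percent); not real NSDUH values.
rates = [7.2, 7.9, 8.1, 8.4, 8.6, 8.8, 9.0, 9.3, 9.9, 12.5]

def equal_interval_breaks(values, k):
    """Split the range from min to max into k bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]

def quantile_breaks(values, k):
    """Choose breaks so each bin holds roughly the same number of values."""
    return statistics.quantiles(values, n=k, method="inclusive")

def class_counts(values, breaks):
    """Count how many values land in each legend class."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        # A value's class is the number of breaks it exceeds.
        counts[sum(v > b for b in breaks)] += 1
    return counts

for name, breaks in [("equal interval", equal_interval_breaks(rates, 4)),
                     ("quantile", quantile_breaks(rates, 4))]:
    rounded = [round(b, 2) for b in breaks]
    print(f"{name:>14}: breaks={rounded} counts={class_counts(rates, breaks)}")
```

With equal intervals, the one high outlier gets the darkest class to itself and most places look alike. With quantiles, the classes are evenly filled, so small differences near a break can look dramatic on the map.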

Find Health Statistics and Data Here

American Community Survey Data Profiles (US Census).
(2010 to present, with links to older).
Quickly get recent state, county, and city demographics, including age, race/ethnicity, income/employment, residence, languages spoken, education, and more. And here is a tool to get information on Tribal Areas.
» Get data: Census data is available from several sources.

AtlasPlus (National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, CDC).
(2000 to present).
Create customized tables, maps, and charts using more than 15 years of CDC’s surveillance data on HIV, viral hepatitis, STD, and TB. AtlasPlus also provides access to indicators on social determinants of health. Updated annually.
» Export data to CSV.

National Health and Nutrition Examination Survey (NHANES).
(1959 to present).
NHANES is a survey that collects information about the health and diet of people in the United States. It is unique in that it combines a home interview with health tests that are done in a mobile examination center.
» Data is available under each survey year (eg, here is the link to 2015-2016 data).

County Health Rankings & Roadmaps.
(Some data go back to 1990s).
The CHRs measure health factors, including high school graduation rates, obesity, smoking, unemployment, access to healthy foods, the quality of air and water, income, and teen births in nearly every county in America. Roadmaps provide guidance and tools to understand the data, and strategies that communities can use to move from education to action.
» Get Data!

California Health Interview Survey: AskCHIS.
(2001 to present).
Quick access to state and local data on hundreds of health topics. Run your own customized search using AskCHIS (requires one-time, free registration), review publications and data summaries, and more. CHIS is the largest state health survey in the USA.
Use AskCHIS NE to get estimates on top health indicators by zip code, city, or legislative district.
» Get Data!

CARES Engagement Network Maps.
(Dates vary, some back to early 1900s; mostly 1980s to present).
The Map Room provides > 15,000 data layers down to census tract level for demographics, social and economic factors, physical environment, health behaviors, clinical care, and health outcomes, as well as a Community Health Needs Assessment reporting tool with 80+ health-related indicators. From the Center for Applied Research and Engagement Systems (CARES) at the University of Missouri.
» Information on data sources.

Data Resource Center for Child and Adolescent Health (HRSA).
(2016 to present, with links to older).
Includes national and state-level data on hundreds of child health indicators from the National Survey of Children’s Health, National Health Interview Survey Child Component, Survey of Pathways to Diagnosis and Services, and National Survey of Children with Special Health Care Needs.
» Get Data!

(Dates vary; mostly previous few years).
Find, customize, and share data on more than 400 measures of child health and well being. Data are available for every legislative district, city, county, and school district in California. Easily incorporate data into reports, presentations, grant proposals, policy decisions, media stories, and advocacy work. Search by region, by demographic group, or by topic.
» Download data to Excel.

OECD Health Statistics.
(Dates vary, but some available 1960 to present).
Tables on health and health systems across OECD countries. OECD Health is an interactive database comprising data on a range of topics on the health care systems in the OECD Member countries and Accession countries, which are presented in a demographic, economic and social context. Some time series go back as far as 1960.
» Download data to Excel, CSV, PC-axis, XML.

Demographic and Health Surveys
(Dates vary, mostly 1990s to recent)
Collects, analyzes, and disseminates data on population, health, HIV, and nutrition in over 90 countries. Data is available through STATcompiler, which lets users select numerous countries and hundreds of indicators to create customized tables.
» Data access requires one-time, free registration.

(2000 to present).
A web-based data analysis and mapping application to create custom thematic maps, tables and reports using demographic, business, and marketing data for the United States. Includes over 70,000 data variables related to demographics, employment, real estate & housing, crime, businesses, consumer spending, and points of interest data from the US Census, historical US Census data (2000), SimmonsLOCAL data from Experian, and Nielsen Claritas PRIZM data.
NOTE: UCB access is normally limited to 10 simultaneous users, but this has been increased to 25 for the week following this training.
» Export data as an Excel, CSV, or DBF file, or as shapefiles.
A note about quantiles.

(Late 1990s to present).
Online query system based on data from the Healthcare Cost and Utilization Project. It provides access to health statistics and information on hospital inpatient and emergency department utilization. From the Agency for Healthcare Research and Quality, US DHHS.
» Get Data!

Substance Abuse and Mental Health Data Archive (US DHHS).
(Dates vary; back to 1990s).
SAMHDA provides public-use data files, file documentation, and access to restricted-use data files from numerous series on substance abuse and mental health in the US.
» You can get data from links provided in each series/survey.

Gallup Analytics.
(Dates vary, mostly previous 10 years).
Gallup Analytics allows users to access and use the wealth of Gallup polling data. Polling topics include views on the government, education, well-being, economics, and politics. View data by demographic category and compare results across geographies to develop and report findings.
NOTE: UCB access is limited to 1 simultaneous user.
» Export data to Excel.

PeriStats.
(Dates vary, mostly 2000s to present).
PeriStats is developed by the March of Dimes Perinatal Data Center and provides access to maternal and infant health data for the United States and by state or region, including more than 60,000 graphs, maps, and tables.
» Export data as a CSV file.

Statista.
(Dates vary).
Statista provides access to quantitative facts on media, business, politics, and other areas. Sources of information include market research reports, trade publications, scientific journals, and government sources. Health-related topics include health systems, care and support, the state of health, medical technology, pharmaceutical products and their market, and physicians, hospitals and pharmacies.
» Download data into spreadsheets and presentations.

California Community Burden of Disease and Cost Engine (Calif. Dept. of Public Health).
(2005 to present).
The California Community Burden of Disease Engine is a tool to explore data on burden of disease in multiple levels of geographic granularity in order to answer and generate questions about the intersection between health disparities and place. The CCB currently displays condition-specific mortality burden data at the statewide, county, community, and census tract levels, with interactive rankings, charts, maps and trend visualizations. The CCB also includes a limited set of social determinants data and describes their correlations with death outcomes.
» Click Links to other data to access the original data sources.

COVID-19 Data and Dashboards (UCSF).
Links to numerous sources of data, including clinical data, epidemiological data (local, national, global), genomic data, and research data. Many are freely available for download.

Further Reading

Thank you!

Percentage of US people polled who experienced enjoyment yesterday (from Gallup Analytics).