
Finding Health Statistics & Data: A D-Lab Training: Welcome!

Today's agenda:

  1. Welcome;
  2. Health statistics: what and why;
  3. Exploring some sources together;
  4. Further reading and concluding remarks.

"Research users are not passive recipients of distilled wisdom, they are active agents of critique and creative analysis."
- from "How to 'QuantCrit:' Practices and Questions for Education Data Researchers and Users," W. Castillo and D. Gillborn, 2022.

Link to this guide: guides.lib.berkeley.edu/publichealth/stats4dlab.

Finding Health Statistics & Data.

Presented by Michael Sholinbeck (msholinb@library.berkeley.edu).
UC Berkeley D-Lab (via Zoom), November 2, 2022.

Want More ..?

This guide has only a small number of sources for health statistics and data. Many more, as well as tips for using data, may be found on the Bioscience, Natural Resources & Public Health Library's Health Statistics & Data guide.

Please also see the Licensed Data Sources guide for information on how to access these data.

Here is a handy table (PDF) of where to find what, from the US National Network of Libraries of Medicine. (The rows list types of statistics, e.g., binge drinking, disability status, seat belt use, STD prevalence, etc., and the columns list places to find this information, e.g., BRFSS, US Census, etc.)

Still can't find what you need? Ask!

Money spent on beef in 2020, households

Source: SimplyAnalytics

Caution: Survey Ahead!

Lots of health data comes from surveys. Here are some issues to consider when looking at survey or estimated data:

  • Look at sample sizes and survey response rates. Are they representative of your population? Are there enough responses to be valid?
  • Who was surveyed? Are the respondents representative of the population you are comparing them to? Do they include the group you are interested in?
  • Were the survey respondents from heterogeneous groups? Do the survey questions have a similar meaning to members of different groups?
  • How was the survey conducted? By telephone? Many people only have cell phones. Was selection random, or was a specific group targeted?
  • What assumptions and methods were used for extrapolating the data?
  • Look at the definitions of characteristics. Do they match your own definitions?
  • When was the data collected?

(Adapted from information formerly on the UCSF Family Health Outcomes Project website.)
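To make the sample-size question concrete, here is a minimal Python sketch of how the uncertainty around a survey percentage shrinks as the number of responses grows. The 20% estimate and the sample sizes are hypothetical, chosen only for illustration, and the formula assumes a simple random sample. Large health surveys typically use weighting and complex sampling designs, so their published margins of error will generally differ from this simple calculation.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion.

    Assumes a simple random sample; z = 1.96 corresponds to roughly
    95% confidence under the normal approximation.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical estimate: 20% of respondents report some condition.
estimate = 0.20

# The same 20% estimate is far less certain with 50 responses than with 5,000.
for n in (50, 400, 5000):
    moe = margin_of_error(estimate, n)
    print(f"n={n:>5}: {estimate:.0%} ± {moe:.1%}")
```

When a source reports an estimate for a small subgroup, check whether it also reports a confidence interval or flags the estimate as unstable.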

Reliability and Validity

Reliable data collection: relatively free from "measurement error."

  • Is the survey written at a reading level too high for the people completing it?
  • Is the device used to measure elapsed time in an experiment accurate?

Validity refers to how well a measure assesses what it claims to measure.

  • If the survey is supposed to measure quality of life, how is that concept defined?
  • How accurately can this animal study of drug metabolism be extrapolated to humans?

(Adapted from Chapter 3 of Conducting Research Literature Reviews: From the Internet to Paper, by Arlene Fink; Sage, 2010.)

A Data Biography

The idea of a data biography comes from the We All Count Project for Equity in Data Science. For any datasets you use, ask these questions:

  • Who:
    • Who collected the data?
    • Who owns the data?
  • How:
    • What methods were behind the data collection design and process?
  • Where:
    • In what locations was the data collected?
    • Where is the data stored?
  • Why:
    • For what purpose was the data collected?
  • When:
    • When was the data collected?

Is "Cause of Death" a Count or an Estimate?

"Before COVID-19, many people seemed to have believed that every death in the United States - indeed in the world - was accurately registered in some universally accessible system that would serve as an eternal record of who died from what and when. Perhaps one of the silver linings of the pandemic has been that it has exposed that notion as fantasy."

Towards a “post p < 0.05 era”

Here's a post from one of my favorite blogs, AEA365: A Tip-a-Day by and for Evaluators:

Towards a “post p < 0.05 era” by Tamara Young, which addresses the decades-old and highly contentious debate about null hypothesis statistical significance testing. The post includes some “Rad Resources” as well as some tips for evaluators.

Also of note: Don’t Let the P in P Value Stand for Privilege, by Heather Krause. It offers an easy-to-understand message: a problem experienced by a large group is considered "real" while the very same problem experienced by smaller groups is dismissed as "chance."
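The point about group size can be illustrated with a small Python sketch. It is not taken from either blog post. The counts are hypothetical, and the pooled two-proportion z-test shown here is just one common significance test, relying on a normal approximation that is rough for small samples. The same 30% versus 20% gap is tested twice, once with 1,000 people per group and once with 40 per group.

```python
import math

def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for a difference between two proportions.

    Uses a pooled z-test with the normal approximation.
    x1, x2 are counts of people with the outcome; n1, n2 are group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: the outcome affects 30% of one group and 20% of the other.
p_large = two_proportion_p_value(300, 1000, 200, 1000)  # 1,000 people per group
p_small = two_proportion_p_value(12, 40, 8, 40)         # 40 people per group

print(f"large groups: p = {p_large:.4g}")
print(f"small groups: p = {p_small:.4g}")
```

With these numbers the large-group comparison falls well below 0.05 and the small-group comparison does not, even though the size of the gap is identical. A p-value reflects how much data there is as well as how big the difference is.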

Two is always two. Except when it’s not.

So you think math is an objective science? Think again.

This blog post explains, in the most elementary language possible, how even simple statistics vary depending on who you ask, i.e., where you put the locus of power in your analysis.


Context is Key

Q: What is "Health"?

A: Everything!

Statistics and data are available for a lot of things that maybe aren't directly "health" but are very much relevant to public health. Here are a few to pique your interest:

Data and Statistics, California Department of Education
Data on school enrollment, non-English language learners, free lunch numbers, teacher data, class size, and much more.

Calif. Dept of Alcoholic Beverage Control: License Lookup
Find liquor stores, bars, etc. by address, census tract, city, etc. Can also search by business name, licensee name, license number.

Traffic Operations (CalTrans)
Traffic volumes, truck traffic, and ramp volume for California state highways. View tables, or download data as Excel files.

Asthma Diagnosis in Bay Area Kids, 2015-16

Major Depression in US Youth

Had at Least One Major Depressive Episode in the Past Year among Youths Aged 12 to 17, by State: 2012-2013. Source: NSDUH (National Survey on Drug Use and Health)

But what if we changed the legend..?
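Here is a minimal Python sketch of why the legend matters. The area names and rates are hypothetical (they are not NSDUH figures), and the two classification schemes shown, equal intervals and quantiles, are simply two common ways of choosing map legend breaks. The same numbers end up in different color classes depending on which scheme is used.

```python
def equal_interval_breaks(values, k):
    """Split the range from min to max into k classes of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]

def quantile_breaks(values, k):
    """Choose breaks so each class holds roughly the same number of areas."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]

def classify(value, breaks):
    """Return the class index (0 = lowest) for a value, given legend breaks."""
    return sum(value >= b for b in breaks)

# Hypothetical rates (%) for eight areas; area H is an outlier.
rates = {"A": 7.2, "B": 7.9, "C": 8.1, "D": 8.4,
         "E": 8.6, "F": 9.0, "G": 9.3, "H": 12.5}

eq_breaks = equal_interval_breaks(list(rates.values()), 4)
q_breaks = quantile_breaks(list(rates.values()), 4)

equal_classes = {area: classify(r, eq_breaks) for area, r in rates.items()}
quantile_classes = {area: classify(r, q_breaks) for area, r in rates.items()}

for area in rates:
    print(area, rates[area], equal_classes[area], quantile_classes[area])
```

In this sketch, area G lands in the second-lowest class under equal intervals but in the highest class under quantiles, so it would be shaded very differently on the two maps even though its rate has not changed.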