How Privacy Suppression Distorts Health Disparity Analysis

Published: at 10:00 AM

A 12-minute read on why the data we can’t see may matter most for the populations we most want to help.

You want to study racial disparities in heart disease mortality. The research question is clear, the public health importance is undeniable, and CDC WONDER, the Centers for Disease Control and Prevention’s public mortality database, promises exactly the county-level, demographically stratified data you need. You carefully design your query: mortality rates by county, year, race, and sex. You download the data.

Then you open the file and discover that 84.7% of your cells say “Suppressed” or are statistically unreliable.

This isn’t a bug. It’s a feature—one that creates a systematic bias affecting nearly every researcher who uses CDC WONDER to study health disparities. The same demographic granularity needed to study disparities triggers the suppression rules that hide those disparities.

The Mechanics of Suppression

CDC WONDER implements privacy protections that prevent identification of individuals in small populations. The core rule is simple: when a death count falls between 1 and 9, the cell is suppressed. Zero counts are shown (since they reveal no individual information), and counts of 10 or higher are reported normally. But that middle range—where deaths occurred, but not many—becomes invisible.

This isn’t truly missing data in the statistical sense. It’s censored data: we know the count is somewhere between 1 and 9, but we can’t see the exact value. (Because both a lower and an upper bound are known, this is strictly interval censoring, although it is often described as left-censoring.) The suppression cascades to derived measures: mortality rates, confidence intervals, and any other quantity calculated from the hidden counts become unavailable.

Beyond privacy suppression, CDC WONDER also flags rates based on fewer than 20 deaths as “unreliable.” These aren’t hidden, but they carry a warning that the statistical precision is poor. When you combine suppressed cells (privacy) with unreliable cells (statistics), the fraction of data that’s genuinely analyzable shrinks dramatically.

CDC WONDER Suppression Rules (Technical Details)

The official suppression criteria for CDC WONDER mortality data:

  1. Death counts of 1-9 are suppressed to protect privacy (the “Rule of 10”)
  2. Rates based on fewer than 20 deaths are flagged as “Unreliable” due to high relative standard error
  3. Suppression applies to the count and all derived quantities (crude rate, age-adjusted rate, confidence intervals)
  4. Zero counts (0 deaths) are displayed because they reveal no individual information
  5. Suppression is applied independently to each cell in a cross-tabulation

These rules apply uniformly regardless of the underlying population size, which creates differential impacts across demographic groups.
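The rules above can be sketched as a small classifier. This is a minimal illustration of the thresholds as stated in this article, not CDC WONDER's own code; the function name and the category labels are hypothetical.

```python
def classify_cell(deaths: int) -> str:
    """Classify one cross-tabulation cell by its true death count.

    Thresholds follow the rules listed in the article; the labels
    themselves are illustrative, not official WONDER terminology.
    """
    if deaths < 0:
        raise ValueError("death counts cannot be negative")
    if deaths == 0:
        return "zero"          # displayed: reveals no individual information
    if deaths <= 9:
        return "suppressed"    # 1-9: count and all derived quantities hidden
    if deaths <= 19:
        return "unreliable"    # 10-19: shown, but the rate is flagged
    return "reliable"          # 20 or more: shown without a flag


for count in (0, 3, 9, 10, 19, 20):
    print(count, classify_cell(count))
```

Because the thresholds depend only on the count, the same function applies to a county of 2,000 people and a county of 2 million, which is exactly why small populations are hit harder.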

The Stratification-Suppression Trade-off

Here’s the core problem: the more you stratify your data to study subgroup differences, the more data you lose.

Consider heart disease mortality data for 10 U.S. states from 1999 to 2020. At the state level, almost everything is analyzable—states have large enough populations that death counts rarely fall below 10. But researchers studying disparities need finer resolution:

In our analysis of CDC WONDER data at the County × Year × Race × Sex level, we found:

| Category | Total Data Loss | Analyzable |
|---|---|---|
| Overall | 84.7% | 14.7% |
| Privacy-suppressed | 16.4% | |
| Statistically unreliable | 68.3% | |

This means that for every 100 potential data points in a fully stratified mortality analysis, only about 15 are usable for statistical inference. The rest are either hidden by privacy rules or too imprecise to interpret.
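To see why stratification drives this loss, it helps to model each cell's death count as a Poisson variable and ask how likely each outcome is. The sketch below is an illustration under that modeling assumption; the total of 400 expected deaths and the even split across cells are hypothetical and are not taken from the analysis above.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson variable with the given (positive) mean."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def cell_outcome_probabilities(expected_deaths: float) -> dict:
    """Chance that a cell is zero, suppressed (1-9), unreliable (10-19) or reliable (20+)."""
    zero = poisson_pmf(0, expected_deaths)
    suppressed = sum(poisson_pmf(k, expected_deaths) for k in range(1, 10))
    unreliable = sum(poisson_pmf(k, expected_deaths) for k in range(10, 20))
    reliable = max(0.0, 1.0 - zero - suppressed - unreliable)
    return {
        "zero": zero,
        "suppressed": suppressed,
        "unreliable": unreliable,
        "reliable": reliable,
    }


# Hypothetical: 400 expected deaths, split evenly over more and more cells.
TOTAL_EXPECTED_DEATHS = 400.0

for n_cells in (1, 4, 16, 64):
    per_cell = TOTAL_EXPECTED_DEATHS / n_cells
    probs = cell_outcome_probabilities(per_cell)
    print(
        f"{n_cells:>3} cells, {per_cell:6.2f} expected deaths each: "
        f"suppressed {probs['suppressed']:.1%}, reliable {probs['reliable']:.1%}"
    )
```

The same total number of deaths goes from almost certainly reportable to mostly suppressed as it is divided among more cells. Real cells are far from evenly sized, so the actual loss depends on how deaths are distributed across counties and groups.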

State comparison showing data availability across 10 states. New York has the highest analyzable percentage at 28.7%, while Montana has the lowest at 4.4%.

The figure above shows how this varies by state. New York, with its large urban population, retains 28.7% analyzable data. Montana, predominantly rural with smaller counties, drops to just 4.4%. Wyoming fares only slightly better at 7.8%. The pattern is clear: because suppression depends on death counts, the size of the population behind each cell determines data visibility.

Who Gets Suppressed? The Equity Problem

If suppression were random—if it removed data uniformly across all populations—it would be a nuisance but not a bias. Researchers could note the reduced sample size and proceed with appropriate caution.

But suppression is not random. It is systematically related to the characteristics researchers most want to study.

Race comparison showing dramatic differences in data availability. White population: 54.6% analyzable. Black: 9.9%. Asian/Pacific Islander: 2.3%. American Indian/Alaska Native: 0.1%.

The racial disparity in data availability is stark:

| Race/Ethnicity | Analyzable Data |
|---|---|
| White | 54.6% |
| Black or African American | 9.9% |
| Asian or Pacific Islander | 2.3% |
| American Indian or Alaska Native | 0.1% |

Think about what this means. If you’re studying racial disparities in heart disease mortality at the county level, you have access to more than half the data for White populations but less than one-tenth for Black populations. For American Indian and Alaska Native populations, you have essentially nothing—99.9% of the data is unusable.

This isn’t a failure of CDC WONDER’s design. It’s a mathematical inevitability: minority populations in most U.S. counties are small enough that their death counts frequently fall into the 1-9 range. The privacy protection works as intended. But the consequence is that the populations most subject to health disparities become invisible in the data.
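The arithmetic behind this inevitability is short: expected deaths equal population times the mortality rate, so each threshold implies a minimum population. The sketch below uses a hypothetical rate of 200 deaths per 100,000 per year purely for illustration; it is not a figure from this analysis, and the function name is likewise an assumption.

```python
def min_population_for(deaths_needed: int, rate_per_100k: float) -> float:
    """Population at which the expected annual death count reaches deaths_needed."""
    if rate_per_100k <= 0:
        raise ValueError("rate must be positive")
    return deaths_needed * 100_000 / rate_per_100k


# Hypothetical crude rate, chosen only to make the arithmetic concrete.
ILLUSTRATIVE_RATE = 200.0  # deaths per 100,000 people per year

print(min_population_for(10, ILLUSTRATIVE_RATE))  # 5000.0  -> expected count clears suppression
print(min_population_for(20, ILLUSTRATIVE_RATE))  # 10000.0 -> expected count clears the reliability flag
```

In a fully stratified query these thresholds apply to each race-by-sex group within a county and year, not to the county as a whole. Even a group sitting exactly at the threshold will fall below it in many years through ordinary year-to-year variation.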

The Rural-Urban Divide

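The same pattern appears geographically. Rural counties have smaller populations, which means smaller death counts, which means more suppression. In our analysis, the state comparison already shows this: predominantly rural Montana retains only 4.4% analyzable data, against 28.7% for New York.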

Rural health disparities—already understudied—become even harder to quantify when the data systematically disappears.

Disease Rarity Compounds the Problem

Disease comparison showing common vs. rare conditions. Heart Disease total: 20.5% analyzable. Rheumatic Heart Disease (rare): 0.1% analyzable.

The effect is even more dramatic for rare diseases. Compare:

| Disease Category | Analyzable Data |
|---|---|
| Heart Disease (total) | 20.5% |
| Rheumatic Heart Disease (rare) | 0.1% |

Rheumatic heart disease, once common but now rare in the U.S., essentially vanishes from county-level analysis. Only 0.1% of cells are analyzable—practically zero. Any researcher trying to study geographic patterns of rheumatic heart disease mortality at the county level will find almost nothing to work with.

Try It Yourself: The Suppression Calculator

The interactive tool below lets you explore how different data selections affect analyzability. Select a state, race/ethnicity, and disease type to see how the fraction of usable data changes.

CDC WONDER Suppression Calculator

See how different data selections affect the amount of analyzable data. Data shown is for County x Year x Race x Sex stratification.

Displayed values:

| Measure | Value |
|---|---|
| Data availability | 20.5% analyzable |
| Bar segment: Suppressed (Privacy) | 16% |
| Bar segment: Unreliable (Statistics) | 68% |
| Bar segment: Analyzable | 21% |
| Total data loss | 79.1% |
| Usable for analysis | 20.5% |

Insight: Significant data loss occurs. Your analysis will be based on a minority of potential observations.

Based on CDC WONDER mortality data for 10 states (1999-2020). Suppression occurs when death counts are 1-9 (privacy rule). Unreliable rates occur when counts are too small for stable estimates.

Notice how the bar shifts as you change selections. Choose “American Indian or Alaska Native” and watch the green “Analyzable” section nearly disappear. Select “Rheumatic Heart Disease” and observe the same effect. These aren’t edge cases—they represent the daily reality of health disparity research.

The Bias Direction

When data is systematically missing, the remaining data tells a biased story. What direction does the bias run?

Consider a hypothetical: You calculate county-level heart disease mortality rates for Black populations, excluding all suppressed cells. Your analysis is based only on counties where the Black population was large enough to generate 10+ deaths per year. What kind of counties are these? Large ones. Urban ones. Counties with substantial Black populations.

You’ve just excluded the opposite: small counties, rural counties, and counties where the Black population is small.

If mortality patterns differ between included and excluded counties—and they almost certainly do—your results are biased. The direction depends on whether excluded areas have higher or lower mortality than included areas, but the bias exists regardless.

Some studies have found that rural mortality rates exceed urban rates for many conditions. If so, excluding rural areas (where suppression is highest) would underestimate overall mortality. But this varies by condition and population, making it difficult to correct without additional information.
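A toy calculation makes the mechanism concrete. The five counties below are entirely hypothetical, and the rural counties were deliberately given higher rates, so the direction of the bias here is built into the example rather than demonstrated by it.

```python
# Hypothetical counties: the populations and death counts are invented.
counties = [
    {"name": "Urban A", "population": 120_000, "deaths": 180},
    {"name": "Urban B", "population": 80_000, "deaths": 128},
    {"name": "Rural C", "population": 3_000, "deaths": 8},
    {"name": "Rural D", "population": 2_500, "deaths": 7},
    {"name": "Rural E", "population": 1_500, "deaths": 5},
]


def pooled_rate(rows):
    """Deaths per 100,000 across all rows combined."""
    deaths = sum(row["deaths"] for row in rows)
    population = sum(row["population"] for row in rows)
    return 100_000 * deaths / population


# Cells with 1-9 deaths are suppressed, so a complete-case analysis drops them.
visible = [row for row in counties if not 1 <= row["deaths"] <= 9]

true_rate = pooled_rate(counties)
visible_rate = pooled_rate(visible)

print(f"counties kept: {len(visible)} of {len(counties)}")
print(f"rate using all counties:     {true_rate:.1f} per 100,000")
print(f"rate using visible counties: {visible_rate:.1f} per 100,000")
```

With these invented numbers the visible-only rate comes out lower than the full rate, because every excluded county is rural and high-mortality. Reverse the rural rates and the bias flips, which is the point: the direction cannot be read off the visible data alone.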

Practical Strategies for Researchers

Suppression bias isn’t solvable, but it’s manageable. Here are strategies for navigating the trade-off:

1. Strategic Aggregation

The most direct approach: reduce stratification to reduce suppression.

Each aggregation reduces granularity but increases the analyzable fraction. The right choice depends on your research question. If you need temporal trends, don’t aggregate years. If you need age-specific rates, preserve age detail. But recognize you can’t have everything.
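The effect of pooling can be illustrated with invented counts. In the sketch below, the county names and yearly death counts are hypothetical, and the helper names are assumptions. In practice suppressed counts are absent from the export, so the pooling has to happen by rerunning the query with coarser grouping, not by summing downloaded cells.

```python
from collections import defaultdict


def status(deaths: int) -> str:
    """Suppression status of a cell, using the thresholds described in the article."""
    if deaths == 0:
        return "zero"
    if deaths <= 9:
        return "suppressed"
    if deaths <= 19:
        return "unreliable"
    return "reliable"


# Hypothetical true counts for one race-by-sex group, keyed by (county, year).
true_counts = {
    ("County X", 2016): 4, ("County X", 2017): 6, ("County X", 2018): 3,
    ("County X", 2019): 7, ("County X", 2020): 5,
    ("County Y", 2016): 1, ("County Y", 2017): 0, ("County Y", 2018): 2,
    ("County Y", 2019): 1, ("County Y", 2020): 1,
}


def pool_years(counts):
    """Collapse the year dimension by summing deaths within each county."""
    pooled = defaultdict(int)
    for (county, _year), deaths in counts.items():
        pooled[county] += deaths
    return dict(pooled)


for (county, year), deaths in sorted(true_counts.items()):
    print(county, year, status(deaths))

for county, deaths in sorted(pool_years(true_counts).items()):
    print(county, "2016-2020", deaths, status(deaths))
```

Pooling five years turns County X from five suppressed cells into one reliable cell, while County Y stays suppressed even after pooling. Aggregation raises the analyzable fraction but does not rescue the smallest populations.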

2. Acknowledge the Bias Explicitly

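Many published studies using CDC WONDER data never mention suppression. This is a methodological oversight. At minimum, your methods section should report the share of cells that were suppressed or flagged unreliable and, where you can determine it, the likely direction of the resulting bias.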

Transparency about limitations is a hallmark of rigorous research.
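Suppression rates for a methods section can be tallied directly from an export. The sketch below assumes each row is a dictionary whose "Deaths" field holds either a number or the text "Suppressed", and that rows carry a "Race" field; both field names, the sample rows, and the choice to count every visible cell under 20 deaths (including zeros) as unreliable are assumptions for illustration.

```python
from collections import Counter, defaultdict


def cell_status(deaths_field: str) -> str:
    """Map an exported Deaths value to suppressed / unreliable / analyzable."""
    text = deaths_field.strip()
    if text.lower() == "suppressed":
        return "suppressed"
    # Assumption: any visible count below 20 (zeros included) is unreliable.
    return "analyzable" if int(text) >= 20 else "unreliable"


def suppression_report(rows, group_field):
    """Percentage of cells in each status, broken down by group_field."""
    tallies = defaultdict(Counter)
    for row in rows:
        tallies[row[group_field]][cell_status(row["Deaths"])] += 1
    report = {}
    for group, tally in tallies.items():
        total = sum(tally.values())
        report[group] = {
            name: 100.0 * tally[name] / total
            for name in ("suppressed", "unreliable", "analyzable")
        }
    return report


# Hypothetical export rows, invented for this example.
rows = [
    {"Race": "Group A", "Deaths": "35"},
    {"Race": "Group A", "Deaths": "Suppressed"},
    {"Race": "Group A", "Deaths": "12"},
    {"Race": "Group A", "Deaths": "48"},
    {"Race": "Group B", "Deaths": "Suppressed"},
    {"Race": "Group B", "Deaths": "Suppressed"},
    {"Race": "Group B", "Deaths": "Suppressed"},
    {"Race": "Group B", "Deaths": "11"},
]

for group, shares in suppression_report(rows, "Race").items():
    print(group, shares)
```

Reporting these shares per group, rather than one overall figure, shows readers whether the comparison groups lost data at different rates.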

3. Sensitivity Analysis

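Compare your results at different aggregation levels. If the estimates shift substantially as suppression falls, the fully stratified result is sensitive to the cells you cannot see.

This won’t eliminate bias, but it reveals its potential magnitude.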
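A complementary check, separate from comparing aggregation levels, is to bound a pooled rate by filling every suppressed count with its smallest and largest possible value (1 and 9). This is a simple bounding sketch, not the method used in this analysis; the cells and the function name are hypothetical, and it assumes the population denominators are available even where deaths are suppressed.

```python
def pooled_rate_bounds(cells):
    """Lower and upper bound on the pooled rate per 100,000.

    Each cell is (population, deaths), with deaths set to None when the
    count is suppressed, meaning the true value lies between 1 and 9.
    """
    population = sum(pop for pop, _deaths in cells)
    low = sum(1 if deaths is None else deaths for _pop, deaths in cells)
    high = sum(9 if deaths is None else deaths for _pop, deaths in cells)
    return 100_000 * low / population, 100_000 * high / population


# Hypothetical cells: two visible counts and three suppressed ones.
cells = [
    (120_000, 180),
    (80_000, 128),
    (3_000, None),
    (2_500, None),
    (1_500, None),
]

low, high = pooled_rate_bounds(cells)
print(f"pooled rate lies between {low:.1f} and {high:.1f} per 100,000")
```

If a conclusion holds at both ends of the interval, suppression is unlikely to be driving it. If it flips, the hidden cells matter.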

4. Strategic Query Design

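Sometimes you need both geographic precision and demographic detail, but not simultaneously. Consider running separate queries: one that keeps county-level geography but drops the demographic breakdown, and one that keeps race and sex detail at a coarser geographic level such as the state.

You can’t directly combine these, but each answers part of your question with less suppression than a single fully stratified query.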

5. Advanced Methods (Brief Mention)

For researchers with statistical expertise, several methods exist for handling left-censored count data. These approaches are beyond the scope of this article, but they exist for researchers willing to invest the methodological effort.
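To give a flavor of what such methods do, here is one standard idea, not necessarily among those the author had in mind: a censored Poisson likelihood, in which each suppressed cell contributes the probability that its count lies in 1-9 instead of being dropped. The sketch makes the strong simplifying assumption that all cells share a single Poisson mean, uses a plain grid search in place of a proper optimizer, and runs on invented counts.

```python
import math


def log_pmf(k: int, mean: float) -> float:
    """Log of the Poisson probability P(X = k)."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


def log_prob_suppressed(mean: float) -> float:
    """Log of P(1 <= X <= 9), the contribution of one suppressed cell."""
    return math.log(sum(math.exp(log_pmf(k, mean)) for k in range(1, 10)))


def log_likelihood(mean: float, observed, n_suppressed: int) -> float:
    """Exact terms for visible counts plus interval terms for suppressed cells."""
    exact = sum(log_pmf(k, mean) for k in observed)
    return exact + n_suppressed * log_prob_suppressed(mean)


def fit_common_mean(observed, n_suppressed, grid_step=0.01, grid_max=100.0):
    """Grid-search maximum-likelihood estimate of the shared Poisson mean."""
    steps = int(grid_max / grid_step)
    candidates = (grid_step * i for i in range(1, steps + 1))
    return max(candidates, key=lambda m: log_likelihood(m, observed, n_suppressed))


# Hypothetical data: five visible counts and six suppressed cells.
visible_counts = [12, 15, 11, 22, 10]
estimate = fit_common_mean(visible_counts, n_suppressed=6)
naive = sum(visible_counts) / len(visible_counts)

print(f"mean of visible cells only:   {naive:.2f}")
print(f"censored-likelihood estimate: {estimate:.2f}")
```

The visible-only mean overstates the typical count because it ignores the cells known to be below 10, and the censored likelihood pulls the estimate down accordingly. A realistic model would add population offsets and covariates, which is where the methodological effort comes in.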

Before You Download: Decide whether you need county-level geographic precision or demographic granularity. In CDC WONDER, you likely can’t have both. Design your query accordingly.

The Structural Tension

Privacy protection is important. The suppression rules exist because identifying individuals from mortality data is a real risk, particularly in small populations. A county with 3 deaths from a specific cause in a specific demographic group in a specific year—that’s potentially identifiable information. The rules protect real people.

But privacy protection has consequences. The consequence here is that health disparity research becomes harder precisely where disparities may be worst: in small populations, minority communities, and rural areas.

This isn’t a flaw in CDC WONDER. It’s a structural tension between two legitimate goals—privacy and precision—that current data systems haven’t resolved. Some alternatives exist: synthetic data, differential privacy, secure computing environments. But for now, researchers using public mortality data must navigate this trade-off.

Conclusion

The next time you download data from CDC WONDER, check the suppression rate before you run your analysis. If you’re studying health disparities at a granular level, prepare for most of your data to be missing—and not randomly missing, but systematically missing in precisely the populations you most want to understand.

Report your suppression rates in your methods section. Acknowledge the bias direction when you can determine it. Consider whether your conclusions would change if the hidden data were visible.

The data we can’t see may matter most for the populations we most want to help.


This analysis used CDC WONDER mortality data for heart disease (ICD-10 codes I00-I09, I11, I13, I20-I51) across 10 states from 1999-2020, stratified by county, year, race, and sex. Data and code are available upon request.

