Methods: NIH Analysis
Following $6.6B at the NIH in FY2024
Environmental Factors Breakdown
Summed dollar amounts across designated sub-fields within each category (based on grant titles).
Figure 1. NIH Grant Funding by Disease and Research Category (FY2024). Environmental factors research (red) represents the smallest share across all disease areas, despite substantial evidence linking environmental exposures to disease risk.
Overview
This analysis examines how NIH research funding is distributed across research categories within specific disease areas. The key finding: environmental factors research consistently receives the smallest share of funding, even for diseases where environmental exposures are major risk factors.
Data Source
NIH grant data was obtained from NIH RePORTER for fiscal year 2024. Grants were queried using NIH's official RCDC (Research, Condition, and Disease Categorization) terms for eight disease areas.
| Disease Category | RCDC Search Term | Total Grants | Total Funding |
|---|---|---|---|
| Breast Cancer | Breast Cancer | 1,766 | $740M |
| Colorectal Cancer | Colorectal Cancer | 790 | $338M |
| Lung Cancer | Lung Cancer | 1,169 | $492M |
| ALS | Amyotrophic Lateral Sclerosis | 301 | $234M |
| Parkinson's Disease | Parkinson's Disease | 510 | $253M |
| Liver Disease | Liver Disease | 1,907 | $945M |
| Contraception/Fertility | Contraception/Reproduction | 1,596 | $739M |
| Biodefense | Biodefense | 4,138 | $2.88B |
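
As a quick consistency check, the per-disease totals in the table can be summed and compared with the headline figure. The sketch below copies the rounded values from the table, so the result is approximate; the name `FUNDING_MUSD` is illustrative and not taken from the project's code.

```python
# Total FY2024 funding per disease area, in millions of USD.
# Values are the rounded figures from the table above ($2.88B = 2880M).
FUNDING_MUSD = {
    "Breast Cancer": 740,
    "Colorectal Cancer": 338,
    "Lung Cancer": 492,
    "ALS": 234,
    "Parkinson's Disease": 253,
    "Liver Disease": 945,
    "Contraception/Fertility": 739,
    "Biodefense": 2880,
}

total_musd = sum(FUNDING_MUSD.values())
print(f"Total: ${total_musd / 1000:.1f}B across {len(FUNDING_MUSD)} disease areas")

# Share of the combined total contributed by each disease area, largest first.
for disease, musd in sorted(FUNDING_MUSD.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{disease:<25} ${musd:>5,}M  {100 * musd / total_musd:5.1f}%")
```

The rounded rows sum to $6,621M, which agrees with the roughly $6.6B stated at the top of the page.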
Classification Categories
Each grant was classified into one of three research focus categories:
| Category | Description | What's Included |
|---|---|---|
| Mechanistic & Genetic | Research on biological pathways, molecular mechanisms, protein function, signaling, and genetic factors | GWAS studies; germline mutations; cellular processes; resistance mechanisms; tumor biology |
| Clinical & Other | Clinical trials, screening programs, behavioral interventions, health disparities, infrastructure | Phase I-III trials; screening programs; training grants; administrative supplements |
| Environmental Factors | Research studying environmental exposures as risk factors for disease | Chemicals; smoking; diet; microbiome; radiation; alcohol; gene-environment interactions |
Classification Methodology
Step 1: Initial Candidate Selection
Approximately 325,000 NIH grants (FY2022–2025) were filtered with broad keyword patterns, yielding about 29,000 candidates potentially related to environmental exposures.
| Keyword Category | Example Terms |
|---|---|
| Chemical exposures | PFAS, pesticide, heavy metal, air pollution, PM2.5, endocrine disruptor, phthalate, BPA |
| Lifestyle factors | smoking, tobacco, alcohol, diet, obesity, nutrition, sedentary |
| Biological exposures | microbiome, microbiota, gut bacteria, infection, pathogen, virus |
| Physical exposures | radiation, UV, ionizing, radon, electromagnetic |
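
A broad keyword filter of this kind might look like the sketch below. It uses only the example terms from the table; the actual pattern list is presumably longer. The matching rule (case-insensitive, anchored at the start of a word so that "diet" also catches "dietary") and the function names are assumptions, not the project's implementation.

```python
import re

# Example terms from the table above; the real pattern list is likely longer.
KEYWORDS = {
    "Chemical exposures": ["PFAS", "pesticide", "heavy metal", "air pollution",
                           "PM2.5", "endocrine disruptor", "phthalate", "BPA"],
    "Lifestyle factors": ["smoking", "tobacco", "alcohol", "diet", "obesity",
                          "nutrition", "sedentary"],
    "Biological exposures": ["microbiome", "microbiota", "gut bacteria",
                             "infection", "pathogen", "virus"],
    "Physical exposures": ["radiation", "UV", "ionizing", "radon",
                           "electromagnetic"],
}

# Assumption: match at a word start only, deliberately permissive on suffixes.
PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")",
        re.IGNORECASE,
    )
    for category, terms in KEYWORDS.items()
}


def matched_categories(title: str) -> list:
    """Return the keyword categories whose pattern appears in the title."""
    return [cat for cat, pattern in PATTERNS.items() if pattern.search(title)]


def is_candidate(title: str) -> bool:
    """A grant is a candidate if any keyword category matches."""
    return bool(matched_categories(title))
```

Because this step only nominates candidates for the LLM pass, a permissive filter that over-matches is acceptable; precision is handled in the later steps.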
Step 2: LLM Classification (Environmental vs. Not)
Candidate grants were classified using Claude Haiku with strict criteria:
| Classification | Criteria | Examples |
|---|---|---|
| Environmental | Studies environmental exposures as risk factors—epidemiology, exposure assessment, biomonitoring, cohort studies tracking exposure and disease outcomes | "Air pollution and children's asthma risk in urban cohorts" |
| Not Environmental | Mechanism studies (even if using exposure models), interventions/treatments (cessation programs, dietary interventions), administrative/training grants, drug development | "Inflammatory pathways in asthma"; "Smoking cessation intervention" |
Key Distinction: A grant is Environmental if it studies an exposure as a risk factor for disease. It is Not Environmental if it studies disease mechanisms, treatments, or infrastructure, even when an exposure serves as the experimental model.
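
The classification call can be sketched as a prompt builder plus a strict parser for the reply. Everything here is hypothetical: the prompt wording paraphrases the criteria table, and `complete` stands in for whatever function sends a prompt to the model and returns its text reply. No real client API is shown or implied.

```python
from typing import Callable

# Hypothetical prompt; wording paraphrases the criteria table above.
PROMPT_TEMPLATE = """Classify the NIH grant below as ENVIRONMENTAL or NOT_ENVIRONMENTAL.

ENVIRONMENTAL: studies environmental exposures as risk factors for disease
(epidemiology, exposure assessment, biomonitoring, cohort studies tracking
exposure and disease outcomes).

NOT_ENVIRONMENTAL: mechanism studies (even if they use exposure models),
interventions or treatments, administrative or training grants, drug development.

Grant title: {title}

Answer with exactly one label."""


def build_prompt(title: str) -> str:
    """Fill the grant title into the prompt template."""
    return PROMPT_TEMPLATE.format(title=title)


def parse_label(reply: str) -> str:
    """Map a model reply to one of the two labels; reject anything else."""
    text = reply.strip().upper().replace(" ", "_")
    # Check the negative label first, since it contains the positive one.
    if text.startswith("NOT_ENVIRONMENTAL"):
        return "Not Environmental"
    if text.startswith("ENVIRONMENTAL"):
        return "Environmental"
    raise ValueError(f"Unrecognized label: {reply!r}")


def classify(title: str, complete: Callable[[str], str]) -> str:
    """Classify one grant title using a caller-supplied completion function."""
    return parse_label(complete(build_prompt(title)))
```

Raising on an unrecognized reply, rather than defaulting to one label, keeps malformed outputs from silently inflating either category.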
Step 3: Verification Pass
LLM classifications were verified using keyword-based rules to remove false positives.
| Error Type | Example | Why Misclassified |
|---|---|---|
| Mechanism studies | "Role of alcohol-adapted Kupffer cells in liver fibrosis" | Uses alcohol as a model, but studies cellular mechanisms—not exposure-disease relationship |
| Intervention studies | "Smoking cessation intervention for pregnant women" | Intervention/treatment focus, not studying exposure as risk factor |
| Training/conferences | "Environmental Health Sciences Training Grant" | Administrative/training, not research |
Result: The 8,735 initial Environmental classifications were reduced to 6,752 verified grants; the 1,983 removed (about 23% of the initial set) were false positives.
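
A rule-based verification pass along these lines could be sketched as follows. The three rule groups mirror the error types in the table, but the specific patterns are illustrative assumptions chosen to catch the table's example titles; the project's actual rules are not given here. The last lines reproduce the arithmetic behind the reported reduction.

```python
import re
from typing import Optional

# Hypothetical patterns, one group per error type in the table above.
FALSE_POSITIVE_RULES = {
    "Mechanism studies": [r"\brole of\b", r"\bmechanisms?\b", r"\bpathways?\b"],
    "Intervention studies": [r"\binterventions?\b", r"\bcessation\b", r"\btreatments?\b"],
    "Training/conferences": [r"\btraining\b", r"\bconferences?\b", r"\bworkshops?\b"],
}

_COMPILED = {
    name: re.compile("|".join(patterns), re.IGNORECASE)
    for name, patterns in FALSE_POSITIVE_RULES.items()
}


def false_positive_reason(title: str) -> Optional[str]:
    """Return the first matching error type, or None if the grant is kept."""
    for name, pattern in _COMPILED.items():
        if pattern.search(title):
            return name
    return None


# Arithmetic behind the reported result.
initial, verified = 8735, 6752
removed = initial - verified            # grants dropped as false positives
removed_pct = 100 * removed / initial   # share of the initial set
print(f"Removed {removed:,} of {initial:,} ({removed_pct:.1f}%)")
```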
Step 4: Exposure Subgroup Assignment
Verified Environmental grants were assigned to exposure subgroups using strict keyword matching with priority ordering (earlier categories take precedence to avoid double-counting):
| Priority | Subgroup | Keywords |
|---|---|---|
| 1 | Smoking | tobacco, nicotine, vaping, cigarette, e-cigarette |
| 2 | Alcohol | ethanol, AUD, fetal alcohol, alcohol use disorder |
| 3 | Infection | viral, bacterial, pathogen, HIV, hepatitis |
| 4 | Diet/Obesity/Nutrition | dietary, BMI, metabolic, obesity, nutrition |
| 5 | Microbiome | microbiota, gut bacteria, dysbiosis |
| 6 | Radiation | UV, ionizing, radioactive, radon |
| 7 | Gene-Environment | GxE, gene-environment interaction |
| 8 | Chemicals | PFAS, pesticides, air pollution, heavy metals, PM2.5, endocrine disruptor |
Note: Chemicals is checked last to avoid over-capturing grants that mention chemical terms in other contexts (e.g., "chemical biology" methodology).
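
Priority-ordered assignment amounts to a first-match-wins scan over an ordered list. The sketch below uses the keywords exactly as listed in the table and assumes whole-word, case-insensitive matching as the meaning of "strict"; the function name and the `None` return for unmatched titles are illustrative choices.

```python
import re
from typing import Optional

# Subgroups in priority order, with the keywords listed in the table above.
SUBGROUPS = [
    ("Smoking", ["tobacco", "nicotine", "vaping", "cigarette", "e-cigarette"]),
    ("Alcohol", ["ethanol", "AUD", "fetal alcohol", "alcohol use disorder"]),
    ("Infection", ["viral", "bacterial", "pathogen", "HIV", "hepatitis"]),
    ("Diet/Obesity/Nutrition", ["dietary", "BMI", "metabolic", "obesity", "nutrition"]),
    ("Microbiome", ["microbiota", "gut bacteria", "dysbiosis"]),
    ("Radiation", ["UV", "ionizing", "radioactive", "radon"]),
    ("Gene-Environment", ["GxE", "gene-environment interaction"]),
    ("Chemicals", ["PFAS", "pesticides", "air pollution", "heavy metals",
                   "PM2.5", "endocrine disruptor"]),
]

# Assumption: "strict" means whole-word, case-insensitive matching.
_COMPILED = [
    (name, re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    ))
    for name, keywords in SUBGROUPS
]


def assign_subgroup(title: str) -> Optional[str]:
    """Return the highest-priority subgroup whose keywords match, else None."""
    for name, pattern in _COMPILED:
        if pattern.search(title):
            return name  # first match wins, so each grant is counted once
    return None
```

Returning on the first match is what prevents double-counting: a title mentioning both tobacco and radon is counted only under Smoking.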
Step 5: Mechanistic Subcategory Classification
Non-environmental grants classified as "Mechanistic" were further subcategorized into 8 research focus areas using keyword matching on grant titles:
- Microbial Pathogenesis
- Immune/Inflammatory
- Tumor Biology
- Neurodegeneration
- Metabolic
- Genetics/Genomics
- Reproductive/Developmental
- Biomarkers
Validation Results
| Classification Step | Accuracy | Notes |
|---|---|---|
| Environmental classification (initial) | 75% | 75% correct, 20% borderline, 5% incorrect |
| Environmental classification (after verification) | 77% | True positive rate after removing false positives |
| Exposure subgroup assignment | 88-89% | After priority reordering |
| Mechanistic subcategories | 82-83% | Consistent across disease areas |
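
The page does not describe how these accuracy figures were measured. One plausible approach, assumed here, is manual review of a sample of classified grants, with each reviewed grant marked correct, borderline, or incorrect. The sketch below tallies such verdicts; the sample is toy data constructed to reproduce the 75/20/5 split in the first row, not the project's review data.

```python
from collections import Counter


def review_breakdown(verdicts):
    """Return the percentage of each reviewer verdict in a checked sample."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("No verdicts to summarize")
    return {verdict: 100 * n / total for verdict, n in counts.items()}


# Toy sample for illustration only (not real review data).
toy_sample = ["correct"] * 15 + ["borderline"] * 4 + ["incorrect"] * 1
breakdown = review_breakdown(toy_sample)
for verdict, pct in breakdown.items():
    print(f"{verdict:<10} {pct:.0f}%")
```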
Key Findings
| Disease | Environmental Funding | % of Total | Context |
|---|---|---|---|
| Parkinson's Disease | $12.5M | 5% | Twin studies suggest 60-70% of risk is non-genetic |
| Lung Cancer | $48M | 10% | Lung cancer in non-smokers accounts for 10-20% of cases |
| Liver Disease | $21M | 2% | Growing evidence links PFAS and other chemicals to liver damage |
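
The "% of Total" column can be reproduced by dividing environmental funding by the disease-area totals listed under Data Source. The sketch below uses the rounded figures from both tables, so the shares are approximate; the names are illustrative.

```python
# (environmental funding, total funding) in millions of USD,
# taken from the Key Findings and Data Source tables.
FINDINGS = {
    "Parkinson's Disease": (12.5, 253),
    "Lung Cancer": (48, 492),
    "Liver Disease": (21, 945),
}


def environmental_share(env_musd: float, total_musd: float) -> float:
    """Environmental funding as a percentage of the disease-area total."""
    return 100 * env_musd / total_musd


shares = {disease: environmental_share(*amounts) for disease, amounts in FINDINGS.items()}
for disease, pct in shares.items():
    print(f"{disease:<20} {pct:.1f}%")
```

Rounded to whole percentages, these match the 5%, 10%, and 2% reported in the table.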
Limitations
- Title/terms classification: Classification is based on titles and project terms, not full abstracts. Some grants may be misclassified due to limited information.
- Single-label assignment: Each grant receives one category; some grants span multiple areas and could reasonably be classified differently.
- Keyword sensitivity: Classification depends on specific word choices; grants using non-standard terminology or synonyms may be missed.
- Fiscal year: Disease-specific analysis uses FY2024; results may vary by year.
Data Files
Full methodology documentation is maintained in grant_categorization/nih_classification/:
- environmental_classification_methodology.md: Complete Environmental vs. Not classification criteria
- mechanistic_subcategory_methodology.md: Mechanistic subcategorization approach
- classification_guide.md: Decision tree and edge case guidance