Methods: NIH Analysis

Methodology for classifying NIH grants by disease area and research category

Following $6.6B at the NIH in FY2024

[Interactive figure: research area nodes on the left, research category nodes on the right. Default detail panel: Environmental Factors Breakdown, $225.9M, computed as summed dollar amounts across designated sub-fields within each category (based on grant titles). Color scale: funding level, from $0 to max.]

Figure 1. NIH Grant Funding by Disease and Research Category (FY2024). Environmental factors research (red) represents the smallest share across all disease areas, despite substantial evidence linking environmental exposures to disease risk.

Overview

This analysis examines how NIH research funding is distributed across research categories within specific disease areas. The key finding: environmental factors research consistently receives the smallest share of funding, even for diseases where environmental exposures are major risk factors.

Data Source

NIH grant data was obtained from NIH RePORTER for fiscal year 2024. Grants were queried using NIH's official RCDC (Research, Condition, and Disease Categorization) terms for eight disease areas.

| Disease Category | RCDC Search Term | Total Grants | Total Funding |
|---|---|---|---|
| Breast Cancer | Breast Cancer | 1,766 | $740M |
| Colorectal Cancer | Colorectal Cancer | 790 | $338M |
| Lung Cancer | Lung Cancer | 1,169 | $492M |
| ALS | Amyotrophic Lateral Sclerosis | 301 | $234M |
| Parkinson's Disease | Parkinson's Disease | 510 | $253M |
| Liver Disease | Liver Disease | 1,907 | $945M |
| Contraception/Fertility | Contraception/Reproduction | 1,596 | $739M |
| Biodefense | Biodefense | 4,138 | $2.88B |
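
As a quick consistency check, the per-disease totals in the table can be summed and compared with the $6.6B headline figure. The sketch below copies the table values by hand (in millions of USD); the variable names are illustrative. It treats each row as independent, and whether a single grant can appear under more than one RCDC term is not stated here.

```python
# Totals copied from the Data Source table, in millions of USD.
funding_musd = {
    "Breast Cancer": 740,
    "Colorectal Cancer": 338,
    "Lung Cancer": 492,
    "ALS": 234,
    "Parkinson's Disease": 253,
    "Liver Disease": 945,
    "Contraception/Fertility": 739,
    "Biodefense": 2880,
}

# Grant counts copied from the same table.
grant_counts = {
    "Breast Cancer": 1766,
    "Colorectal Cancer": 790,
    "Lung Cancer": 1169,
    "ALS": 301,
    "Parkinson's Disease": 510,
    "Liver Disease": 1907,
    "Contraception/Fertility": 1596,
    "Biodefense": 4138,
}

total_musd = sum(funding_musd.values())
total_grants = sum(grant_counts.values())

print(f"Total funding: ${total_musd / 1000:.2f}B")   # $6.62B
print(f"Total grant records: {total_grants:,}")      # 12,177
```

The sum of $6.62B is consistent with the "Following $6.6B" figure title.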

Classification Categories

Each grant was classified into one of three research focus categories:

| Category | Description | What's Included |
|---|---|---|
| Mechanistic & Genetic | Research on biological pathways, molecular mechanisms, protein function, signaling, and genetic factors | GWAS studies; germline mutations; cellular processes; resistance mechanisms; tumor biology |
| Clinical & Other | Clinical trials, screening programs, behavioral interventions, health disparities, infrastructure | Phase I-III trials; screening programs; training grants; administrative supplements |
| Environmental Factors | Research studying environmental exposures as risk factors for disease | Chemicals; smoking; diet; microbiome; radiation; alcohol; gene-environment interactions |

Classification Methodology

Step 1: Initial Candidate Selection

Approximately 325,000 NIH grants (FY2022–2025) were filtered using broad keyword patterns, yielding approximately 29,000 candidates potentially related to environmental exposures.

| Keyword Category | Example Terms |
|---|---|
| Chemical exposures | PFAS, pesticide, heavy metal, air pollution, PM2.5, endocrine disruptor, phthalate, BPA |
| Lifestyle factors | smoking, tobacco, alcohol, diet, obesity, nutrition, sedentary |
| Biological exposures | microbiome, microbiota, gut bacteria, infection, pathogen, virus |
| Physical exposures | radiation, UV, ionizing, radon, electromagnetic |
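
A broad keyword filter of this kind can be sketched with regular expressions. The example below uses only the example terms from the table; the full pattern list used in the analysis is not reproduced here. The matching rules (case-insensitive, whole-word for acronyms, prefix match for other terms so that "diet" also catches "dietary") and the function names are assumptions made for illustration.

```python
import re

# Example terms from the table above; the real pattern list is assumed to be broader.
KEYWORDS = {
    "Chemical exposures": ["PFAS", "pesticide", "heavy metal", "air pollution",
                           "PM2.5", "endocrine disruptor", "phthalate", "BPA"],
    "Lifestyle factors": ["smoking", "tobacco", "alcohol", "diet", "obesity",
                          "nutrition", "sedentary"],
    "Biological exposures": ["microbiome", "microbiota", "gut bacteria",
                             "infection", "pathogen", "virus"],
    "Physical exposures": ["radiation", "UV", "ionizing", "radon", "electromagnetic"],
}

def term_pattern(term: str) -> str:
    # Acronyms (all uppercase) must match as whole words, so "UV" does not
    # fire inside longer words. Other terms match as prefixes, which also
    # catches plurals and derived forms such as "pesticides" or "dietary".
    right_edge = r"\b" if term.isupper() else ""
    return r"\b" + re.escape(term) + right_edge

PATTERNS = {
    category: re.compile("|".join(term_pattern(t) for t in terms), re.IGNORECASE)
    for category, terms in KEYWORDS.items()
}

def matched_categories(text: str) -> list[str]:
    """Return every keyword category with at least one hit in the text."""
    return [cat for cat, pattern in PATTERNS.items() if pattern.search(text)]

def is_candidate(text: str) -> bool:
    """A grant is a candidate if any keyword category matches."""
    return bool(matched_categories(text))
```

For example, `is_candidate("Air pollution and children's asthma risk in urban cohorts")` returns `True`, while a title with no exposure terms is dropped. A filter this broad is expected to over-capture, which is why the later steps narrow the candidate set.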

Step 2: LLM Classification (Environmental vs. Not)

Candidate grants were classified using Claude Haiku with strict criteria:

| Classification | Criteria | Examples |
|---|---|---|
| Environmental | Studies environmental exposures as risk factors: epidemiology, exposure assessment, biomonitoring, cohort studies tracking exposure and disease outcomes | "Air pollution and children's asthma risk in urban cohorts" |
| Not Environmental | Mechanism studies (even if using exposure models), interventions/treatments (cessation programs, dietary interventions), administrative/training grants, drug development | "Inflammatory pathways in asthma"; "Smoking cessation intervention" |
Key Distinction: The question is whether the grant studies how exposures cause disease (Environmental) versus disease mechanisms independent of exposure (Not Environmental).
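
One way to structure such a classification step is to separate prompt construction, the model call, and label parsing. The sketch below is hypothetical: the prompt wording is paraphrased from the criteria table and is not the prompt used in the analysis, and `ask_model` is a placeholder for whatever function sends text to the model and returns its reply.

```python
from typing import Callable

def build_prompt(title: str) -> str:
    """Hypothetical prompt; the wording paraphrases the criteria table."""
    return (
        "Classify the NIH grant below as 'Environmental' or 'Not Environmental'.\n"
        "Environmental: studies environmental exposures as risk factors "
        "(epidemiology, exposure assessment, biomonitoring, cohort studies "
        "tracking exposure and disease outcomes).\n"
        "Not Environmental: mechanism studies (even if using exposure models), "
        "interventions/treatments, administrative/training grants, "
        "drug development.\n"
        f"Grant title: {title}\n"
        "Answer with the label only."
    )

def parse_label(response: str) -> str:
    """Map a free-text model reply onto one of the two labels."""
    text = response.strip().lower()
    # Check the longer label first: "environmental" is a suffix of it.
    if text.startswith("not environmental"):
        return "Not Environmental"
    if text.startswith("environmental"):
        return "Environmental"
    raise ValueError(f"Unrecognized label: {response!r}")

def classify(title: str, ask_model: Callable[[str], str]) -> str:
    """Classify one grant title; ask_model is a placeholder for the model call."""
    return parse_label(ask_model(build_prompt(title)))

# Stand-in for a real model call, used here only to show the control flow.
def fake_model(prompt: str) -> str:
    return "Not Environmental"

label = classify("Inflammatory pathways in asthma", fake_model)
```

Raising on an unrecognized reply, rather than defaulting to one label, keeps malformed model output from silently inflating either category.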

Step 3: Verification Pass

LLM classifications were verified using keyword-based rules to remove false positives.

| Error Type | Example | Why Misclassified |
|---|---|---|
| Mechanism studies | "Role of alcohol-adapted Kupffer cells in liver fibrosis" | Uses alcohol as a model, but studies cellular mechanisms, not exposure-disease relationship |
| Intervention studies | "Smoking cessation intervention for pregnant women" | Intervention/treatment focus, not studying exposure as risk factor |
| Training/conferences | "Environmental Health Sciences Training Grant" | Administrative/training, not research |

Result: The 8,735 initial Environmental classifications were reduced to 6,752 verified grants; the 1,983 removed false positives are 23% of the initial set.
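
A keyword-based verification pass along these lines might look like the sketch below. The rule patterns are hypothetical, chosen only so that they catch the three example titles in the error table; the rules actually used are not listed here. The last lines reproduce the arithmetic behind the 23% figure.

```python
import re
from typing import Optional

# Hypothetical rules, one per error type in the table above.
RULES = [
    ("Training/conferences",
     re.compile(r"\b(training|conference|workshop)\b", re.IGNORECASE)),
    ("Intervention studies",
     re.compile(r"\b(cessation|intervention|treatment)\b", re.IGNORECASE)),
    ("Mechanism studies",
     re.compile(r"\b(mechanisms?|pathways?|role of)\b", re.IGNORECASE)),
]

def false_positive_reason(title: str) -> Optional[str]:
    """Return the error type if the title trips a rule, else None (keep)."""
    for error_type, pattern in RULES:
        if pattern.search(title):
            return error_type
    return None

# Arithmetic behind the reported reduction.
initial, verified = 8735, 6752
removed = initial - verified          # 1,983 grants removed
removed_share = removed / initial     # about 0.227, reported as 23%
```

With these rules, "Smoking cessation intervention for pregnant women" is flagged as an intervention study, while "Air pollution and children's asthma risk in urban cohorts" trips no rule and is kept.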

Step 4: Exposure Subgroup Assignment

Verified Environmental grants were assigned to exposure subgroups using strict keyword matching with priority ordering (earlier categories take precedence to avoid double-counting):

| Priority | Subgroup | Keywords |
|---|---|---|
| 1 | Smoking | tobacco, nicotine, vaping, cigarette, e-cigarette |
| 2 | Alcohol | ethanol, AUD, fetal alcohol, alcohol use disorder |
| 3 | Infection | viral, bacterial, pathogen, HIV, hepatitis |
| 4 | Diet/Obesity/Nutrition | dietary, BMI, metabolic, obesity, nutrition |
| 5 | Microbiome | microbiota, gut bacteria, dysbiosis |
| 6 | Radiation | UV, ionizing, radioactive, radon |
| 7 | Gene-Environment | GxE, gene-environment interaction |
| 8 | Chemicals | PFAS, pesticides, air pollution, heavy metals, PM2.5, endocrine disruptor |

Note: Chemicals is checked last to avoid over-capturing grants that mention chemical terms in other contexts (e.g., "chemical biology" methodology).
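
Priority-ordered matching can be sketched as an ordered list that is scanned from the top, returning the first subgroup with a hit. The keywords below are the ones listed in the table. Interpreting "strict" as case-insensitive, whole-word matching is an assumption, as are the function and variable names.

```python
import re
from typing import Optional

# Subgroups in priority order, with the keywords from the table above.
SUBGROUPS = [
    ("Smoking", ["tobacco", "nicotine", "vaping", "cigarette", "e-cigarette"]),
    ("Alcohol", ["ethanol", "AUD", "fetal alcohol", "alcohol use disorder"]),
    ("Infection", ["viral", "bacterial", "pathogen", "HIV", "hepatitis"]),
    ("Diet/Obesity/Nutrition", ["dietary", "BMI", "metabolic", "obesity", "nutrition"]),
    ("Microbiome", ["microbiota", "gut bacteria", "dysbiosis"]),
    ("Radiation", ["UV", "ionizing", "radioactive", "radon"]),
    ("Gene-Environment", ["GxE", "gene-environment interaction"]),
    ("Chemicals", ["PFAS", "pesticides", "air pollution", "heavy metals",
                   "PM2.5", "endocrine disruptor"]),
]

def compile_terms(terms: list[str]) -> re.Pattern:
    # Whole-word, case-insensitive matching (an assumed reading of "strict").
    body = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)

COMPILED = [(name, compile_terms(terms)) for name, terms in SUBGROUPS]

def assign_subgroup(text: str) -> Optional[str]:
    """Return the highest-priority subgroup whose keywords match, else None."""
    for name, pattern in COMPILED:
        if pattern.search(text):
            return name  # first match wins, so each grant is counted once
    return None
```

A title mentioning both tobacco and a viral infection is assigned to Smoking, and a title mentioning both PFAS and hepatitis is assigned to Infection, because those subgroups are checked before Chemicals. Returning on the first match is what prevents double-counting.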

Step 5: Mechanistic Subcategory Classification

Non-environmental grants classified as "Mechanistic" were further subcategorized into 8 research focus areas using keyword matching on grant titles:

  • Microbial Pathogenesis
  • Immune/Inflammatory
  • Tumor Biology
  • Neurodegeneration
  • Metabolic
  • Genetics/Genomics
  • Reproductive/Developmental
  • Biomarkers

Validation Results

| Classification Step | Accuracy | Notes |
|---|---|---|
| Environmental classification (initial) | 75% | 75% correct, 20% borderline, 5% incorrect |
| Environmental classification (after verification) | 77% | True positive rate after removing false positives |
| Exposure subgroup assignment | 88-89% | After priority reordering |
| Mechanistic subcategories | 82-83% | Consistent across disease areas |

Key Findings

| Disease | Environmental Funding | % of Total | Context |
|---|---|---|---|
| Parkinson's Disease | $12.5M | 5% | Twin studies suggest 60-70% of risk is non-genetic |
| Lung Cancer | $48M | 10% | Non-smoking lung cancer accounts for 10-20% of cases |
| Liver Disease | $21M | 2% | Growing evidence links PFAS and other chemicals to liver damage |
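
The percentages above can be reproduced from the figures on this page, assuming the denominator is each disease's total FY2024 funding from the Data Source table. That reading is an inference from the numbers, not something stated explicitly. Amounts are in millions of USD.

```python
# (environmental funding, total disease funding), in millions of USD.
# Totals are taken from the Data Source table; using them as the
# denominator is an assumption that happens to match the reported shares.
rows = {
    "Parkinson's Disease": (12.5, 253),
    "Lung Cancer": (48, 492),
    "Liver Disease": (21, 945),
}

shares = {
    disease: 100 * environmental / total
    for disease, (environmental, total) in rows.items()
}

for disease, pct in shares.items():
    print(f"{disease}: {pct:.1f}%")
# Parkinson's Disease: 4.9%
# Lung Cancer: 9.8%
# Liver Disease: 2.2%
```

Rounded to whole percentages, these give the 5%, 10%, and 2% shown in the table.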

Limitations

  • Title/terms classification: Classification based on titles and project terms, not full abstracts. Some grants may be misclassified due to limited information.
  • Single-label assignment: Each grant receives one category; some grants span multiple areas and could reasonably be classified differently.
  • Keyword sensitivity: Classification depends on specific word choices; grants using non-standard terminology or synonyms may be missed.
  • Fiscal year: Disease-specific analysis uses FY2024; results may vary by year.

Data Files

Full methodology documentation is maintained in grant_categorization/nih_classification/:

  • environmental_classification_methodology.md — Complete Environmental vs. Not classification criteria
  • mechanistic_subcategory_methodology.md — Mechanistic subcategorization approach
  • classification_guide.md — Decision tree and edge case guidance