Reporting on Missing Data

morals

“The things we consider important are often undervalued by other disciplines … One of the most important concepts in Statistics is that of missing data. For most people it’s easy to ignore because much of it is not very visible.”
— Cyntha Struthers and Don McLeish

“We can’t just learn what we want to know, but what we should know.”
— Biden

“There are no routine statistical questions; only questionable statistical routines.”
— D.R. Cox

reasons you might have missing data

  • the machine was broken
  • the participant didn’t want to give an answer
  • the participant was late to their interview
  • data files were lost or corrupted
  • outliers were recoded as missing
  • data were suppressed, confidential, or censored

* not meant to be exhaustive

how to think about missing data

  • is the missingness purely random, unrelated to any variable (observed or not), like a coin toss? (MCAR: missing completely at random)
  • is the missingness random, conditional on variables you’ve observed? (MAR: missing at random)
  • does the missingness depend on things you haven’t observed, including the missing values themselves? (MNAR: missing not at random)
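
To make the three mechanisms concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the variables `x` (always observed) and `y` (sometimes missing), the 30% MCAR rate, and the logistic form of the missingness probabilities. It compares the mean of the observed `y` values with the mean of the full data.

```python
import math
import random


def simulate_missingness(n=20_000, seed=0):
    """Mean of the observed y values under three hypothetical mechanisms."""
    rng = random.Random(seed)

    def sigmoid(t):
        return 1 / (1 + math.exp(-t))

    observed = {"full": [], "MCAR": [], "MAR": [], "MNAR": []}
    for _ in range(n):
        x = rng.gauss(0, 1)        # covariate, always observed
        y = x + rng.gauss(0, 1)    # outcome, true mean 0
        observed["full"].append(y)
        # MCAR: every y has the same 30% chance of being missing.
        if rng.random() >= 0.3:
            observed["MCAR"].append(y)
        # MAR: the chance of being missing depends only on the observed x.
        if rng.random() >= sigmoid(2 * x):
            observed["MAR"].append(y)
        # MNAR: the chance of being missing depends on y itself.
        if rng.random() >= sigmoid(2 * y):
            observed["MNAR"].append(y)
    return {name: sum(vals) / len(vals) for name, vals in observed.items()}


means = simulate_missingness()
for name, value in means.items():
    print(f"{name:5s} mean of observed y: {value:+.3f}")
```

In this setup the MCAR mean should sit close to the full-data mean, while the MAR and MNAR means are pulled downward because large values are more likely to go missing. The practical difference is that the MAR bias can be addressed by conditioning on `x`, which you have; under MNAR the information you would need is exactly what is missing.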

best practices

  1. report on the missingness!
  2. look for trends in the missingness, and report on those trends.
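  3. (if you perform imputation) make sure your imputation model matches your regression model: it should include the outcome and every predictor, interaction, and transformation that the regression uses.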
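
A minimal sketch of the first two practices, using only the standard library. The records, the column names `site` and `income`, and the helper names `missing_rate` and `missing_rate_by` are all made up for illustration, with `None` standing in for a missing value. The first helper reports how much is missing; the second checks whether the missingness varies with an observed variable.

```python
from collections import defaultdict


def missing_rate(rows, column):
    """Fraction of rows where `column` is None."""
    return sum(r[column] is None for r in rows) / len(rows)


def missing_rate_by(rows, column, group):
    """Missing rate of `column` within each level of the observed `group`."""
    counts = defaultdict(lambda: [0, 0])  # level -> [n_missing, n_total]
    for r in rows:
        counts[r[group]][0] += r[column] is None
        counts[r[group]][1] += 1
    return {level: m / t for level, (m, t) in counts.items()}


# Toy records, invented for this example only.
rows = [
    {"site": "A", "income": 52},
    {"site": "A", "income": None},
    {"site": "A", "income": 61},
    {"site": "A", "income": 48},
    {"site": "B", "income": None},
    {"site": "B", "income": None},
    {"site": "B", "income": 70},
    {"site": "B", "income": None},
]

overall = missing_rate(rows, "income")             # 0.5
by_site = missing_rate_by(rows, "income", "site")  # {"A": 0.25, "B": 0.75}
print(overall, by_site)
```

Here half of the `income` values are missing overall, but the rate is 25% at site A and 75% at site B. A gap like that is a trend worth reporting, and it is evidence against MCAR.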