Reporting on Missing Data
morals
“The things we consider important are often undervalued by other disciplines … One of the most important concepts in Statistics is that of missing data. For most people it’s easy to ignore because much of it is not very visible.”
— Cyntha Struthers and Don McLeish
“We can’t just learn what we want to know, but what we should know.”
— Biden
“There are no routine statistical questions; only questionable statistical routines.”
— D.R. Cox
reasons you might have missing data
- the machine was broken
- the participant didn’t want to give an answer
- the participant was late to their interview
- lost/corrupted data files
- outlier data coded to missing
- suppressed/confidential/censored data
* not meant to be exhaustive
how to think about missing data
- is the missingness purely random, unrelated to any variable (observed or not), like a coin toss? (MCAR: missing completely at random)
- is the missingness random once you condition on variables you’ve observed? (MAR: missing at random)
- does the missingness depend on values you haven’t observed, including the missing values themselves? (MNAR: missing not at random)
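A small simulation can make the three mechanisms concrete. The sketch below is illustrative only: the variables `x` and `y`, the missingness probabilities, and the helper `simulate` are all assumptions made up for this example, not part of any dataset discussed here. The true mean of `y` is 0, and the code compares it with the mean of the values that remain observed.

```python
import random
import statistics

def simulate(mechanism, n=20_000, seed=0):
    """Return the values of y that stay observed under a given mechanism.

    x is a covariate that is always observed; y is the variable that can
    go missing. All numbers here are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)      # true mean of y is 0
        if mechanism == "MCAR":
            p_missing = 0.3                      # same coin toss for everyone
        elif mechanism == "MAR":
            p_missing = 0.6 if x > 0 else 0.0    # depends only on observed x
        elif mechanism == "MNAR":
            p_missing = 0.6 if y > 0 else 0.0    # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            observed.append(y)
    return observed

means = {m: statistics.fmean(simulate(m)) for m in ("MCAR", "MAR", "MNAR")}
for mechanism, mean in means.items():
    print(f"{mechanism}: mean of observed y = {mean:.3f}")
```

Under MCAR the observed mean should land close to the true value of 0. Under MAR and MNAR it is pulled below 0. The difference is that under MAR the bias can in principle be corrected by conditioning on the observed `x`, whereas under MNAR the observed data alone cannot tell you how large the bias is.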
best practices
- report on the missingness!
- look for trends in the missingness, and report on those trends.
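- (if you perform imputation) make sure your imputation model is compatible with your regression model: it should include the outcome and every predictor, interaction, and transformation that the regression uses.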
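As a minimal sketch of the first two practices, the snippet below computes the share of missing values per variable and then checks whether that share differs across a grouping variable. The records, the column names (`site`, `age`, `income`), and the helper functions are hypothetical and exist only for this illustration; `None` stands in for a missing value.

```python
from collections import defaultdict

# Hypothetical toy records; None marks a missing value.
records = [
    {"site": "A", "age": 34, "income": 52000},
    {"site": "A", "age": 41, "income": None},
    {"site": "A", "age": None, "income": 48000},
    {"site": "B", "age": 29, "income": None},
    {"site": "B", "age": 55, "income": None},
    {"site": "B", "age": 47, "income": 61000},
]

def missing_rate(rows, column):
    """Fraction of rows in which `column` is None."""
    return sum(r[column] is None for r in rows) / len(rows)

def missing_rate_by(rows, column, group):
    """Missing rate of `column` within each level of `group`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group]].append(r)
    return {g: missing_rate(rs, column) for g, rs in groups.items()}

# Report on the missingness: one rate per variable.
overall = {c: missing_rate(records, c) for c in ("age", "income")}

# Look for trends: does the income missing rate differ by site?
by_site = missing_rate_by(records, "income", "site")

print(overall)
print(by_site)
```

A gap between groups, such as the one between the two sites in this toy data, is the kind of trend worth reporting, since it suggests the missingness is not completely at random.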