Do missing mechanisms matter? - an evaluation of imputation algorithms for missing mechanisms in simulated spatiotemporal datasets

Authors: Insang Song*, University of Oregon
Topics: Spatial Analysis & Modeling, Quantitative Methods
Keywords: Missing data, Imputation, Spatiotemporal dependency, Spatiotemporal Kriging, MissForest, k-nearest neighbor, Principal Component Analysis
Session Type: Virtual Paper

Imputing missing values and assessing imputation accuracy are critical in spatiotemporal data analysis and have drawn researchers' attention over the past decades. We compare multiple missing patterns and imputation algorithms on short-period synthetic spatiotemporal data to explore how the missing data mechanism and its resulting patterns affect imputation. We generate missing values in simulated datasets following (1) Missing Not At Random (“MNAR”), (2) Missing At Random (“MAR”), and (3) spatially or temporally blocked patterns (“block”) representing realistic missing data situations. Three experimental conditions were combined: the missing data mechanism, four missing ratios, and four algorithms (k-Nearest Neighbour [kNN], Principal Component Analysis, missForest, and spatiotemporal Kriging). The results show that the MAR and MNAR mechanisms barely affect overall imputation accuracy, although kNN and missForest have lower accuracy in the block pattern under the higher missing ratios. The missForest algorithm shows generally fair accuracy, whereas spatiotemporal Kriging imputes higher ratios of missing values and spatially or temporally dependent missing values more accurately and robustly. This reiterates the importance of spatiotemporal dependency even in the imputation of spatiotemporal data.
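
To make the three missingness settings concrete, the following is a minimal Python sketch, not the authors' simulation code. The grid size, the 0.2 missing ratio, the auxiliary covariate, and all function names are hypothetical choices made for illustration, and the per-site mean imputer is only a stand-in baseline, not one of the four algorithms evaluated in the study.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice

# Hypothetical toy data: 30 sites x 50 time steps sharing one temporal signal.
n_sites, n_times = 30, 50
signal = np.sin(np.linspace(0, 4 * np.pi, n_times))
data = signal + rng.normal(scale=0.3, size=(n_sites, n_times))
# Assumed fully observed auxiliary covariate, used to drive MAR missingness.
covariate = rng.normal(size=(n_sites, n_times))


def mask_by_score(score, ratio):
    """Flag the `ratio` share of cells with the highest noisy score as missing."""
    noisy = score + rng.normal(scale=0.5, size=score.shape)
    n_missing = int(round(ratio * score.size))
    order = np.argsort(noisy, axis=None)
    mask = np.zeros(score.size, dtype=bool)
    mask[order[score.size - n_missing:]] = True
    return mask.reshape(score.shape)


def mar_mask(ratio):
    # MAR: missingness depends only on an observed covariate.
    return mask_by_score(covariate, ratio)


def mnar_mask(ratio):
    # MNAR: missingness depends on the value that goes missing.
    return mask_by_score(data, ratio)


def block_mask(ratio):
    # Block (temporal variant only): a contiguous run of time steps is missing
    # at randomly chosen sites. Assumes ratio <= 0.5 so enough sites exist.
    block_len = n_times // 2
    n_blocks = int(round(ratio * data.size / block_len))
    mask = np.zeros(data.shape, dtype=bool)
    for site in rng.choice(n_sites, size=n_blocks, replace=False):
        start = rng.integers(0, n_times - block_len + 1)
        mask[site, start:start + block_len] = True
    return mask


def site_mean_impute(values, mask):
    # Stand-in baseline: fill each gap with that site's mean over observed times.
    observed = np.where(mask, np.nan, values)
    site_means = np.nanmean(observed, axis=1, keepdims=True)
    return np.where(mask, site_means, values)


def rmse(truth, imputed, mask):
    # Accuracy is scored only on the cells that were masked out.
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))


for name, make_mask in [("MAR", mar_mask), ("MNAR", mnar_mask), ("block", block_mask)]:
    mask = make_mask(0.2)
    filled = site_mean_impute(data, mask)
    print(name, round(float(mask.mean()), 3), round(rmse(data, filled, mask), 3))
```

Swapping `site_mean_impute` for another imputer with the same `(values, mask)` signature would let the same loop compare algorithms across mechanisms and missing ratios, which is the shape of the experiment described above.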
