Authors: Arif Masrur*, Pennsylvania State University, Manzhu Yu, Pennsylvania State University
Topics: Spatial Analysis & Modeling, Geographic Information Science and Systems
Keywords: Interpretable machine learning, random forest, spatial modeling, wildfire ecology
Session Type: Virtual Paper
Start / End Time: 8:00 AM / 9:15 AM
Room: Virtual 49
Machine learning (ML) interpretability has become increasingly important for identifying accurate and relevant structural relationships between spatial events and their influencing factors. ML algorithms that are methodologically aspatial may show high predictive power, yet they ignore the non-stationary relationships characteristic of spatio-temporal data (e.g., dependence and heterogeneity). This can lead to erroneous interpretations and inappropriate management decisions. In this research, we addressed the prediction and interpretability issues of Random Forest (RF), one of the most widely used ML models in the geospatial domain owing to its robustness and generalizability. Although RF is more intrinsically interpretable than many deep learning algorithms, its inner mechanism is not decomposed into spatial and temporal dimensions. We therefore extended the generic RF algorithm into iST-RF, which supports spatio-temporally explicit model training and prediction by applying spatial and temporal calibration during both model building and prediction. We evaluated the proposed approach with a large-scale wildfire case study in the United States. Experiments show that the spatio-temporal sampling approach yields predictive accuracy comparable to that of the aspatial RF approach (0.81 vs. 0.84), while also enabling accurate and relevant interpretation of the model's prediction decisions. To the best of our knowledge, iST-RF is a novel approach that supports the goal of striking a balance between prediction and interpretation. Although we demonstrated the proposed approach in a wildfire ecology context, it can be applied to any domain that uses large geospatial datasets, including the geosciences, spatial business intelligence, and spatial epidemiology.
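The abstract does not specify how iST-RF performs its spatial and temporal calibration. As a rough illustration of the general idea only, the sketch below assumes a local, moving-window RF (similar in spirit to geographically weighted random forests). For each query location and time, it fits an RF on training samples inside a spatial radius and a temporal window, then returns a local prediction and local impurity-based feature importances. The functions `st_neighbors` and `local_st_rf` and the parameters `radius`, `window`, and `min_samples` are hypothetical. The data are synthetic, not wildfire data, and scikit-learn's `RandomForestRegressor` stands in for whatever RF implementation the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def st_neighbors(coords, times, target_xy, target_t, radius, window):
    """Indices of samples within `radius` (space) and `window` (time) of a target."""
    d_space = np.hypot(coords[:, 0] - target_xy[0], coords[:, 1] - target_xy[1])
    d_time = np.abs(times - target_t)
    return np.flatnonzero((d_space <= radius) & (d_time <= window))


def local_st_rf(X, y, coords, times, q_X, q_xy, q_t,
                radius=2.0, window=3.0, min_samples=20, seed=0):
    """Hypothetical sketch: fit one RF per query on its spatio-temporal neighbourhood.

    Returns local predictions and local impurity-based feature importances.
    Falls back to a single global RF when a neighbourhood has too few samples.
    """
    global_rf = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
    preds = np.empty(len(q_X))
    imps = np.empty((len(q_X), X.shape[1]))
    for i in range(len(q_X)):
        idx = st_neighbors(coords, times, q_xy[i], q_t[i], radius, window)
        if len(idx) >= min_samples:
            # Local model calibrated only on nearby-in-space-and-time samples.
            rf = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X[idx], y[idx])
        else:
            rf = global_rf
        preds[i] = rf.predict(q_X[i:i + 1])[0]
        imps[i] = rf.feature_importances_
    return preds, imps


# Synthetic, spatially heterogeneous data (illustrative only):
# the response is driven by feature 0 in the west (x < 5) and by feature 1 in the east.
rng = np.random.default_rng(0)
n = 2000
coords = rng.uniform(0, 10, size=(n, 2))
times = rng.uniform(0, 10, size=n)
X = rng.normal(size=(n, 2))
y = np.where(coords[:, 0] < 5, X[:, 0], X[:, 1])

# One western and one eastern query, both at t = 5.
q_xy = np.array([[2.0, 5.0], [8.0, 5.0]])
q_t = np.array([5.0, 5.0])
q_X = np.array([[1.0, -1.0], [1.0, -1.0]])
preds, imps = local_st_rf(X, y, coords, times, q_X, q_xy, q_t)
print(preds, imps.round(2))
```

A single global RF fitted to these data would blend the two regimes into one set of importances. The local models instead attribute the western prediction mainly to feature 0 and the eastern one mainly to feature 1. This is the kind of spatially decomposed interpretation the abstract argues aspatial RF cannot provide.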