Authors: Mohammad Eshghi*, University of Oregon
Topics: Geographic Information Science and Systems, Spatial Analysis & Modeling, Quantitative Methods
Keywords: Machine learning, Predictive modeling, Semi-supervised learning, Geospatial occurrence-only data
Session Type: Paper
Start / End Time: 5:00 PM / 6:40 PM
Room: Washington 6, Marriott, Exhibition Level
Predictive modeling of an event of interest ideally requires geospatial data that represent both the occurrences and the non-occurrences of that event. Two issues must be considered at model-fitting time: the amount of available data and the availability of both the occurrence and non-occurrence classes. Often, the only available data record occurrences of the event, and even these may be few in number. This practice of recording only partial data presents a challenge for developing valid and reliable models. Examples of such data include public health data, transportation accident data, species distribution data, and crime data. Semi-supervised learning (SSL) and one-class classification (OCC) are the two main approaches for handling occurrence-only (i.e., one-class) data: SSL applies when unlabeled data are available, and OCC when they are not. In this paper, we focus on SSL because it allows the learning algorithm to exploit the large amount of available unlabeled data. Using a species distribution data set, we compare and discuss the advantages and disadvantages of different categories of SSL methods for modeling geospatial occurrence-only data.
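To make the occurrence-only setting concrete, the following is a minimal illustrative sketch of one common semi-supervised idea, a two-step positive-unlabeled procedure with self-training. It is not the method evaluated in the paper, which the abstract does not specify. The function names (`centroid`, `two_step_pu`), the nearest-centroid classifier, the `neg_fraction` and confidence thresholds, and the synthetic two-cluster data are all assumptions chosen for brevity.

```python
import math
import random


def centroid(points):
    """Mean location of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def two_step_pu(occurrences, unlabeled, neg_fraction=0.2, rounds=5):
    """Hypothetical two-step positive-unlabeled learner (illustration only).

    Step 1 picks "reliable" pseudo-absences from the unlabeled pool.
    Step 2 self-trains a nearest-centroid classifier on the growing labels.
    """
    pos = list(occurrences)
    pos_c = centroid(pos)

    # Step 1: assume the unlabeled points farthest from the occurrence
    # centroid are the safest candidates for the non-occurrence class.
    ranked = sorted(unlabeled, key=lambda p: math.dist(p, pos_c), reverse=True)
    k = max(1, int(neg_fraction * len(ranked)))
    neg, pool = ranked[:k], ranked[k:]

    # Step 2: self-training. Label only points that are clearly closer to
    # one class centroid than the other (factor 0.5 is an arbitrary choice).
    for _ in range(rounds):
        pos_c, neg_c = centroid(pos), centroid(neg)
        undecided = []
        for p in pool:
            d_pos, d_neg = math.dist(p, pos_c), math.dist(p, neg_c)
            if d_pos < 0.5 * d_neg:
                pos.append(p)
            elif d_neg < 0.5 * d_pos:
                neg.append(p)
            else:
                undecided.append(p)
        if len(undecided) == len(pool):
            break  # no new confident labels; stop early
        pool = undecided

    pos_c, neg_c = centroid(pos), centroid(neg)

    def predict(p):
        """Return True if p is closer to the occurrence centroid."""
        return math.dist(p, pos_c) < math.dist(p, neg_c)

    return predict


# Synthetic demo data (assumed, not from the paper): occurrences cluster
# near (0, 0); the unlabeled pool mixes that cluster with one near (5, 5).
random.seed(0)


def cloud(cx, cy, n):
    return [(random.gauss(cx, 0.5), random.gauss(cy, 0.5)) for _ in range(n)]


occurrences = cloud(0.0, 0.0, 30)
unlabeled = cloud(0.0, 0.0, 50) + cloud(5.0, 5.0, 50)
predict = two_step_pu(occurrences, unlabeled)
```

The returned `predict` function can be called on any coordinate tuple, for example `predict((0.2, -0.1))`. A one-class classifier, by contrast, would be fit on `occurrences` alone and would ignore the `unlabeled` pool, which is the distinction the abstract draws between OCC and SSL.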