Authors: Jian Chen*, University of North Alabama; Satya Katragadda, Informatics Research Institute, University of Louisiana at Lafayette; Shaaban Abbady, Center for Advanced Computer Studies, University of Louisiana at Lafayette
Topics: Geographic Information Science and Systems, Cyberinfrastructure, Spatial Analysis & Modeling
Keywords: spatial clustering, MapReduce, hotspot detection, polygon propagation, big data
Session Type: Paper
Start / End Time: 5:00 PM / 6:40 PM
Room: Roosevelt 5, Marriott, Exhibition Level
Analysis of hotspots, defined as spatial concentrations of abnormal activity, has broad applications in areas important to daily life, including epidemiology, disease surveillance, crime prevention, and environmental monitoring. Understanding these abnormalities helps identify their underlying causes and the appropriate steps for action and possible remediation. Spatial scan statistics are among the most important models for detecting hotspots in geographic data, with real-world applications in epidemiology, public health, astronomy, and criminology. The traditional scan statistic uses regular shapes such as circles to detect areas of high activity, and the model was later extended to ellipses. More recent work identifies irregularly shaped hotspots in data with geographical boundaries, where population information within those boundaries is available. These models were developed for aggregated data, but better mapping technology has made it easier to map individual cases to latitude and longitude. We propose an approach to spatial hotspot detection for point data sets with no geographical boundary information. Our algorithm detects each hotspot as a polygon made up of a set of triangles computed by a Polygon Propagation algorithm. The time complexity of the algorithm is non-linear in the number of observations, so it does not scale well to larger datasets. To address this, we also introduce a MapReduce version of the algorithm to identify hotspots in larger datasets.
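
The abstract does not describe how the Polygon Propagation algorithm or its MapReduce version is implemented. As a rough illustration of the general MapReduce pattern applied to point data, the sketch below maps each point to a grid cell and reduces by counting points per cell. The grid partitioning, the function names (`map_point`, `reduce_counts`, `dense_cells`), the `cell_size` and `min_count` parameters, and the sample points are all assumptions made for illustration; this is not the authors' method.

```python
from collections import defaultdict
from math import floor


def map_point(point, cell_size):
    """Map step (hypothetical): emit (grid_cell_key, 1) for one (x, y) point."""
    x, y = point
    return (floor(x / cell_size), floor(y / cell_size)), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted values for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def dense_cells(points, cell_size, min_count):
    """Return the grid cells that hold at least min_count points."""
    pairs = (map_point(p, cell_size) for p in points)
    counts = reduce_counts(pairs)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Made-up sample points: three fall in cell (0, 0), the others are isolated.
points = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.9), (2.5, 2.5), (5.1, 7.3)]
result = dense_cells(points, cell_size=1.0, min_count=3)
print(result)  # {(0, 0): 3}
```

In a real MapReduce framework the map and reduce steps would run on separate workers over partitions of the data. Here they run in one process, only to show the shape of the computation.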