Authors: Xiao Li*, Texas A&M University, Daniel W Goldberg, Texas A&M University
Topics: Geographic Information Science and Systems, Transportation Geography, Hazards, Risks, and Disasters
Keywords: Crash Analysis, Accident-Prone Road Segment, Data Mining, Geostatistical Methods
Session Type: Poster
Start / End Time: 1:20 PM / 3:00 PM
Room: Napoleon Foyer/Common St. Corridor, Sheraton, 3rd Floor
Presentation File: No File Uploaded
Traffic accidents have become the seventh leading cause of preventable death in the United States and the eighth globally. Geostatistical methods work well for identifying and visualizing existing traffic “black” zones. However, these methods cannot predict the driving risk of newly constructed roads. This study takes advantage of data mining methods to investigate the relationship between driving risk and road-related features, and contributes to a better understanding of crash occurrence in two ways: 1) different geostatistical methods and data mining techniques are compared for accident-prone road segment identification; 2) the relationship between driving risk and road-related features is investigated.
In this paper, 45,898 car accidents in Polk County, Iowa, between 2011 and 2016 are snapped to their nearest road segments. Thirty-four road-related features are created and aggregated as attributes of each road segment. Moran’s I is used to test the spatial autocorrelation of car accidents. Kernel Density Estimation is performed to identify road segments where crashes are concentrated. The crash rate is calculated to represent the driving risk of each road segment. Based on crash rate, road segments are labeled as “Safe,” “Low-Risk,” or “High-Risk.” In total, 6,324 road segments with 35 attributes (the 34 road-related features plus the label) are fed to a data mining engine. Three different data mining methods are implemented to identify accident-prone road segments.
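The spatial autocorrelation test mentioned above can be illustrated with a minimal sketch of global Moran's I. The abstract does not specify the spatial weights used in the study, so the binary adjacency matrix, the toy crash counts, and the function name `morans_i` below are assumptions made only for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    s0 = sum(sum(row) for row in weights)  # total weight
    den = sum(d * d for d in dev)          # sum of squared deviations
    if s0 == 0 or den == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")

    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / s0) * num / den


# Hypothetical example: four road segments in a chain, where each segment
# neighbors only the segments directly before and after it.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([10, 8, 2, 1], chain_weights)     # similar counts are adjacent
alternating = morans_i([10, 1, 10, 1], chain_weights)  # dissimilar counts are adjacent
```

A positive value (the clustered case) suggests that segments with similar crash counts tend to be neighbors, while a negative value (the alternating case) suggests the opposite. A significance test, which this sketch omits, would normally accompany the statistic.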
The results demonstrate that, overall, the data mining methods are effective for labeling road segments as “Safe,” “Low-Risk,” or “High-Risk” based on the selected features. Of the three methods tested, Decision Tree has the highest accuracy, at 85.02%.
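The labeling and accuracy evaluation described above can be sketched roughly as follows. The abstract gives neither the crash rate definition nor the class thresholds, so the sketch assumes crashes per kilometer and uses hypothetical cutoffs (`low_threshold`, `high_threshold`). The data are toy values, not the study's.

```python
def crash_rate(crash_count, length_km):
    """Crashes per kilometer of road segment (an assumed definition)."""
    if length_km <= 0:
        raise ValueError("segment length must be positive")
    return crash_count / length_km


def label_segment(rate, low_threshold=0.0, high_threshold=5.0):
    """Map a crash rate to one of the three risk labels.

    The thresholds are hypothetical; the study's actual cutoffs are not stated.
    """
    if rate <= low_threshold:
        return "Safe"
    if rate < high_threshold:
        return "Low-Risk"
    return "High-Risk"


def accuracy(true_labels, predicted_labels):
    """Fraction of segments whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy segments: (crash count, length in km)
segments = [(0, 1.2), (3, 1.5), (12, 2.0), (0, 0.8)]
true_labels = [label_segment(crash_rate(c, km)) for c, km in segments]

# Hypothetical classifier output for the same four segments
predicted_labels = ["Safe", "Low-Risk", "Safe", "Safe"]
score = accuracy(true_labels, predicted_labels)
```

In practice the predicted labels would come from a classifier trained on the road-related features and evaluated on held-out segments. The sketch only shows how the labels and the accuracy figure relate.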