Authors: Avipsa Roy*, Arizona State University, Trisalyn Nelson, Arizona State University
Topics: Geographic Information Science and Systems, Spatial Analysis & Modeling, Transportation Geography
Keywords: Bias-correction, LASSO, Big Data, Bicycling, Urban Planning, Crowdsourcing, Machine Learning
Session Type: Paper
Cities supporting pro-bicycling policies require data on how ridership varies across cities and changes over time. Traditional approaches to data collection lack sufficient coverage and are often expensive; as a result, cities are more inclined to use crowdsourced data platforms such as Strava to understand ridership patterns. However, Strava data are biased toward the bicyclists who choose to use the app, while women, children, elderly, and lower-income bicyclists are usually undersampled. Statistical models and GIS can correct this sampling bias by developing adjustment factors that are sensitive to street or path characteristics, but developing such factors requires advanced data analysis skills. A generalized approach to correcting bias in crowdsourced data (e.g., Strava) is needed so that cities making decisions with such data sources have data reflective of bicyclists of all ages and abilities, which can help improve bicycle ridership and bridge the knowledge gap between analysts and policymakers. Our goal is to develop a method for identifying adjustment factors that are sensitive to the characteristics of street and path segments and that can inform urban planners about bicycle ridership patterns using bias-corrected crowdsourced data across multiple cities. In this study, we identify adjustment factors to correct bias in crowdsourced data in five North American cities. We select city-specific bias-adjustment factors using a machine learning method called LASSO (least absolute shrinkage and selection operator). The selected factors are then used in a Poisson regression model to predict overall ridership across all cities. We assess prediction accuracy and comment on best practices for improving data collection.
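A minimal sketch of the two-step idea described above (select factors with LASSO, then fit a Poisson regression), written under loud assumptions: the LASSO step here is an L1-penalized Poisson regression solved by proximal gradient descent, the second step is an unpenalized Poisson GLM fit by Newton-Raphson, and the data, the feature names (`log_strava`, `bike_lane`, `slope`, ...), and the penalty `lam=0.5` are hypothetical and synthetic. This is not the authors' implementation, and it covers a single city only, whereas the study selects factors per city before modeling ridership across all cities.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1: shrinks each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def poisson_loss(b0, beta, X, y):
    """Mean Poisson negative log-likelihood (constant log(y!) term dropped)."""
    eta = b0 + X @ beta
    return np.mean(np.exp(eta) - y * eta)


def lasso_poisson(X, y, lam, n_iter=1000):
    """L1-penalized Poisson regression via proximal gradient descent (ISTA)
    with backtracking line search. The intercept b0 is not penalized."""
    n, p = X.shape
    b0, beta, step = np.log(y.mean()), np.zeros(p), 1.0
    for _ in range(n_iter):
        resid = np.exp(b0 + X @ beta) - y          # mu - y
        g0, g = resid.mean(), X.T @ resid / n       # gradient of the smooth loss
        f_old = poisson_loss(b0, beta, X, y)
        while True:
            b0_new = b0 - step * g0                 # plain gradient step for intercept
            beta_new = soft_threshold(beta - step * g, step * lam)
            d0, d = b0_new - b0, beta_new - beta
            bound = f_old + g0 * d0 + g @ d + (d0**2 + d @ d) / (2 * step)
            with np.errstate(over="ignore"):        # overflow -> inf -> step rejected
                if poisson_loss(b0_new, beta_new, X, y) <= bound:
                    break
            step *= 0.5                             # shrink until the quadratic bound holds
        b0, beta = b0_new, beta_new
    return b0, beta


def poisson_refit(X, y, n_iter=50, tol=1e-10):
    """Unpenalized Poisson GLM (log link) fit by Newton-Raphson (IRLS)."""
    A = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(A.shape[1])
    w[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(A @ w)
        delta = np.linalg.solve(A.T @ (A * mu[:, None]), A.T @ (y - mu))
        w += delta
        if np.max(np.abs(delta)) < tol:
            break
    return w  # [intercept, coefficients for the columns of X]


# Hypothetical segment-level data for ONE city: standardized predictors and
# reference counts (e.g., from bicycle counters). Synthetic, for illustration only.
names = ["log_strava", "bike_lane", "slope", "traffic", "dist_cbd", "noise"]
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, len(names)))
true_beta = np.array([0.6, 0.4, -0.3, 0.0, 0.0, 0.0])
y = rng.poisson(np.exp(1.5 + X @ true_beta))

_, beta_lasso = lasso_poisson(X, y, lam=0.5)
selected = np.flatnonzero(beta_lasso)           # indices of factors LASSO kept
coef = poisson_refit(X[:, selected], y)         # unpenalized refit on kept factors
print([names[j] for j in selected], np.round(coef, 2))
```

Refitting without the penalty after selection removes LASSO's shrinkage from the final coefficients. In practice the penalty `lam` would be tuned, for example by cross-validation, rather than fixed as it is here.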