Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists

Authors: Avipsa Roy*; Trisalyn Nelson, Arizona State University; Stewart Fotheringham, Arizona State University; Meghan Winters, Simon Fraser University
Topics: Transportation Geography, Urban and Regional Planning, Spatial Analysis & Modeling
Keywords: Bias Correction, Bicycle, Crowdsourcing, Big Data, Active Transportation, LASSO
Session Type: Poster
Day: 4/5/2019
Start / End Time: 8:00 AM / 9:40 AM
Room: Lincoln 2, Marriott, Exhibition Level

Traditional methods of counting bicyclists are resource-intensive and generate data with sparse spatial and temporal detail. Previous research suggests that big data from crowdsourced fitness apps offer a new source of bicycling data with high spatial and temporal resolution. However, crowdsourced bicycling data are biased because they oversample recreational riders. Our goals are to quantify the geographical variables that can help correct bias in crowdsourced data, and to develop a generalized method for correcting bias in big crowdsourced bicycle ridership data across different settings, in order to generate maps representative of all bicyclists at street-level spatial resolution. We used street-level Strava ridership data for 2016, geographical covariate data, and official counts from 44 locations across Maricopa County, Arizona, USA (training data) and 60 locations in the city of Tempe, within Maricopa County (test data). First, we quantified the relationship between Strava and official ridership volumes. Second, we used a mixed-model approach with LASSO and Poisson regression to integrate geographical covariates, Strava data, and the training counts to correct bias. Finally, we used the same approach to predict bias-corrected average annual daily bicyclist counts for Tempe and evaluated model accuracy across the city. We found a correlation between annual ridership data from Strava and official counts (R² = 0.52) in Maricopa County. The results indicate that our model could correct bias in crowdsourced Strava data in Tempe, with 80.3% of road segments predicted to within ±50 average annual bicyclists.
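As an illustration of the kind of model named above, the following is a minimal sketch of an L1-penalized (LASSO) Poisson regression fitted by proximal gradient descent. It is not the authors' implementation and does not reproduce their mixed-model setup. The data are synthetic, and the three covariates (imagined here as a standardized Strava count plus two standardized geographical covariates, the last of which has no true effect), the function names, and the penalty value are all assumptions made for the example.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clamping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def linear_predictor(row, intercept, coefs):
    return intercept + sum(c * v for c, v in zip(coefs, row))


def mean_nll(X, y, intercept, coefs):
    """Mean Poisson negative log-likelihood, dropping the constant log(y!) term."""
    total = 0.0
    for row, count in zip(X, y):
        eta = linear_predictor(row, intercept, coefs)
        total += math.exp(eta) - count * eta
    return total / len(y)


def gradient(X, y, intercept, coefs):
    """Gradient of mean_nll with respect to the intercept and each coefficient."""
    n = len(y)
    g0, g = 0.0, [0.0] * len(coefs)
    for row, count in zip(X, y):
        resid = math.exp(linear_predictor(row, intercept, coefs)) - count
        g0 += resid / n
        for j, v in enumerate(row):
            g[j] += resid * v / n
    return g0, g


def fit_lasso_poisson(X, y, penalty, n_iter=500, tol=1e-8):
    """Proximal gradient descent with backtracking; the intercept is unpenalized."""
    mean_y = sum(y) / len(y)          # assumes at least one nonzero count
    intercept, coefs = math.log(mean_y), [0.0] * len(X[0])
    step = 1.0 / max(mean_y, 1.0)     # rough inverse-curvature starting step
    for _ in range(n_iter):
        loss = mean_nll(X, y, intercept, coefs)
        g0, g = gradient(X, y, intercept, coefs)
        while True:                   # halve the step until the quadratic bound holds
            new_b0 = intercept - step * g0
            new_b = [soft_threshold(c - step * gj, step * penalty)
                     for c, gj in zip(coefs, g)]
            d0 = new_b0 - intercept
            d = [nb - c for nb, c in zip(new_b, coefs)]
            bound = (loss + g0 * d0 + sum(gj * dj for gj, dj in zip(g, d))
                     + (d0 * d0 + sum(dj * dj for dj in d)) / (2.0 * step))
            if mean_nll(X, y, new_b0, new_b) <= bound + 1e-9 * (1.0 + abs(loss)):
                break
            step *= 0.5
        intercept, coefs = new_b0, new_b
        if max([abs(d0)] + [abs(dj) for dj in d]) < tol:
            break
    return intercept, coefs


def predict(X, intercept, coefs):
    """Predicted mean counts under the log link."""
    return [math.exp(linear_predictor(row, intercept, coefs)) for row in X]


def make_synthetic_sites(seed=0, n_sites=200):
    """Synthetic count sites (hypothetical data, not the study's)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_sites):
        row = [rng.gauss(0.0, 1.0) for _ in range(3)]
        mean = math.exp(3.0 + 0.6 * row[0] - 0.3 * row[1])  # row[2] has no effect
        # Knuth's Poisson sampler: multiply uniforms until the product drops below e^-mean
        limit, count, prod = math.exp(-mean), 0, rng.random()
        while prod > limit:
            count += 1
            prod *= rng.random()
        X.append(row)
        y.append(count)
    return X, y


X, y = make_synthetic_sites()
b0, b = fit_lasso_poisson(X, y, penalty=2.0)
print("intercept:", round(b0, 3), "coefficients:", [round(c, 3) for c in b])
```

In this sketch the penalty is fixed by hand; in practice it would normally be chosen by cross-validation on the training locations, and the fitted model would then be applied to held-out locations with `predict` to assess accuracy.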
