Authors: Ruowei Liu*, University of Georgia, Xiaobai Yao, University of Georgia
Topics: Cyberinfrastructure, Geographic Information Science and Systems, Spatial Analysis & Modeling
Keywords: Selection Bias, Location-based Social Media, Twitter, Machine Learning
Session Type: Virtual Paper
Start / End Time: 8:00 AM / 9:15 AM
Room: Virtual 49
Presentation File: No File Uploaded
Interest in location-based social media (LBSM) has grown in recent decades. Twitter is one of the most popular LBSM platforms and has been used in a wide range of cross-disciplinary research. Despite the success of such research, the selection bias of these data is still under-investigated, and a good understanding of the bias remains a knowledge gap. Selection bias is the bias introduced by selecting individuals, groups, or data for analysis in such a way that proper randomization is not achieved, so that the sample obtained is not representative of the population intended to be analyzed (Selection Bias, n.d.). In this case, selection bias refers to the structural or compositional difference between the Twitter population of a study area and the population the data are meant to represent. Past studies have shown that the Twitter population is a highly non-uniform sample of the local population along the axes of gender, race, and age (Mislove et al., 2011; Barberá & Rivero, 2015; Yuan et al., 2018; Jiang et al., 2018). The objectives of this research are 1) to evaluate the selection bias of LBSM data and 2) to explain that bias. The research proceeds through three phases, at the Twitter level, the user level, and the county level. First, at the Twitter level, a large volume of location-based big data, in this case geotagged Twitter data, is collected. Second, at the user level, active Twitter users are extracted from the dataset. Third, at the county level, the selection bias of LBSM data is evaluated and explained.
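The three phases can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say how active users are defined, how users are assigned to counties, or which bias measure is used. The sketch assumes a hypothetical activity threshold (`min_tweets`), assigns each user to the county where they tweeted most often, and measures bias as a representation ratio (a county's share of active users divided by its share of the reference population). The function names, county codes, and population figures are toy values for illustration only.

```python
from collections import Counter, defaultdict


def active_user_home_counties(tweets, min_tweets=3):
    """User-level phase: keep users with at least `min_tweets` geotagged
    tweets (a hypothetical threshold) and assign each one to the county
    where they tweeted most often (an assumed home-location rule).

    `tweets` is an iterable of (user_id, county_code) pairs.
    """
    per_user = defaultdict(Counter)
    for user_id, county in tweets:
        per_user[user_id][county] += 1

    homes = {}
    for user_id, counts in per_user.items():
        if sum(counts.values()) >= min_tweets:
            homes[user_id] = counts.most_common(1)[0][0]
    return homes


def county_representation(homes, population):
    """County-level phase: share of active users divided by share of the
    reference population. A ratio above 1 suggests over-representation,
    below 1 under-representation (an assumed bias measure).
    """
    users = Counter(homes.values())
    total_users = sum(users.values())
    total_pop = sum(population.values())
    if total_users == 0 or total_pop == 0:
        raise ValueError("need at least one active user and a non-zero population")
    return {
        county: (users.get(county, 0) / total_users) / (pop / total_pop)
        for county, pop in population.items()
        if pop > 0
    }


# Twitter-level phase stand-in: toy (user_id, county_code) records.
tweets = [
    ("u1", "13059"), ("u1", "13059"), ("u1", "13059"),
    ("u2", "13121"), ("u2", "13121"), ("u2", "13059"),
    ("u3", "13121"),  # only one tweet, so not counted as active
]
population = {"13059": 100, "13121": 300}  # toy reference populations

homes = active_user_home_counties(tweets)
ratios = county_representation(homes, population)
```

With these toy inputs, the first county holds half of the active users but a quarter of the reference population, so its ratio is above 1, while the second county's ratio falls below 1. Explaining such ratios, the second objective, would require relating them to county-level covariates, which this sketch does not attempt.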