Authors: Mohammad Eshghi*, University of Oregon, Amy Lobben, Department of Geography, University of Oregon
Topics: Geographic Information Science and Systems
Keywords: social media, Twitter, representativeness
Session Type: Poster
Start / End Time: 1:20 PM / 3:00 PM
Room: Napoleon Foyer/Common St. Corridor, Sheraton, 3rd Floor
Data representativeness, or the extent to which a sample population represents the general population, is a growing research topic within social media research. Twitter in particular is a popular social media platform, and tweets are a common source of data. Twitter representativeness can be broken down into two parts: how well Twitter users represent the overall population, and how well a collected data set represents the Twitter database. In this paper, we perform a Tweets Reliability Analysis focusing on the latter. Since the majority of researchers collect data for their event of interest using keywords, we investigate whether this method misses relevant tweets and thereby introduces systematic bias into data collection. Using combinations of keywords and a location constraint, we collect four data sets of English-language tweets at both the global and U.S. scales. As our case study, we chose U.S. presidential job approval. Each data set is collected using one of four methods: (1) keywords and a location constraint, (2) keywords only, (3) a location constraint only, and (4) neither keywords nor a location constraint. Using the number of tweets in each of the four data sets that relate to our topic of interest, whether explicitly or implicitly (i.e., with or without mentioning the keywords), our analysis investigates the usefulness of the data sets collected using methods 3 and 4, as well as the overlap among the four methods. Our results provide new insights regarding the data representativeness of the Twitter database.
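The comparison among the four collection methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword list, the five sample tweets, the `in_us` and `relevant` fields (standing in for a location constraint and a manual relevance label), and the helper names do not come from the study. The sketch also applies the four methods as filters over one shared sample, whereas data sets collected separately need not nest this neatly.

```python
# Hypothetical keywords; the abstract does not list the ones actually used.
KEYWORDS = ("president", "approval")

# Hypothetical hand-labelled sample. "relevant" marks tweets about the topic,
# whether or not they mention a keyword (explicit vs. implicit relevance).
TWEETS = [
    {"id": 1, "text": "The president is doing a great job", "in_us": True, "relevant": True},
    {"id": 2, "text": "He should never have signed that bill", "in_us": True, "relevant": True},
    {"id": 3, "text": "Approval ratings dropped again", "in_us": False, "relevant": True},
    {"id": 4, "text": "Lovely weather in Eugene today", "in_us": True, "relevant": False},
    {"id": 5, "text": "Not my leader, not my vote", "in_us": False, "relevant": True},
]


def has_keyword(tweet):
    """Return True if the tweet text contains any keyword (case-insensitive)."""
    text = tweet["text"].lower()
    return any(keyword in text for keyword in KEYWORDS)


def collect(tweets, use_keywords, use_location):
    """Return the ids of tweets that pass the enabled constraints."""
    return {
        t["id"]
        for t in tweets
        if (not use_keywords or has_keyword(t)) and (not use_location or t["in_us"])
    }


# The four methods, numbered as in the abstract.
METHODS = {
    1: collect(TWEETS, use_keywords=True, use_location=True),
    2: collect(TWEETS, use_keywords=True, use_location=False),
    3: collect(TWEETS, use_keywords=False, use_location=True),
    4: collect(TWEETS, use_keywords=False, use_location=False),
}

RELEVANT = {t["id"] for t in TWEETS if t["relevant"]}


def relevant_count(method):
    """Number of topic-related tweets captured by a method."""
    return len(METHODS[method] & RELEVANT)


def overlap(a, b):
    """Number of tweets captured by both method a and method b."""
    return len(METHODS[a] & METHODS[b])


def missed_by_keywords(with_keywords, without_keywords):
    """Relevant tweets found without keywords but absent from the keyword set."""
    return (METHODS[without_keywords] & RELEVANT) - METHODS[with_keywords]


if __name__ == "__main__":
    for m in sorted(METHODS):
        print(f"method {m}: {len(METHODS[m])} tweets, {relevant_count(m)} relevant")
    print("relevant U.S. tweets missed by keywords:", missed_by_keywords(1, 3))
```

In this toy sample, tweet 2 is about the topic but mentions no keyword, so it appears under method 3 and not under method 1. Implicitly related tweets of this kind are what a keyword-only collection would miss.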