Learning Digital Geographies through a Correlation-based Autoencoder

Authors: Pengyuan Liu*, University of Leicester, Stefano De Sabbata, University of Leicester
Topics: Social Geography, Quantitative Methods
Keywords: Autoencoder, LSTM, ResNet-18, Correlation Coefficient, CCA
Session Type: Paper
Day: 4/4/2019
Start / End Time: 1:10 PM / 2:50 PM
Room: Capitol Room, Omni, East

Twitter provides a rich repository of perspectives and sentiments from millions of users on a wide range of topics. Recent years have witnessed growing interest in the analysis of geo-located tweets within Digital Geography. However, because of the vast amount of data produced daily on social media, quantitative analysis and summarization are frequently a necessary step. Clustering and visualizing Twitter data helps to identify users' opinions and popular events, and can inform urban planning, in both the physical and online worlds. We introduce an unsupervised clustering approach that groups tweets with similar content, based on their texts and images, through a stacked multimodal autoencoder-based neural network. Inspired by the idea that the image and the text in each tweet are correlated, we stack an LSTM-based autoencoder and a ResNet-18-style CNN-based autoencoder to learn features from texts and images by reconstructing them, and add a further layer that computes the correlation coefficient between the two sets of features to obtain better joint representations of the original tweets. Furthermore, considering that the content of a tweet is correlated with the location where it is posted, we apply canonical correlation analysis (CCA) to analyse the correlations between the joint representations and the geo-coordinates. Finally, a bagged clustering method clusters the tweets based on the CCA results to obtain geographically distributed Twitter topics. The proposed method is advantageous for huge volumes of multimedia data, where qualitative analysis is unlikely to be feasible, and it will be tested using crowd-sourcing platforms.
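
The two correlation steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the correlation layer or the CCA step is implemented, so everything here is an assumption made for illustration: the function names `mean_feature_correlation` and `canonical_correlations` are hypothetical, the "embeddings" and "coordinates" are synthetic random data rather than autoencoder outputs or real tweet locations, and the correlation term is taken to be a per-dimension Pearson coefficient averaged over dimensions.

```python
import numpy as np


def mean_feature_correlation(text_emb, image_emb, eps=1e-8):
    """Pearson correlation per embedding dimension, averaged over dimensions.

    Assumption: both inputs have shape (n_tweets, n_features) with the same
    n_features. A training loop could maximise this value (or minimise its
    negative) alongside the reconstruction losses.
    """
    t = text_emb - text_emb.mean(axis=0)
    v = image_emb - image_emb.mean(axis=0)
    cov = (t * v).sum(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (v ** 2).sum(axis=0)) + eps
    return float((cov / denom).mean())


def canonical_correlations(x, y, reg=1e-6):
    """Canonical correlations between two views via whitening and SVD.

    x: (n, p) joint representations, y: (n, q) geo-coordinates.
    `reg` is a small ridge term (an assumption) that keeps the covariance
    matrices positive definite for the Cholesky factorisation.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    # With C = L L^T, the whitened cross-covariance is L_x^{-1} C_xy L_y^{-T};
    # its singular values are the canonical correlations.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy) @ np.linalg.inv(ly).T
    return np.clip(np.linalg.svd(m, compute_uv=False), 0.0, 1.0)


# Synthetic stand-ins for the learned features (not real tweet data).
rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 8))
text_emb = shared + 0.1 * rng.normal(size=(500, 8))
image_emb = shared + 0.1 * rng.normal(size=(500, 8))

corr = mean_feature_correlation(text_emb, image_emb)

# Simple averaged joint representation and fake two-dimensional coordinates
# that depend linearly on it, so CCA has structure to recover.
joint = (text_emb + image_emb) / 2.0
coords = joint[:, :2] + 0.1 * rng.normal(size=(500, 2))
rho = canonical_correlations(joint, coords)
```

In this sketch `corr` is high because both synthetic embeddings share a common signal, and `rho` holds one canonical correlation per coordinate dimension. Tweets projected onto the corresponding canonical directions could then be passed to a clustering routine, which is where the bagged clustering step mentioned in the abstract would come in.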
