Authors: Christoph Stich, University of Birmingham, Emmanouil Tranos*, University of Bristol
Topics: Economic Geography
Keywords: clusters, data science, text, web, archives
Session Type: Paper
Start / End Time: 9:35 AM / 10:50 AM
Room: Century, Sheraton, IM Pei Tower, Majestic Level
This paper proposes a new methodological approach to identifying economic clusters. Instead of relying on top-down hierarchical industrial classifications and business registration addresses, we propose a bottom-up approach that uses the wealth of textual data available on archived commercial websites. Our approach detects economic clusters along two dimensions: (i) the physical proximity of the locations of economic activities and (ii) the semantic similarity of those activities, measured from the textual content of the commercial websites. After calculating both distances, we perform unsupervised clustering, which reveals the colocation of similar types of economic activities. Our analysis focuses on London and is based on the JISC UK Web Domain Dataset. The approach moves beyond rigid top-down industrial classifications, which cannot capture the clustering of evolving economic activities.
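The two-dimensional idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the distance measures or the clustering algorithm, so the haversine distance, the bag-of-words cosine distance, the two thresholds (`max_km`, `max_semantic`) and the connected-components grouping are all assumptions chosen for brevity. The sample websites and their coordinates are hypothetical.

```python
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cosine_distance(text_a, text_b):
    """1 - cosine similarity of raw word counts (an assumed text measure)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return 1.0 - dot / norm if norm else 1.0


def cluster_sites(sites, max_km=1.0, max_semantic=0.6):
    """Link sites that are both physically near and semantically similar,
    then return one label per site (connected components via union-find).
    Both thresholds are illustrative assumptions."""
    parent = list(range(len(sites)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            near = haversine_km(sites[i]["coord"], sites[j]["coord"]) <= max_km
            similar = cosine_distance(sites[i]["text"], sites[j]["text"]) <= max_semantic
            if near and similar:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(sites))]


# Hypothetical websites: approximate London coordinates and made-up page text.
sites = [
    {"coord": (51.5260, -0.0870), "text": "web design digital agency software"},
    {"coord": (51.5255, -0.0880), "text": "software startup digital design studio"},
    {"coord": (51.5130, -0.0890), "text": "investment banking asset management"},
    {"coord": (51.5135, -0.0885), "text": "asset management investment fund"},
    {"coord": (51.5258, -0.0875), "text": "investment banking asset management"},
]
labels = cluster_sites(sites)
print(labels)
```

In this toy input the first two sites are grouped together, as are the third and fourth. The fifth site shares its text with the third but sits more than a kilometre away, and sits next to the first two without sharing their vocabulary, so it stays on its own: neither proximity nor similarity alone is enough to form a cluster.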