Authors: Grant McKenzie, Department of Geography, McGill University; Zheng Liu*, Department of Geographical Sciences, University of Maryland, College Park; Yingjie Hu, Department of Geography, University at Buffalo; Myeong Lee, College of Information Studies, University of Maryland, College Park
Topics: Spatial Analysis & Modeling
Keywords: neighborhood, neighborhood name, random forest, spatial statistics, housing, craigslist
Session Type: Paper
Start / End Time: 12:40 PM / 2:20 PM
Room: Harding, Marriott, Mezzanine Level
Neighborhoods are loosely defined as localized regions that share similar characteristics. They are most often defined, delineated, and named by the residents who inhabit them rather than by municipal governments or commercial agencies. The names of these neighborhoods play an important role as a basis for community and sociodemographic identity, geographic communication, and historical context. In this work, we take a data-driven approach to identifying neighborhood names based on the geospatial properties of user-contributed rental listings. By applying a random forest ensemble learning model to a set of spatial statistics computed for all n-grams in listing descriptions, we show that neighborhood names can be uniquely identified within urban settings. We train a model on data from Washington, DC, and test it on listings in Seattle, WA, and Montréal, QC. The results indicate that a model trained on housing data from one city can successfully identify neighborhood names in another. In addition, our approach identifies less common neighborhood names and suggests alternative or potentially new names in each city. These findings represent a first step in the process of urban neighborhood identification and delineation. In future work, we propose to incorporate multiple data sources and to tune the parameters of the spatial statistics used by the model more finely.
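As a rough illustration of the feature-extraction step described above, the following minimal sketch computes two simple spatial statistics for every n-gram in a handful of listings. It is not the authors' implementation: the abstract does not say which spatial statistics were used, so the listing count and the standard distance (root mean squared distance from the n-gram's centroid) are assumptions, as are the function names, the toy listings, and the projected x/y coordinates in kilometers. In a full pipeline, a table like this would presumably serve as the input features for a random forest classifier.

```python
import math
from collections import defaultdict


def ngrams(text, max_n=2):
    """Yield every word n-gram of length 1..max_n from a listing description."""
    tokens = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def spatial_stats(listings, max_n=2):
    """Return per-n-gram spatial statistics.

    `listings` is an iterable of (description, x, y) tuples, where x and y
    are assumed to be projected coordinates (hypothetical units: km).
    """
    points = defaultdict(list)
    for text, x, y in listings:
        # Count each n-gram at most once per listing.
        for gram in set(ngrams(text, max_n)):
            points[gram].append((x, y))

    stats = {}
    for gram, pts in points.items():
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        # Standard distance: RMS distance of the points from their centroid.
        sq_dist = sum((px - cx) ** 2 + (py - cy) ** 2 for px, py in pts)
        stats[gram] = {
            "count": len(pts),
            "centroid": (cx, cy),
            "std_distance": math.sqrt(sq_dist / len(pts)),
        }
    return stats


# Hypothetical toy data: "shaw" appears only in nearby listings,
# while "sunny" appears in listings far apart.
listings = [
    ("sunny studio in shaw", 1.0, 1.0),
    ("shaw rowhouse near metro", 1.2, 0.9),
    ("sunny condo near metro", 8.0, 7.5),
    ("large studio in petworth", 1.1, 9.0),
]

stats = spatial_stats(listings)
for gram in ("shaw", "sunny"):
    print(gram, stats[gram]["count"], round(stats[gram]["std_distance"], 3))
```

In this toy data, the n-gram "shaw" is spatially concentrated and "sunny" is dispersed, which is the kind of contrast a classifier could use to separate neighborhood names from generic listing vocabulary.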