Authors: Morteza Karimzadeh*, Ohio State University, Alan M. MacEachren, Pennsylvania State University, Scott Pezanowski, Pennsylvania State University
Topics: Geographic Information Science and Systems, Cyberinfrastructure
Keywords: geographic information retrieval, geoparsing, toponym resolution, place name disambiguation, named entity recognition
Session Type: Paper
Start / End Time: 1:20 PM / 3:00 PM
Room: Grand Ballroom A, Astor, 2nd Floor
Geoparsing is the process of automatically recognizing place names in unstructured text and resolving them to toponyms, each with a unique identifier and geographic coordinates. Performant geoparsing is key to making unstructured text a reliable data source in Geographic Information Systems. Extracted toponyms can also be used in Geographic Information Retrieval (GIR) and in exploratory as well as computational analysis of the location information embedded in text. Geoparsing is usually performed in two consecutive stages: (1) place name identification, also known as toponym recognition, in which Named Entity Recognition (NER) algorithms or models are used to identify place names in text; and (2) place name disambiguation, also known as toponym resolution (TR), in which each place name mentioned in the text is resolved (mapped) to one (or more) of several candidate toponyms. In this paper, we present GeoTxt, a scalable geoparsing system, and use a corpus of manually geo-annotated tweets to evaluate its toponym recognition. We compare the performance of five state-of-the-art rule-based and machine learning-based NER algorithms and models implemented in open-source packages. We also present a scalable toponym resolution method based on an inverted index, and discuss future research toward performant and scalable geoparsing systems.
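The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. This is not GeoTxt's implementation: the toy gazetteer, its identifiers and approximate population figures, the `Toponym` class, and all function names are hypothetical. Stage 1 uses a plain dictionary lookup as a stand-in for a real NER model, and stage 2 assumes a simple "most populous candidate wins" heuristic on top of an in-memory inverted index.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Toponym:
    gazetteer_id: int  # hypothetical unique identifier
    name: str
    lat: float
    lon: float
    population: int    # approximate, for illustration only


# Toy gazetteer; entries, IDs, and figures are illustrative assumptions.
GAZETTEER = [
    Toponym(1, "Paris", 48.8566, 2.3522, 2_100_000),    # Paris, France
    Toponym(2, "Paris", 33.6609, -95.5555, 25_000),     # Paris, Texas
    Toponym(3, "Columbus", 39.9612, -82.9988, 900_000), # Columbus, Ohio
]


def build_inverted_index(entries):
    """Map each lowercased place name to its list of candidate toponyms."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.name.lower()].append(entry)
    return index


def recognize(text, index):
    """Stage 1 stand-in: return tokens that match a gazetteer name.

    A real system would use an NER model here instead of a lookup.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    return [token for token in tokens if token.lower() in index]


def resolve(name, index):
    """Stage 2: choose the most populous candidate (assumed heuristic)."""
    candidates = index.get(name.lower(), [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.population)


def geoparse(text, entries=GAZETTEER):
    """Run recognition, then resolution, and return (name, toponym) pairs."""
    index = build_inverted_index(entries)
    return [(name, resolve(name, index)) for name in recognize(text, index)]
```

Calling `geoparse("Flying from Columbus to Paris tomorrow")` returns one pair per recognized name, with the ambiguous "Paris" mapped to the more populous of its two candidates. The inverted index keeps candidate lookup to a single dictionary access per name, which is the property that makes this kind of structure attractive for scaling to a full gazetteer.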