Comparing Geographic Categories in Different Languages Using Large Volunteered Geodata

Authors: Alexandre Sorokine*, Oak Ridge National Laboratory, Gautam Thakur, Oak Ridge National Laboratory
Topics: Geographic Information Science and Systems, Cyberinfrastructure, Cultural Geography
Keywords: geographic categories, volunteered geographic information, natural language processing
Session Type: Paper
Day: 4/4/2019
Start / End Time: 8:00 AM / 9:40 AM
Room: Capitol Room, Omni, East

Understanding the meaning of geographic categories in different languages and cultures is a hard problem. Solving it will not only increase our knowledge of how geographic categories are conceptualized in the human mind but will also help address the practical challenges of semantically integrating geographic data created in different countries, languages, communities, and cultures. Earlier studies of this problem were based on authoritative data sources (e.g., feature definitions in national geographic standards) and Wikipedia. Here we explore the potential for deducing the meaning of geographic categories from large volunteered geodatasets such as OpenStreetMap and Wikimapia. First, we analyze the category hierarchies adopted by these projects to understand how they are structured and which relations they represent. Then we compare how objects are labeled with these categories across countries, regions, and languages, and whether that labeling matches the geographic categorizations found in geodata standards and other studies. We employ natural language processing and machine learning techniques to analyze other contributed textual information contained in the datasets (object names and descriptions, addresses, category definitions) to assess the veracity of the object labeling and of the relations among the categories. The magnitude of these datasets (tens to hundreds of millions of entities), their long history of development and participation, and the large size of their communities of contributors and maintainers will help us achieve more reliable and quantifiable results than previous projects did.
