Learning from Past Classification Errors: Exploring Methods for Improving the Performance of a Deep Learning-based Building Extraction Model through Quantitative Analysis of Commission Errors for Optimal Sample Selection

Authors: Benjamin Swan*, Oak Ridge National Laboratory, Lexie Yang, Oak Ridge National Laboratory, Melanie Laverdiere, Oak Ridge National Laboratory
Topics: Remote Sensing
Keywords: Machine learning, deep learning, neural networks, remote sensing
Session Type: Paper
Day: 4/11/2018
Start / End Time: 10:00 AM / 11:40 AM
Room: Lafayette, Marriott, River Tower Elevators, 41st Floor


Deep Convolutional Neural Networks (CNNs) have been increasingly favored for computer vision applications due to their accuracy and ability to generalize well in complex problems; however, details of how they function, and in turn how they may be optimized, are still imperfectly understood. In particular, their complex and highly nonlinear network architecture, including many hidden layers and self-learned parameters, as well as their mathematical implications, raises questions about effective training sample selection. Without knowledge of the exact ways the model processes and transforms its inputs, intuition alone may fail as a guide to selecting informative samples. Working to improve a CNN-based building extraction model, we have developed a semi-supervised, highly scalable approach to select samples from a dataset of identified commission errors. Due to the large scope of this project, tens of thousands of potential samples could be derived from identified commission errors. To efficiently trim those samples down to a manageable and effective set, we statistically summarized the spectral characteristics of areas with high rates of commission errors and grouped them using affinity propagation. The resulting exemplars were then used to select sites for sample creation. The model will be incrementally re-trained with the new data to allow for an assessment of how the addition of different types of samples affects model performance, as measured by metrics such as precision and recall. By using quantitative analysis and data clustering to select highly relevant training samples, we hope to improve model performance in a manner that is resource-efficient in both model and sample creation.
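
The selection step described above (summarize spectral characteristics per area, cluster with affinity propagation, keep the exemplars) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `band_means` and `affinity_propagation`, the two-band feature vectors, and the damping and iteration settings are all assumptions made for illustration. The message-passing updates follow the standard affinity propagation formulation, with the preference set to the median similarity; in practice a library implementation such as scikit-learn's `AffinityPropagation` would likely be used instead.

```python
from statistics import mean, median


def band_means(pixels):
    """Per-band mean for one area, given pixels as equal-length tuples."""
    return [mean(band) for band in zip(*pixels)]


def affinity_propagation(points, damping=0.7, iterations=300):
    """Return (exemplar indices, exemplar index assigned to each point)."""
    n = len(points)
    # Similarity is the negative squared Euclidean distance.
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
         for p in points]
    # Shared preference: median off-diagonal similarity (a common default).
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support for k as an exemplar from the other points.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            support = sum(pos) - pos[k]  # positive responsibilities, j != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; adjust damping or iterations")
    # Each point joins its most similar exemplar; exemplars keep themselves.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Hypothetical per-area spectral summaries (two band means per error area).
features = [
    [0.80, 0.30], [0.90, 0.36], [0.72, 0.27],
    [0.20, 0.70], [0.28, 0.80], [0.13, 0.66],
]
exemplars, labels = affinity_propagation(features)
print("exemplar areas:", exemplars)
print("assignments:", labels)
```

Each exemplar index points at one error area that stands in for its whole cluster, so sample creation can target a handful of representative sites rather than every flagged area. Unlike k-means, affinity propagation does not require the number of clusters to be fixed in advance, which suits a setting where the variety of commission-error types is not known beforehand.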
