Article
Building Extraction from Very-High-Resolution Remote
Sensing Images Using Semi-Supervised Semantic
Edge Detection
Liegang Xia *, Xiongbo Zhang, Junxia Zhang, Haiping Yang and Tingting Chen
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China;
[email protected] (X.Z.); [email protected] (J.Z.); [email protected] (H.Y.);
[email protected] (T.C.)
* Correspondence: [email protected]; Tel.: +86-571-8529-0027
Abstract: The automated detection of buildings in remote sensing images enables understanding of the distribution of buildings, which is indispensable for many geographic and social applications, such as urban planning, change monitoring and population estimation. The performance of deep learning on images often depends on a large number of manually labeled samples, the production of which is time-consuming and expensive. This study therefore focuses on reducing the number of labeled samples needed and proposes a semi-supervised deep learning approach based on an edge detection network (SDLED), which is the first to introduce semi-supervised learning to an edge detection neural network for extracting building roof boundaries from high-resolution remote sensing images. The approach uses a small number of labeled samples and abundant unlabeled images for joint training. An expert-level semantic edge segmentation model is trained on the labeled samples and guides the unlabeled images to generate pseudo-labels automatically. These inaccurate label sets and the manually labeled samples are then used together to update the semantic edge model. In particular, we modified the semantic segmentation network D-LinkNet to obtain high-quality pseudo-labels: the main network architecture of D-LinkNet is retained, while multi-scale fusion is added in its second half to improve its performance on edge detection. The SDLED was tested on high-spatial-resolution remote sensing images taken from Google Earth. Results show that the SDLED performs better than the fully supervised method. Moreover, when the trained models were used to predict buildings in the neighboring counties, our approach was superior to the supervised one, with a line IoU improvement of at least 6.47% and an F1 score improvement of at least 7.49%.

Keywords: semi-supervised; semantic edge detection; building extraction; deep learning; very-high-resolution image

Citation: Xia, L.; Zhang, X.; Zhang, J.; Yang, H.; Chen, T. Building Extraction from Very-High-Resolution Remote Sensing Images Using Semi-Supervised Semantic Edge Detection. Remote Sens. 2021, 13, 2187. https://doi.org/10.3390/rs13112187

Academic Editors: Saeid Homayouni and Ying Zhang

Received: 25 April 2021; Accepted: 29 May 2021; Published: 3 June 2021
2. Related Work
2.1. Building Extraction from VHR Images
Various traditional algorithms and theories have been developed for building detection, but their extraction results suffer from many problems in practical applications [19]. The good
performance of CNNs in object classification, object extraction, semantic segmentation
and edge detection has attracted an increasing number of researchers to try building
extraction methods based on deep learning. Ji et al. [29] developed a scale robust CNN
structure to extract buildings from high-resolution aerial and satellite images. Two dilated
convolutions are used on the first two lowest-scale layers to enlarge the field of view
and integrating the semantic information of large buildings, and a multi-scale aggregation
strategy is applied to improve segmentation accuracy. Hu et al. [15] treated every building as an independent instance to be recognized using mask scoring R-CNN and added a mask
intersection-over-union (IoU) head to score the mask, achieving better building extraction
effects than mask R-CNN. Lu et al. [19] used RCF to obtain the edge strength map of a
building and combined the geomorphological concept to optimize the edge of the building.
Reda et al. [7] proposed the FER-CNN algorithm to improve the accuracy of building
detection and classification and used the Ramer-Douglas-Peucker (RDP) algorithm to
correct the shape of the detected buildings. The purpose of segmentation is thus not merely pixel-level labeling but object identification with accurate boundaries [14].
scale features, enriching the multi-scale representation of shallow network learning and
improving the detection effect of target edges at different scales [33]. However, traditional deep learning methods often rely on a large number of manually labeled samples to ensure good performance, and producing such samples is time-consuming and expensive. It is therefore important to reduce the dependence of networks on labeled samples.
3. Methodology
In this section, we describe the proposed approach for the semantic edge extraction of buildings based on semi-supervised learning. We assume that a limited labeled dataset L and a large-scale unlabeled high-resolution remote sensing dataset U are given. The
designed framework is shown in Figure 1, in which the edge extraction algorithm based
on deep learning is introduced to automatically mark unlabeled data, and the semantic
segmentation network D-LinkNet is modified to accomplish the task of edge extraction.
Now, we will introduce the two important components of this approach.
Figure 1. Edge-detection-based semi-supervised framework. A labeled dataset is used to train a pre-trained model, which
predicts an unlabeled dataset to get pseudo-labels. The labeled dataset and pseudo-labels are combined to form an extended
dataset, which has abundant labeled images. A fine-tuned model is then trained on the extended dataset, starting from the pre-trained model.
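To make this workflow concrete, the following is a minimal PyTorch-style sketch of the three-stage pipeline. The helper train_model, the batch size, the epoch count and the 0.5 binarization threshold are illustrative assumptions, not the authors' released code (see the GitHub link in the Data Availability Statement for the actual implementation).

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def sdled_pipeline(model, labeled_set, unlabeled_set, train_model, epochs=50):
    """Three-stage SDLED workflow: pre-train, pseudo-label, fine-tune."""
    # Stage 1: train the pre-trained edge model on the small labeled dataset.
    train_model(model, DataLoader(labeled_set, batch_size=4, shuffle=True), epochs)

    # Stage 2: predict edge strength maps for the unlabeled images and
    # binarize them into pseudo-labels (0.5 threshold is an assumed value).
    model.eval()
    pseudo_set = []
    with torch.no_grad():
        for image in unlabeled_set:
            edge_map = torch.sigmoid(model(image.unsqueeze(0))).squeeze(0)
            pseudo_set.append((image, (edge_map > 0.5).float()))

    # Stage 3: fine-tune on the extended dataset (manual + pseudo labels).
    extended = ConcatDataset([labeled_set, pseudo_set])
    train_model(model, DataLoader(extended, batch_size=4, shuffle=True), epochs)
    return model
```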
The encoder of D-LinkNet is ResNet34 [35], which uses residual blocks with shortcut connections to make the network easier to optimize, thereby alleviating the degradation problem of deep neural networks.
As shown in Figure 2, our network is divided into two parts, A and B. Part A keeps
the encoder part and center part of D-LinkNet. At each stage of part A, we use transposed
convolutional layers to perform upsampling, which restores the resolution of the feature
map to 448 × 448. Then, the network calculates the loss in each stage for more accurate
edge location information. The deep convolutional layer has richer semantic information,
while the shallow convolutional layer has richer edge details. To better realize semantic
edge detection, the network adopts multi-scale fusion to identify the target building and
locate the edge information more accurately.
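As an illustration of this design, the sketch below shows one plausible PyTorch layout of the multi-scale fusion head (part B): each encoder stage's features are reduced to a single-channel score map, upsampled to 448 × 448 by a transposed convolution so that each stage can be supervised by its own loss, and finally fused by a 1 × 1 convolution. The channel counts assume the ResNet34 encoder of D-LinkNet; the concrete layer configuration is our assumption, not the published code.

```python
import torch
import torch.nn as nn

class MultiScaleFusionHead(nn.Module):
    """Sketch of part B: per-stage upsampling to full resolution plus fusion."""
    def __init__(self, in_channels=(64, 128, 256, 512)):
        super().__init__()
        # 1x1 convs reduce each stage's features to a single edge channel.
        self.score = nn.ModuleList(nn.Conv2d(c, 1, kernel_size=1) for c in in_channels)
        # ResNet34 stages sit at strides 4, 8, 16 and 32; a transposed conv
        # with kernel_size == stride upsamples each exactly back to 448 x 448.
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(1, 1, kernel_size=2 ** (i + 2), stride=2 ** (i + 2))
            for i in range(len(in_channels))
        )
        self.fuse = nn.Conv2d(len(in_channels), 1, kernel_size=1)  # multi-scale fusion

    def forward(self, stage_features):
        """stage_features: encoder outputs at strides 4, 8, 16 and 32."""
        side_outputs = [up(score(f))
                        for f, score, up in zip(stage_features, self.score, self.up)]
        fused = self.fuse(torch.cat(side_outputs, dim=1))
        # Each side output and the fused map receive the loss in Equation (1).
        return side_outputs, fused
```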
Our loss function, based on [16], can be formulated as

L = −β ∑_{j∈Y+} log Pr(y_j = 1) − (1 − β) ∑_{j∈Y−} log Pr(y_j = 0), (1)

where β = |Y−|/|Y| and 1 − β = |Y+|/|Y|. Y denotes the edge map and |Y| its number of pixels; Y+ and Y− denote the positive and negative sample sets, respectively. Pr(y_j = 1) is obtained by applying the standard sigmoid function to the activation value at pixel j.
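A direct PyTorch transcription of Equation (1) might look as follows; the small epsilon guard is an implementation detail we add for numerical stability, and, per the per-stage supervision described above, the loss is applied to the output of every stage:

```python
import torch

def class_balanced_bce(logits, target):
    """Class-balanced cross-entropy of Equation (1), following HED [16].

    logits: raw side output of one stage, shape (N, 1, H, W).
    target: binary edge labels in {0, 1}, same shape.
    """
    num_pos = target.sum()
    beta = (target.numel() - num_pos) / target.numel()  # beta = |Y-| / |Y|
    prob = torch.sigmoid(logits)                        # Pr(y_j = 1)
    eps = 1e-7                                          # numerical-stability guard
    loss = -(beta * target * torch.log(prob + eps)
             + (1.0 - beta) * (1.0 - target) * torch.log(1.0 - prob + eps))
    return loss.mean()
```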
Figure 2. Our network for detecting the building edge, which is modified from D-LinkNet. D-LinkNet is composed of an
encoder, center part and decoder. Part A in our network retains the original encoder and center part from D-LinkNet for
effectively extracting the multi-scale features in the image. Part B uses multi-scale fusion to replace the decoder, has a wide
receptive field and keeps the ability of spatial detail representation. For each stage of the encoder, we use the transposed
convolution layer for upsampling and restoring the resolution of the feature map to 448 × 448. Then, the network calculates
the loss function of each stage to obtain more accurate edge location information.
Figure 3. The first and second parts of the sample are both taken from the yellow area. The first part
is the training and testing set. The second part is the unlabeled dataset. The third part is taken from
the blue area for the generalization ability test.
Figure 4. Example images from the building dataset taken in Beijing from Google Earth. (a) Original image for training and
testing, (b) edge labels of (a), (c) original image for generalization ability testing and (d) edge labels of (c).
4.3. Evaluation
We use the IoU and F1 score [36] to evaluate the performance of the proposed method
in this paper. The IoU measures the degree of overlap between the ground truth and the predicted results; the higher the IoU value, the stronger the correlation. It is calculated as
IoU = (A ∩ B) / (A ∪ B), (2)

where A represents the predicted area and B represents the real label area.
There are three different ways to calculate the IoU in this paper:
1. The whole IoU calculates the overlap between the predicted value and the real label over the whole image.
2. The single IoU is the average IoU over individual buildings and focuses on the number of buildings.
3. The line IoU describes the correlation between the real and predicted building boundaries: both boundaries are expanded with the same expansion kernel, and the IoU of the expanded boundaries is then calculated (a sketch of this computation is given below).
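The following sketch is one plausible realization of the line IoU, using OpenCV dilation as the expansion operation; the function name and binary input format are our assumptions:

```python
import cv2
import numpy as np

def line_iou(pred_edges, true_edges, kernel_size=3):
    """Line IoU between predicted and real building boundaries.

    pred_edges, true_edges: binary uint8 arrays (1 = boundary pixel).
    kernel_size: the shared expansion kernel (3 or 5 in the experiments).
    """
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    pred = cv2.dilate(pred_edges, kernel) > 0   # expand predicted boundary
    true = cv2.dilate(true_edges, kernel) > 0   # expand real boundary
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return intersection / union if union > 0 else 0.0
```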
The F1 score is an index used to measure the accuracy of the binary classification
model. To calculate the F1 score, it is necessary to calculate the precision and recall of
each prediction. In the following formula, TP (true positives) is the number of positive
pixels that are correctly recognized as positive pixels. TN (true negatives) is the number
of negative pixels that are correctly recognized as negative pixels. FP (false positives) is
the number of negative pixels that are wrongly recognized as positive pixels. FN (false
negatives) is the number of positive pixels that are incorrectly recognized as negative pixels.
The precision is the percentage of true positives out of the positive pixels identified. The
recall is the proportion of all positive pixels in the test set that are correctly identified as
positive pixels. The F1 score is the harmonic mean of the precision and recall, reflecting the idea that precision and recall are equally important.
Precision = TP / (TP + FP), (3)

Recall = TP / (TP + FN), (4)

F1 = 2 × Precision × Recall / (Precision + Recall). (5)
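Equations (3)–(5) translate directly into code; the following is a minimal NumPy sketch for binary masks:

```python
import numpy as np

def precision_recall_f1(pred, truth):
    """Pixel-wise precision, recall and F1 per Equations (3)-(5)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # true positives
    fp = np.logical_and(pred, ~truth).sum()   # false positives
    fn = np.logical_and(~pred, truth).sum()   # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```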
4.4. Results
4.4.1. Effect of the Sample Scale in the Supervised Method
To verify the influence of the sample scale on our proposed network, we trained the network with the fully supervised method on increasing proportions of the training images: 25%, 50%, 75% and 100%. The results are shown in Table 1 and Figure 5; as the number of high-quality training samples increases, the CNN learns more types of buildings and more comprehensive image features, and all the accuracy measures trend upward until they stabilize. The model trained with 100% of the
training samples shows competitive performance, improving the line IoU (kernel = 3) by 10.51% and the F1 score by 12.68% over the model trained with 25% of the samples. Under this premise, we explore the effect of increasing the
number of unlabeled samples in the next section.
Table 1. Precision evaluation results for the effect of the sample scale in the supervised method.
Figure 5. Precision evaluation results for the effect of the sample scale in the supervised method. This figure contains all the
results obtained by our proposed network with the fully supervised method.
Network   Metric                  Sample Scale
                                  100% (Fully)   50% (Semi)   75% (Semi)   100% (Semi)
BDCN      F1 score                0.5362         0.6267       0.6613       0.7014
          Single IoU              0.4043         0.4950       0.5254       0.5509
          Whole IoU               0.3941         0.4866       0.5281       0.5724
          Line IoU (kernel = 3)   0.3233         0.3337       0.3503       0.3676
          Line IoU (kernel = 5)   0.3849         0.4281       0.4487       0.4706
Ours      F1 score                0.8419         0.8490       0.8632       0.8650
          Single IoU              0.7065         0.7204       0.7346       0.7495
          Whole IoU               0.7111         0.7175       0.7391       0.7436
          Line IoU (kernel = 3)   0.5089         0.5030       0.5033       0.5037
          Line IoU (kernel = 5)   0.6152         0.6240       0.6243       0.6255
Figure 7. (a–d) are the results of fully supervised and semi-supervised building extraction in the training districts. There are two edge detection models, BDCN and our model. The semi-supervised method uses the 50% labeled dataset and the 100% unlabeled dataset, while the fully supervised method uses the 100% labeled dataset and the 0% unlabeled dataset. The image is the original image taken in Beijing from Google Earth. The label is the ground truth of the image. The prediction is the edge strength map output by our network. The fully supervised and semi-supervised results are the complete edges obtained after post-processing, which includes binarization, thinning and incomplete-boundary elimination.
Figure 9. The results of building extraction for the fully supervised and semi-supervised methods in the neighboring districts. The semi-supervised method uses the 50% labeled dataset and the 100% unlabeled dataset, while the fully supervised method uses the 100% labeled dataset and the 0% unlabeled dataset. The image is the original image taken in Beijing from Google Earth. The label is the ground truth of the image. The prediction is the edge strength map output by our network. The fully supervised and semi-supervised results are the complete edges obtained after post-processing, which includes binarization, thinning and incomplete-boundary elimination.
Table 4. Precision evaluation results for sample selection mechanism analysis in the semi-supervised
method. 50%, 75% and 100% are the scales of the manually labeled samples. The sample selection method
higher-quality pseudo-labels. Unselected means all the pseudo-labels participate in training without
selection. Threshold 0.7 means the method sets a threshold of 0.7 to remove incomplete samples. Top
80% means the top 80% of pseudo-labels with the highest ratio will be used for training.
Figure 10. Precision evaluation results for sample selection mechanism analysis in SDLED. 50%, 75% and 100% are the scales of the manually labeled samples. The sample selection method calculates the ratio of the complete boundary in prediction
results to the real boundary to select higher-quality pseudo-labels. Unselected means all the pseudo-labels participate in
training without selection. Threshold 0.7 means the method sets a threshold of 0.7 to remove incomplete samples. Top 80%
means the top 80% pseudo-labels with the highest ratio will be used for training.
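As a concrete reading of these two rules, the sketch below keeps either the pseudo-labels whose completeness ratio clears a threshold or the top fraction ranked by that ratio. How the ratio itself is computed (complete boundaries retained after post-processing divided by all predicted boundaries) follows our reading of the caption, not the released implementation.

```python
import numpy as np

def select_pseudo_labels(ratios, mode="threshold", threshold=0.7, top_fraction=0.8):
    """Return indices of pseudo-labels kept for training.

    ratios: one completeness ratio per pseudo-label (assumed precomputed).
    mode: "unselected", "threshold" (Table 4: 0.7) or "top" (Table 4: top 80%).
    """
    ratios = np.asarray(ratios)
    if mode == "threshold":
        # "Threshold 0.7": remove samples with too many incomplete boundaries.
        return np.flatnonzero(ratios >= threshold)
    if mode == "top":
        # "Top 80%": keep the pseudo-labels with the highest ratios.
        k = int(len(ratios) * top_fraction)
        return np.argsort(ratios)[::-1][:k]
    # "Unselected": all pseudo-labels participate in training.
    return np.arange(len(ratios))
```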
Figure 11. (a–d) are the results of building extraction for the fully supervised method with 50% of the samples in the training districts. The image is the original image taken in Beijing from Google Earth. The prediction is the edge strength map, which is the result of the fully supervised method. The thinned result is obtained from the prediction after binarization and thinning. The complete result is obtained from the thinned result after incomplete-boundary elimination.
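The post-processing chain named in these captions (binarization, thinning and incomplete-boundary elimination) can be sketched as follows; the 0.5 threshold, the minimum fragment length and the use of skeletonization for thinning are our assumptions:

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def postprocess_edge_map(edge_strength, threshold=0.5, min_length=20):
    """Binarize, thin, then drop short boundary fragments.

    Fragment-length filtering is a simple stand-in for the paper's
    incomplete-boundary elimination rule.
    """
    binary = edge_strength > threshold                   # binarization
    thin = skeletonize(binary)                           # 1-pixel-wide edges
    # Label connected boundary fragments (8-connectivity) and keep only
    # those at least min_length pixels long.
    labels, n = ndimage.label(thin, structure=np.ones((3, 3)))
    sizes = ndimage.sum(thin, labels, index=np.arange(1, n + 1))
    keep_ids = 1 + np.flatnonzero(sizes >= min_length)
    return np.isin(labels, keep_ids)
```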
5. Discussion
This study aimed to investigate the applicability of the semi-supervised method to edge detection networks by comparing the performance of the semi-supervised and fully
supervised methods on the Beijing dataset of building roofs. The results showed that
increasing the number of samples could improve the performance of the edge detection network, whether or not the labels are perfectly accurate. According to the results in Tables 2 and 3,
with the information from the unlabeled samples, our semi-supervised method needed
only half of the manually labeled samples to achieve better performance than the fully
supervised method. Deep learning methods based on neural networks depend heavily on the sample scale. Expanding the scale of the dataset enriches the feature
information and context information of buildings. As shown in Figure 7, more complete buildings are obtained by our method than by the fully supervised method, as more context
information is added. The quality of labeled samples will vary according to the level,
status and understanding of different experts, which will provide inaccurate information
to the neural network. However, the neural network can tolerate some error information
in the acceptable range and still show a good detection effect. The inaccurate pseudo-
labels generated by the semi-supervised method are also within the acceptable range of
the network. Because the expanded training set brings more abundant learning information, the ability of the model improves. From the perspective
of line IoU (kernel = 3), which represents the fitting degree between the predicted edge
and the real building boundary, with the addition of abundant pseudo-labels, the ratio of
false information to correct information increased greatly, which led to the predicted edge
deviating from the real boundary. Because the semi-supervised method, trained on the extended dataset, had more target information, more edges were found in the prediction results. After
the expansion of the 5-pixel range, the overlap with the real boundary increased, so the
value of the line IoU (kernel = 5) was improved. Evaluating a variety of indicators comprehensively, we believe that inaccurate pseudo-labels are effective in building edge detection and bring about a better detection effect. In addition, we compared our network with
another edge detection network, BDCN. Although the performance of BDCN on boundary
continuity is not satisfactory on our dataset, the semi-supervised method improved the line IoU by at least 1.04% and the F1 score by 9.05% compared with the fully supervised method. The semi-supervised method is thus an effective way for an edge detection model to improve accuracy when only a small number of manually labeled samples are accessible.
To be of value in geographic and social applications, it is also important that the
model performs well on a district that it has not been trained on. The results of the
model acting on the neighboring untrained districts show that the semi-supervised method
has stronger generalization ability than the fully supervised method. Different types of
buildings emerge in the neighboring area, wherein the context information contained is
not well grasped by the fully supervised method, so the performance of the prediction
is unsatisfactory. However, with the addition of pseudo-labels and the expansion of the
training set, the improvement in accuracy with the semi-supervised method is particularly
obvious. Although there is a great deal of completely wrong edge information in the
pseudo-labels that will affect the judgment of the model, the positive effect brought by the
supplementary type and feature information is dominant, as shown in Figure 9; the result
provides more accurate guidance for the identification of unknown targets.
In the study in Section 4.4.4, there was no significant improvement in the results by the
pseudo-label selection method. This method only considers the integrity of the boundary
and ignores the untrained building types, which is not comprehensive enough. Samples
with unknown building features are more likely to have incomplete edge extraction and
are easily excluded by the selection method. Moreover, removing the bad labels reduces the knowledge available to the model. Therefore, a method for optimizing the pseudo-labels should avoid discarding sample information as far as possible. In the next step,
we will consider using a postprocessing method to improve the quality of the prediction
results to achieve the optimization.
6. Conclusions
To reduce the impact of training dataset shortages on deep learning training, in this
paper, we propose a semantic edge detection approach for very-high-resolution remote sens-
ing images based on semi-supervised learning. This is the first time that semi-supervised
learning and edge detection neural networks have been combined for extracting building
roof boundaries from VHR remote sensing images. This method uses a small number of
labeled images to train the pre-model, which is used to guide the unlabeled datasets for
generating pseudo-labels. The training dataset is expanded by manual labels and pseudo-
labels. In particular, D-LinkNet was modified to improve the quality of the pseudo-labels.
Results show that, with the same number of labeled samples, our method achieves signifi-
cant improvements in the IoU and F1 score compared to the fully supervised method, and on some indicators its precision is close to or even better than that of the fully supervised method trained with more labeled samples. In addition, in tests on the surrounding areas, the
improvement achieved by our method is more obvious, which demonstrates its generalization ability and suggests a new direction for generalization learning. In future
studies, we will further explore the impact of semi-supervised learning on generalization
learning to improve the effect of labeled samples in the experimental area on the extraction
of building edges in wider regions.
Author Contributions: L.X. and X.Z. designed and conducted the experiments and wrote the first
draft. J.Z., H.Y. and T.C. guided this process and helped with the writing of the paper. All authors
have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB0505303, in part by the National Natural Science Foundation of China under Grant 41701472 and in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LQ19D010006.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The paper provides the database used in the current study at baiduyun
(Link: https://pan.baidu.com/s/1953urFhpLZAOEZbyniIAoQ, Extraction code: n7jf) until 2 June
2022 and the python code available online at GitHub (https://github.com/lovexixuegui/SDLED),
accessed on 2 June 2021.
Acknowledgments: We would like to acknowledge the Imagesky in Suzhou for supporting the
Google Earth data.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Harirchian, E.; Hosseini, S.E.A.; Jadhav, K.; Kumari, V.; Rasulzade, S.; Işık, E.; Wasif, M.; Lahmer, T. A Review on Application of
Soft Computing Techniques for the Rapid Visual Safety Evaluation and Damage Classification of Existing Buildings. J. Build. Eng.
2021, 102536. [CrossRef]
2. Valentijn, T.; Margutti, J.; van den Homberg, M.; Laaksonen, J. Multi-hazard and spatial transferability of a CNN for automated
building damage assessment. Remote Sens. 2020, 12, 2839. [CrossRef]
3. Bai, Y.; Hu, J.; Su, J.; Liu, X.; Liu, H.; He, X.; Meng, S.; Mas, E.; Koshimura, S. Pyramid Pooling Module-Based Semi-Siamese
Network: A Benchmark Model for Assessing Building Damage from xBD Satellite Imagery Datasets. Remote Sens. 2020, 12, 4055.
[CrossRef]
4. Xu, S.; Noh, H.Y. PhyMDAN: Physics-informed knowledge transfer between buildings for seismic damage diagnosis through
adversarial learning. Mech. Syst. Signal Process. 2021, 151, 107374. [CrossRef]
5. Cerovecki, A.; Gharahjeh, S.; Harirchian, E.; Ilin, D.; Okhotnikova, K.; Kersten, J. Evaluation of Change Detection Techniques
using Very High Resolution Optical Satellite Imagery. Preface 2015, 2, 20.
6. Schlosser, A.D.; Szabó, G.; Bertalan, L.; Varga, Z.; Enyedi, P.; Szabó, S. Building extraction using orthophotos and dense point
cloud derived from visual band aerial imagery based on machine learning and segmentation. Remote Sens. 2020, 12, 2397.
[CrossRef]
7. Reda, K.; Kedzierski, M. Detection, Classification and Boundary Regularization of Buildings in Satellite Imagery Using Faster
Edge Region Convolutional Neural Networks. Remote Sens. 2020, 12, 2240. [CrossRef]
8. Zhang, F.; Du, B.; Zhang, L. Saliency-guided unsupervised feature learning for scene classification. IEEE Trans. Geosci. Remote
Sens. 2014, 53, 2175–2184. [CrossRef]
9. Huang, X.; Cao, Y.; Li, J. An automatic change detection method for monitoring newly constructed building areas using time-series
multi-view high-resolution optical satellite images. Remote Sens. Environ. 2020, 244, 111802. [CrossRef]
10. Liu, C.; Huang, X.; Zhu, Z.; Chen, H.; Tang, X.; Gong, J. Automatic extraction of built-up area from ZY3 multi-view satellite
imagery: Analysis of 45 global cities. Remote Sens. Environ. 2019, 226, 51–73. [CrossRef]
11. Anniballe, R.; Noto, F.; Scalia, T.; Bignami, C.; Stramondo, S.; Chini, M.; Pierdicca, N. Earthquake damage mapping: An overall
assessment of ground surveys and VHR image change detection after L’Aquila 2009 earthquake. Remote Sens. Environ. 2018, 210,
166–178. [CrossRef]
12. Dong, Y.; Zhang, L.; Cui, X.; Ai, H.; Xu, B. Extraction of buildings from multiple-view aerial images using a feature-level-fusion
strategy. Remote Sens. 2018, 10, 1947. [CrossRef]
13. Zhang, X.; Cui, J.; Wang, W.; Lin, C. A study for texture feature extraction of high-resolution satellite images based on a direction
measure and gray level co-occurrence matrix fusion algorithm. Sensors 2017, 17, 1474. [CrossRef]
14. Hossain, M.D.; Chen, D. Segmentation for Object-Based Image Analysis (OBIA): A review of algorithms and challenges from
remote sensing perspective. ISPRS J. Photogramm. Remote Sens. 2019, 150, 115–134. [CrossRef]
15. Hu, Y.; Guo, F. Building Extraction Using Mask Scoring R-CNN Network. In Proceedings of the 3rd International Conference on
Computer Science and Application Engineering, Sanya, China, 22–24 October 2019; pp. 1–5.
16. Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision,
Santiago, Chile, 7–13 December 2015; pp. 1395–1403.
17. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision,
Venice, Italy, 22–29 October 2017; pp. 2961–2969.
18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
19. Lu, T.; Ming, D.; Lin, X.; Hong, Z.; Bai, X.; Fang, J. Detecting building edges from high spatial resolution remote sensing imagery
using richer convolution features network. Remote Sens. 2018, 10, 1496. [CrossRef]
20. Yuan, J. Learning building extraction in aerial scenes with convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40,
2793–2798. [CrossRef] [PubMed]
21. Fang, B.; Li, Y.; Zhang, H.; Chan, J.C.-W. Semi-supervised deep learning classification for hyperspectral image based on
dual-strategy sample selection. Remote Sens. 2018, 10, 574. [CrossRef]
22. Triguero, I.; García, S.; Herrera, F. Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study.
Knowl. Inf. Syst. 2015, 42, 245–284. [CrossRef]
23. Li, M.; Zhou, Z.-H. Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE
Trans. Syst. Man Cybern. Part A Syst. Hum. 2007, 37, 1088–1098. [CrossRef]
24. Wang, Y.; Xu, X.; Zhao, H.; Hua, Z. Semi-supervised learning based on nearest neighbor rule and cut edges. Knowl. Based Syst.
2010, 23, 547–554. [CrossRef]
25. Saigal, P.; Rastogi, R.; Chandra, S. Semi-supervised Weighted Ternary Decision Structure for Multi-category Classification. Neural
Process. Lett. 2020, 52, 1555–1582. [CrossRef]
26. Han, W.; Feng, R.; Wang, L.; Cheng, Y. A semi-supervised generative framework with deep learning features for high-resolution
remote sensing image scene classification. ISPRS J. Photogramm. Remote Sens. 2018, 145, 23–43. [CrossRef]
27. Wu, H.; Prasad, S. Semi-supervised deep learning using pseudo labels for hyperspectral image classification. IEEE Trans. Image
Process. 2017, 27, 1259–1270. [CrossRef] [PubMed]
28. Lee, D.-H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of
the Workshop on Challenges in Representation Learning ICML, Atlanta, GA, USA, 16–21 June 2013.
29. Ji, S.; Wei, S.; Lu, M. A scale robust convolutional neural network for automatic building extraction from aerial and satellite
imagery. Int. J. Remote Sens. 2019, 40, 3308–3322. [CrossRef]
30. Bischke, B.; Helber, P.; Folz, J.; Borth, D.; Dengel, A. Multi-task learning for segmentation of building footprints with deep neural
networks. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September
2019; pp. 1480–1484.
31. Liu, Y.; Cheng, M.-M.; Hu, X.; Wang, K.; Bai, X. Richer convolutional features for edge detection. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3000–3009.
32. Hu, Y.; Chen, Y.; Li, X.; Feng, J. Dynamic feature fusion for semantic edge detection. arXiv 2019, arXiv:1902.09104.
33. He, J.; Zhang, S.; Yang, M.; Shan, Y.; Huang, T. Bi-directional cascade network for perceptual edge detection. In Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3828–3837.
34. Li, Y.; Chen, J.; Xie, X.; Ma, K.; Zheng, Y. Self-Loop Uncertainty: A Novel Pseudo-Label for Semi-supervised Medical Image
Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention,
Lima, Peru, 4–8 October 2020; pp. 614–623.
35. Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite
imagery road extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt
Lake City, UT, USA, 18–22 June 2018; pp. 182–186.
36. Xia, L.; Zhang, X.; Zhang, J.; Wu, W.; Gao, X. Refined extraction of buildings with the semantic edge-assisted approach from very
high-resolution remotely sensed imagery. Int. J. Remote Sens. 2020, 41, 8352–8365. [CrossRef]