Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images

Huo, Lina and Hou, Jiayue and Feng, Jie and Wang, Wei and Liu, Jinsheng (2024) Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images. Remote Sensing, 16 (4). p. 624. ISSN 2072-4292


Abstract

Salient Object Detection (SOD) has been widely studied on natural scene images. However, due to the pronounced differences between optical remote sensing images and natural scene images, directly applying SOD methods designed for natural scene images to optical remote sensing images yields limited performance, particularly in capturing global context information. Salient object detection in optical remote sensing images (ORSI-SOD) is therefore challenging. Optical remote sensing images usually exhibit large scale variations, yet the vast majority of existing networks are built on Convolutional Neural Network (CNN) backbones such as VGG and ResNet, which can only extract local features. To address this problem, we designed a new model that employs a transformer-based backbone network capable of extracting global information and long-range dependencies. To this end, a new framework is proposed, named the Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images (GMANet). In this framework, the Pyramid Vision Transformer (PVT) serves as the encoder to capture long-range dependencies. A Multiscale Attention Module (MAM) is introduced to extract multiscale information. Meanwhile, a Global Guided Branch (GGB) is used to learn global context information and obtain a complete structure; four MAMs are densely connected to this GGB. An Aggregate Refinement Module (ARM) is used to enrich edge details and low-level features. The ARM fuses global context information with multilevel encoder features to complement the details while keeping the structure complete. Extensive experiments on two public datasets show that our proposed GMANet framework outperforms 28 state-of-the-art methods on six evaluation metrics, especially E-measure and F-measure. This is because we apply a coarse-to-fine strategy to merge global context information with multiscale information.
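The coarse-to-fine fusion strategy named in the abstract can be sketched in simplified NumPy form as below. This is an illustrative assumption only: the function names, the nearest-neighbor upsampling, and the additive merging are not taken from the paper, which does not detail its fusion operators in the abstract.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling (illustrative; the paper's actual
    # interpolation method is not specified in the abstract).
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def coarse_to_fine_fuse(features):
    """Merge multiscale features from coarsest to finest.

    `features` is a list of (C, H, W) arrays ordered fine -> coarse,
    each level half the spatial size of the previous one, mimicking
    the multilevel outputs of a pyramid encoder such as PVT.
    """
    out = features[-1]                  # start from the coarsest (global) level
    for feat in reversed(features[:-1]):
        out = upsample2x(out) + feat    # refine with the next finer level
    return out

# Toy 4-level pyramid: channels C=8, spatial sizes 32, 16, 8, 4.
pyramid = [np.ones((8, 32 // 2**i, 32 // 2**i)) for i in range(4)]
fused = coarse_to_fine_fuse(pyramid)
print(fused.shape)  # (8, 32, 32)
```

Starting from the coarsest (most global) level and progressively refining with finer features mirrors the idea of letting global context guide the recovery of local detail, rather than fusing all scales at once.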

Item Type: Article
Subjects: European Repository > Multidisciplinary
Depositing User: Managing Editor
Date Deposited: 08 Feb 2024 09:25
Last Modified: 08 Feb 2024 09:25
URI: http://go7publish.com/id/eprint/4128
