Skip to main content

Crop insect pest detection based on dilated multi-scale attention U-Net



Crop pests seriously affect the yield and quality of crops. Accurately and rapidly detecting and segmenting insect pests in crop leaves is a premise for effectively controlling insect pests.


Aiming at the detection problem of irregular multi-scale insect pests in the field, a dilated multi-scale attention U-Net (DMSAU-Net) model is constructed for crop insect pest detection. In its encoder, dilated Inception is designed to replace the convolution layer in U-Net to extract the multi-scale features of insect pest images. An attention module is added to its decoder to focus on the edge of the insect pest image.


The experiments on the crop insect pest image IP102 dataset are implemented, and achieved the detection accuracy of 92.16% and IoU of 91.2%, which is 3.3% and 1.5% higher than that of MSR-RCNN, respectively.


The results indicate that the proposed method is effective as a new insect pest detection method. The dilated Inception can improve the accuracy of the model, and the attention module can reduce the noise generated by upsampling and accelerate model convergence. It can be concluded that the proposed method can be applied to practical crop insect pest monitoring system.


Crop pests are a major agricultural problem in the world, which seriously affect the yield and quality of crops. Crop insect pest detection is the premise and foundation of crop insect pest identification and control [1, 2]. There are many crop insect pest detection methods [3, 4]. They are broadly divided into two categories: traditional machine learning (ML) [5] and deep learning (DL) [6,7,8]. YOLO, U-Net and their variants have been widely applied to crop insect pest detection task, and achieved remarkable results [9,10,11]. Galphat et al. [12] comprehensively reviewed and analyzed the algorithms and technologies of insect pest detection in the agricultural field. Domingues et al. [13] presented a literature review on ML and DL used in the agricultural sector, focusing on the tasks of classification, detection, and prediction of diseases and pests, with an emphasis on tomato crops. Liu et al. [14] summarized the research on plant pest detection based on DL in recent years from three aspects: classification network, detection network and segmentation network, introduced the advantages and disadvantages of each method, and discussed the challenges that DL-based crop pest detection may face in practical applications. One of the most essential and beneficial properties of DL is its ability to generate features autonomously [15].

As shown in Fig. 1, the insect pest images taken in the field are diverse and irregular with different scales, shapes, poses, positions, illumination and complex backgrounds. Therefore, the existing crop insect pest detection methods are faced with some challenge, such as complex environment, detection of tiny size pest of multiple classes of pests, and the traditional ML algorithm is difficult to extract the invariant detection features, while the significant results of DL method rely on a large number of images and powerful computing power.

Fig. 1
figure 1

Insect pests in the field with various shapes, poses, sizes, colors, illumination and background

As for the various insect pests with deferent sizes, multi-branch, multi-channel and multi-scale DL models have been presented [16]. But these models have large trainable parameters, costing large training-time and computing ability. Dilated convolution can enlarge the receptive filed and improve the object detection ability of the network [17]. Several modified U-Nets did not consider the influence of U-Net combined with dilated convolution module on the result of feature extraction. Dilated ResNet can improve fine detection results [18]. Inception module can capture multi-scale context features by using multiple convolutional kernels of different sizes. It can not only capture long time–frequency context information of features, but also exploit information from multiple layers of CNN. Attention mechanism can help the network model locate the focus area, extract more useful features, and achieve high precision fusion. Attention based CNN has higher classification accuracy and significantly reduces the number of particles misclassified, which reflects the focusing effect of attention mechanism [19, 20].

As shown in Fig. 1, crop pest detection has some challenging issues that affect the accuracy of insect pest detection methods, such as multi-class, multi-scale, tiny size of pest objects, unbalanced data for multi-class, and sparse pest distribution. To improve the detection accuracy of field insect pests, an improved U-Net, namely the expanded multi-scale Attention U-Net (DMSAUN-Net), is constructed by using the advantages of ResNet, dilated convolution and Inception module. The main contributions of this paper are summarized as follows:

  1. (1)

    Dilated Inception module with various dilation ratios is introduced to extract the multi-scale contextual features.

  2. (2)

    Spatial attention mechanism is added to the skip connection layers of U-Net, which can focus the attention on the edge of the insect pest and reduce the noise and computational cost.

  3. (3)

    The computational cost is further reduced by introducing ResNet into the skip connection layer of U-Net.

The rest of this paper is organized as follows. The related works are summarized in "Related works". The proposed DMSAUN-Net based insect pest detection is illustrated in "Dilated Multi-scale attention U-Net (DMSAU-Net)". The detail experimental analysis and comparison is provided in "Experiments and analysis", and "Conclusions" summarizes this paper and points out the future work.

Related works

In Section, U-Net, Inception module and Dilated convolution are briefly described. Inception module and Dilated convolution are widely introduced into DL model to improve its multi-scale objection detection performance.


U-Net consists of encoding part, decoding part and skip connection without fully-connected layers. Its architecture is shown in Fig. 2. Encoding part is to extract high-resolution and contextual features with downsampling and one activation unit (ReLU) at each layer. Decoding part is to increase the resolution of the output through upsampling at each layer. Skip connection is used to fuse the features from the encoding part with the corresponding feature map of the encoding part, ensuring localization of the extracted contextual features. In the down-sampling process, the number of feature channels is doubled while is shrunk after deconvolution operation in up-sampling.

Fig. 2
figure 2

U-Net architecture

Inception module

Inception is a multi-branch multi-scale convolution module [21]. It can extract the multi-scale features from the input image by different-scale kernels. Its structure is shown in Fig. 3, including 1 × 1, 3 × 3 and 5 × 5 convolutional kernels, where 1 × 1 convolution operation is used to reduce the amount of calculation. To make the feature map have the same size, each branch adopts the same padding mode, and the stride is 1.

Fig. 3
figure 3

The structure of inception

Dilated convolution

Dilated convolution can increase the receptive field without increasing the model parameters, which can reduce the computation amount and retain the nodal information [17, 18]. Its structure is shown in Fig. 4 with 4 dilated rates. It is seen from Fig. 4, the size of receptive field increases with the dilated rate, but the network parameters do not increase, that is 9 parameters.

Fig. 4
figure 4

Dilated convolution with 4 dilation rates. A rate = 1. B Rate = 2. C Rate = 3. D Rate = 4

Suppose an image G of size m × n and a convolutional kernel W of size k × k. The classical convolution between G and W is calculated by:

$$(G*W)(p) = \sum\limits_{i} {G[p + i] \cdot W[i]} .$$

Given a dilation rate r, dilated convolution (r) is defined as:

$$(G*_{r} W)(p) = \sum\limits_{i} {G[p + ri] \cdot W[i]} .$$

From Eqs. (1) and (2), it is evident that dilated convolution is simple convolution when r = 1. For r > 1, − 1 zeroes are inserted between each kernel element, creating a \(k_{s} \times k_{s}\) scaled and sparse filter, where \(k_{s}\) is \(k_{s} = k + (k - 1)(r - 1)\). The dilated rate r increases the receptive field of kernel by a factor \((k_{s} /k)^{2}\).

Dilated multi-scale attention U-Net (DMSAU-Net)

Due to the small dataset and the easy influence of complex background such as illumination and clutter, as shown in Fig. 1, the detection accuracy of crop insect pest is low, which is over-detection or under- detection. In this Section, an improved U-Net model namely dilated multi-scale attention U-Net (DMSAU-Net) is constructed for insect pest image detection. Its overall structure is shown in Fig. 5. The numbers shown below each dilated Inception module indicate the total number of kernels used, height, width and depth of the output feature maps.

Fig. 5
figure 5

The structure of DMSAU-Net

Detail of DMSAU-Net

Similar to U-Net, DMSAU-Net mainly consists of the encoding part, decoding part and skip connection with attention mechanism. Encoding part is a multi-scale convolutional network, including dilated Inception module (convolution kernel of 3 × 3 kernel), the pooling layer (2 × 2 maximum pooling), and the activation function (ReLU). In encoding part, the multi-scale and multi-level features are extracted from images through dilated Inception module, and then the extracted features are downsampled to realize the correlation of multiple channels and the full decoupling of image feature space. In decoding part, the extracted features are restored by up-sampling (deconvolution of 2 × 2), skip connection with attention, dilated Inception module (convolution kernel of 3 × 3) and activation function (ReLU) classification. Finally, the binary insect pest image of the insect pest and the background is obtained by 1 × 1 convolution layer and Sigmoid activation function. Considering the possible mesh effect caused by deconvolution, the bilinear interpolation method in upsampling is used to restore the image, during which 1 × 1 convolution is used to restore the number of channels. During the up-sampling process, the feature maps corresponding to the same resolution of down-sampling are concatenated. After each concatenation, the feature maps are further refined through dilated multi-scale module, and the upsampling is performed successively until the features extracted from the encoder are restored to the size of the input maps. Skip connection with attention is to concatenate the convolutional features and the deconvolution features.

Dilated inception module

Inspired by Inception module, multi-scale concatenation module and dilated concatenation module, a dilated Inception module is constructed as shown in Fig. 6, consisting of 3 1 × 1 convolution kernels, 3 dilated convolutional kernels, a concatenation, and residual connection. It is a modified Inception module, which aims to extract multi-scale features, from low-level structural features to high-level semantic features, by increasing receptive field without increasing the training parameters.

Fig. 6
figure 6

The structure of dilated inception module

Given an input Gor, from Fig. 6, the dilated Inception module can generate multi-scale features denoted as Gi (1 ≤ i ≤ 3) by employing three convolutions to collect contextual data at various scales. The convolution layer with kernel size of 1 × 1 is utilized to reduce the calculation cost. Since the multi-scale features of the 3 channels Gi are independent of each other, the transmission of global contextual features is limited. To address this problem, the global average pooling layer is used to provide richer contextual features Ggap:

$$G_{{{\text{gap}}}} = Up({\text{conv }}(gap(F))),$$

where Up is a bilinear interpolation operation to up-sample contextual features to the same size as Gi, gap is the global average pooling, and conv is a 1 × 1 convolutional operation.

The multi-scale features Gi, Ggap and Gor are combined to obtain the output feature G:

$$G = {\text{ conv }}\left( {G_{{1}} + G_{{2}} + G_{{3}} + G_{{{\text{gap}}}} + G_{or} } \right) ,$$

where the convolution process 1 × 1 is calculated by conv and the concatenation by ‘+’.

Attention module

Dilated Inception module obtains multi-scale features by encoding part and decoding part, and the extracted multi-scale features are concatenated by skip connection to achieve more accurate details and location information of insect pest. However, the max-pooling and upsampling operation in the encoding and decoding parts will lose part of the location space and other information, resulting in inaccurate segmentation of insect pest in the field. To overcome this problem, the spatial attention mechanism module is added to Skip connection. By directly cascading the features of the encoding layer and the corresponding deconvolution layer, the attention module fuses their complementary features, suppressed the noise generated by upsampling, and enhanced the robustness of the model [17]. The structure of attention module is shown in Fig. 7.

Fig. 7
figure 7

The structure of attention module

In attention module in Fig. 7, the input feature map GRH×W×(C+2), is the output of dilated Inception module in Fig. 6, is fed into the global maximum pooling (GMP) layer and global average pooling (GAP) layer, respectively, obtain GgmpR1×1×(C+2) and GgapR1×1×(C+2), are fed into the two fully connection layers Fc1 and Fc2 to get the output Gmax and Gavg in each branch. The number of parameters is decreased since the parameters of Fc1 and Fc2 are shared by two channels. The process is a detailed calculation:

$$G_{\max } = {\text{ReLU}}\left( {Fc_{{1}} \left( {G_{{{\text{gmp}}}} } \right)} \right),\; G_{avg} = {\text{ReLU}}\left( {Fc_{{1}} \left( {G_{{{\text{gap}}}} } \right)} \right).$$

Then the attention weight Sa by sigmoid activation function is generated as follows:

$$S_{a} = {\text{ Sigmoid}}(G_{\max } + G_{avg} ).$$

To obtain the attention map Gatt, the multiplication operation of the features G and the attention weight Sa is required. The attention map Gatt is calculated as follow:

$$G_{{a{\text{tt}}}} = G \times S_{a} .$$

Loss function

The weighted IoU loss function LIoU and the weighted binary cross entropy loss function Lbce are used to construct the loss function Loss in DMSAU-Net, calculated as follows:

$$Loss = L_{IoU} + L_{bce} .$$

To achieve effectively crop insect pest detection, the two loss functions are used to represent the global and local supervision losses, respectively.

Model training and evaluation

The input insect pest images and their corresponding labeled images are used to train DMSAU-Net. Five-fold cross-validation (fivefold CV) scheme and stochastic gradient descent (SGD) with an adaptive moment estimator (Adam) are often used to train all models [22].

The purpose of crop insect pest detection is to determine the category of each pixel in the image, so as to clarify the scope of insect pests. Accuracy and Intersection over Union (IoU) are selected as indexes to evaluate the segmentation performance of the proposed algorithm, calculated as follows:

$$ACC = \frac{TP}{{TP + FN}},\;\;\;IoU = \frac{TP}{{TP + FP + FN}},$$

where TP, FP and FN are the numbers of true positives, false positives, and false negatives of the class, respectively.

Experiments and analysis

To validate the proposed DMSAU-Net based insect pest detection method, the insect pest detection experiments are conducted on the crop common insect pest image dataset, compared with two insect detection approaches: multi-scale super-resolution feature enhancement module (MSR-RCNN) [6], multi-projection pest detection model (MDM) [7], and compared with U-Net [23] and its two improved models: U-Net with dilated convolution (DCU-Net) [24], ResNet with U-Net (ResU-Net) [25]. Batch size = 32 for rice data subset of IP102 to reduce computation time. Number of iterations = 3000, global learning rate = 0.001, gradient decay factor = 0.9, squared gradient decay factor = 0.999, loss function = cross entropy. All models are tested on Keras and trained on an Intel Xeon E5-2643v3 @3.40 GHz CPU, GTX2080Ti 11 GB GPU, 64 GB RAM, Windows 7 64bit, CUDA Toolkit10.0, CUDNN V7.6.5, Python 3.7 and Tensorflow-GPU 1.8.0.

insect pest image dataset

IP102 ( is a public insect pest image dataset, containing more than 75,000 images belonging to 102 insect pest categories that exhibit a natural long-tailed distribution [26]. 19,000 of these images have be professionally annotated. There are 8415 insect pest images in the dataset belonging to 14 rice insect pest categories, as shown in Fig. 8 and Table 1. From Table 1, it is seen that the classes of rice insect pests are highly unbalanced, ranging in sample size between 173 and 1115. In the experiments on this data subset, five-fold-cross validation scheme is adopted to perform experiments. That is, the dataset is randomly split into 5 mutually exclusive subsets of equal or near equal size. The model is performed 5 times subsequently, where each time using 4 of the 5 splits as the training set to train the model, and the 1 of the 5 splits as the test set to evaluate the performance of the model. To verify the robustness of the proposed method, images under different conditions such as strong illumination and complex background are selected, and the same insect pest in the dataset contained different insect states. To improve the recognition accuracy, Photoshop is used to uniformly adjust the image to 128 × 128 pixels.

Fig. 8
figure 8

Fourteen original insect pest images, one per species

Table 1 Rice insect pest image dataset

Experimental results

To test the effectiveness of spatial attention module, Fig. 9 show the convolutional feature maps of the first dilated multi-scale module and the corresponding feature maps after spatial attention module. From Fig. 9, it is obvious that the convolutional feature maps after spatial attention module are more significant than that of the first dilated multi-scale module.

Fig. 9
figure 9

Convolutional feature maps

DMSAU-Net is a modified U-Net, is similar to DCU-Net and ResU-Net in structure. So, we compare the performances of U-Net, DMSAU-Net, DCU-Net and ResU-Net. Figure 10A shows the loss values versus the number of iterations of U-Net and DMSAU-Net on the training set. From Fig. 10A, it is found that DMSAU-Net converges much better than U-Net.

Fig. 10
figure 10

Loss versus iterations

To further analyze the training performance, the same experimental parameters are used to train the different networks to ensure the reliability of the comparison results. The detection performances of DMSAU-Net, DCU-Net and ResU-Net are shown in Fig. 10B. From Fig. 10B, it can be seen that as the number of iterations increases, the detection loss of the three models decreases, and the detection performance of DMSAU-Net outperformances other models. Before 1000 iterations, their losses decrease quickly, go down very slowly and then level off. From Fig. 10, it can be seen that their training processes are relatively stable after 2000 iterations, DCU-Net and DMSAU-Net are better than ResU-Net, and DMSAU-Net is better than DCU-Net. This owes to the dilated multi-scale convolution and spatial attention mechanism.

To test the detection performance of DMSAU-Net, we randomly select 4 rice insect pest images from the IP102 dataset and visualize the detection results of DMSAU-Net and 4 comparative approaches, as shown in Fig. 11. To reflect the advantages of deep learning methods, we compare DMSAU-Net with a traditional insect pest detection algorithm Fuzzy C-means (FCM). It is one of the most widely used methods of insect pest detection [27]. The detection results are also in Fig. 11. As can be seen from Fig. 11, 5 improved U-Net models are much better than FCM. They can effectively detect rice insect pests under complex background, and the position and shape of insect pests are good, among which, MSR-RCNN is better than the other comparative models, DCU-Net is better than MDM and ResU-Net. Overall, DMSAUN-Net has the best detection effect with more accurate insect pest shape and edge.

Fig. 11
figure 11

Rice insect pest detection of IP102

To quantitatively estimate the detection performance of DMSAU-Net, fivefold cross validation experiments are conducted on the rice insect pest image subset of IP102. As can be seen from Fig. 10, all models basically converge at the 3000th iteration. For fair, the trained models are selected at the 3000th iteration. The detection results of five comparative methods and DMSAU-Net are listed in Table 2.

Table 2 The detection results of rice insect pests by 4 insect pest detection algorithms

DMSAU-Net is constructed by making use of the advantage of U-Net, dilated convolution, multi-scale convolution and spatial attention mechanism. To further verify the superiority of DMSAU-Net, some ablation experiments are carried out under similar conditions. The different experimental set and rice insect pest results are shown in Table 3, where attention is added to skip connection of U-Net.

Table 3 The different experimental set and results

Table 3 indicates that spatial attention, channel-spatial attention, Inception, dilated multi-scale can contribute to the results to some extent.


From Figs.10 and 11 and Tables 2 and 3, it can be seen that DMSAU-Net has best detection performance, the highest detection rate and the least training and detecting time, due to dilated convolution, multi-scale convolution and spatial attention mechanism. With the aid of dilated multi-scale, DMSAU-Net can extract the multi-scale classification features. With the aid of skip connection combined with spatial attention mechanism, DMSAU-Net can enhance the constraint on the feature maps, focus more attention to the insect pest image region, and speed up the training. That is to reduce the learning of non-important areas and enhance the learning of insect pest areas, so as to improve the detection ability of insect pest characteristics and improve the detection accuracy rate. The detection rate of U-Net is the poorest, because it is difficult to extract the robust classification features from the various insect pest images with very complex background.


In modern agricultural field, insect pest detection plays an important role in timely and accurate diagnosis of crop insect pest. But it is difficult to detect crop insect pest in the field due to the various-shape-size insect pests with complex background. To solve such problem, a dilated multi-scale attention U-Net (DMSAU-Net) model is constructed for crop insect pest detection by making use of the advantage of multi-scale convolution and attention mechanism. In dilated inception module, multi-scale convolution kernels without increasing training parameters are used to extract the distributed characteristics of insect pests at different scales and to perform cascade fusion. The experiments are carried out on the rice insect pest image subset of IP102 dataset. The detection results show that DMSAU-Net is effective and feasible for crop insect pest detection in the field. This research can be used to realize the automation degree of insect pest management in agricultural field. Future work is to optimize the model to organically integrate it into an effective insect pest detection system.

DMSAU-Net is complex in structure and has many data processing processes, and is difficult to detect the occlusion insect pest in field, which is a common phenomenon. In the future, we will prune and optimize the model, continue to study how to improve the accuracy of insect pest detection and the generalization ability of the model, and construct a multi-scale feature fusion DMSAU-Net to deal with the occlusion problem.

Availability of data and materials

The data that support the findings of this study are openly available in


  1. Dattatraya VS, Sudhir KS, Ghanshyam C, et al. Low cost sensor based embedded system for plant protection and pest control. Int Conf Soft Comput Tech Implement. 2015.

    Article  Google Scholar 

  2. Bhujel A, Mahonar S, Choubey M, et al. Pest and diseases management in Darjeeling Tea. Soc Sci Electron Publ. 2016;6(3):469–72.

    Google Scholar 

  3. Li W, Zheng T, Yang Z, et al. Classification and detection of insects from field images using deep learning for smart pest management: a systematic review. Eco Inform. 2021;66: 101460.

    Article  Google Scholar 

  4. Ye J, Yu Z, Wang Y, et al. PlantBiCNet: a new paradigm in plant science with bi-directional cascade neural network for detection and counting. Eng Appl Artif Intell. 2024;130: 107704.

    Article  Google Scholar 

  5. Kasinathan T, Singaraju D, Srinivasulu R. Insect classification and detection in field crops using modern machine learning techniques. Informat Process Agric. 2021;8(3):446–57.

    Article  Google Scholar 

  6. Teng Y, Zhang J, Dong S, et al. MSR-RCNN: a multi-class crop pest detection network based on a multi-scale super-resolution feature enhancement module. Frontiers. 2022;13: 810546.

    Article  Google Scholar 

  7. Wang F, Wang R, Xie C, et al. Fusing multi-scale context-aware feature representation for automatic in-field pest detection and recognition. Comput Electron Agric. 2020;169: 105222.

    Article  Google Scholar 

  8. Yu Z, Ye J, Li C, et al. TasselLFANet: a novel lightweight multi-branch feature aggregation neural network for high-throughput image-based maize tassels detection and counting. Front Plant Sci. 2023;14:1158940.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Xiang S, Wang S, Xu M, et al. YOLO POD: a fast and accurate multi-task model for dense Soybean Pod counting. Plant Methods. 2023;19:8.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Wen C, Chen H, Ma Z, et al. Pest-YOLO: a model for large-scale multi-class dense and tiny pest detection and counting. Front Plant Sci. 2022;13: 973985.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Agarwal M, Gupta SK, Biswas KK. Plant leaf disease segmentation using compressed UNet architecture. Cham: Springer; 2021. p. 9–14.

    Book  Google Scholar 

  12. Galphat Y, Patange VR, Talreja P, et al. Survey and analysis of pest detection in agricultural field. In: International Conference on Computer Networks, Big Data and IOT. Cham: Springer; 2018. p. 976–83.

    Chapter  Google Scholar 

  13. Domingues T, Brandão T, Ferreira J. Machine learning for detection and prediction of crop diseases and pests: a comprehensive survey. Agriculture. 2022;12(9):1350.

    Article  Google Scholar 

  14. Liu J, Wang X. Plant diseases and pests detection based on deep learning: a review. Plant Methods. 2021;17:22.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Rikiya Y, Mizuho N, Gian D, et al. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018;9:611.

    Article  Google Scholar 

  16. Jia Z, Lin Y, Wang J, et al. MMCNN: a multi-branch multi-scale convolutional neural network for motor imagery classification. Cham: Springer; 2021. p. 736–51.

    Book  Google Scholar 

  17. Shi B, Su Y, Lian C, et al. Obstacle type recognition in visual images via dilated convolutional neural network for unmanned surface vehicles. J Navig. 2022;75(2):437–54.

    Article  Google Scholar 

  18. Wu Y, Guo C, Gao H, et al. Dilated residual networks with multi-level attention for speaker verification. Neurocomputing. 2020;412(4):177–86.

    Article  Google Scholar 

  19. Wang H, Yang J, Wang R, et al. Remaining useful life prediction of bearings based on convolution attention mechanism and temporal convolution network. IEEE Access. 2023;11:24407–19.

    Article  Google Scholar 

  20. Wei H, Zhang Q, Gu Y. Remaining useful life prediction of bearings based on self-attention mechanism, multi-scale dilated causal convolution, and temporal convolution network. Measure Sci Technol. 2023;34(4): 045107.

    Article  ADS  Google Scholar 

  21. Gao W, Yu L, Tan Y, et al. MSIMCNN: multi-scale inception module convolutional neural network for multi-focus image fusion. Appl Intell. 2022;52:14085–100.

    Article  Google Scholar 

  22. Liu M, Yao D, Liu Z, et al. An improved Adam optimization algorithm combining adaptive coefficients and composite gradients based on randomized block coordinate descent. Comput Intell Neurosci. 2023;10:4765891.

    Article  Google Scholar 

  23. Sivagami S, Chitra P, Kailash G, et al. U-Net architecture based dental panoramic image segmentation. In: International Conference on Wireless Communications Signal Processing and Networking, 2020; pp. 187–191.

  24. Yang T, Zhou Y, Li L, et al. DCU-Net: multi-scale U-Net for brain tumor segmentation. J Xray Sci Technol. 2020;28(4):709–26.

    PubMed  Google Scholar 

  25. Najeeb RS, Dahl IO. Brain tumor segmentation utilizing generative adversarial, resnet and U-net deep learning. In: 8th International Conference on Contemporary Information Technology and Mathematics, Mosul, Iraq, 2022; pp. 85–89.

  26. Wu X, Zhan C, Lai Y K, et al. IP102: a large-scale benchmark dataset for insect pest recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019; pp. 8779–8788.

  27. Pratheba R, Sivasangari A, Saraswady D. Performance analysis of pest detection for agricultural field using clustering techniques. In: International Conference on Circuits, Power and Computing Technologies. 2014; 14968106.

Download references


This study was supported in part by grants from the National Natural Science Foundation of China (Nos. 62172338) and 2022 General Special Research Plan Project of Shaanxi Provincial Department of Education (No.: 22JK0596).

Author information

Authors and Affiliations



Xuqi Wang and Shanwen Zhang wrote the main manuscript text,Wang Xuqi conducted experimental design and data analysis for the paper,all authors revisewed the manuscipt.

Corresponding author

Correspondence to Shanwen Zhang.

Ethics declarations

Ethics approval and consent to participate

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, X., Zhang, S. & Zhang, T. Crop insect pest detection based on dilated multi-scale attention U-Net. Plant Methods 20, 34 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: