Pyramid-YOLOv8: a detection algorithm for precise detection of rice leaf blast

Abstract

Rice blast is the primary disease affecting rice yield and quality, and its effective detection is essential to ensure rice yield and promote sustainable agricultural production. To address the time-consuming and inefficient nature of traditional disease detection methods, we propose a method called Pyramid-YOLOv8 for rapid and accurate detection of rice leaf blast. The algorithm is built on the YOLOv8x network framework and features a multi-attention feature fusion network structure, which enhances the original feature pyramid structure and works with an additional detection head for improved performance. Additionally, this study designs a lightweight C2F-Pyramid module to enhance the model’s computational efficiency. In the comparison experiments, Pyramid-YOLOv8 shows excellent performance with a mean Average Precision (mAP) of 84.3%, improvements of 9.9%, 4.3%, 7.4%, 6.1%, 1.5%, 3.7%, and 8.2% over Faster-RCNN, RT-DETR, YOLOv3-SPP, YOLOv5x, YOLOv7x, YOLOv9e, and YOLOv10x, respectively. It also reaches a detection speed of 62.5 FPS with only 42.0 M parameters, while the model size and Floating Point Operations (FLOPs) are reduced by 41.7% and 23.8%, respectively. These results demonstrate the high efficiency of Pyramid-YOLOv8 in detecting rice leaf blast. In summary, the Pyramid-YOLOv8 algorithm developed in this study offers a robust theoretical foundation for rice disease detection and introduces a new perspective on disease management and prevention strategies in agricultural production.

Introduction

Rice, one of the three major global food crops, is the primary food source for nearly half of the world’s population [1]. In China, rice is a primary food crop, accounting for nearly half of the total grain production [2]. However, rice is vulnerable to various diseases during its growth, which can significantly threaten grain yield. Among these, rice blast is one of the most critical diseases affecting rice growth. According to statistics, from 2010 to 2020 the average area affected by rice blast in China’s five major rice-producing regions exceeded 3 million hectares annually, resulting in the loss of hundreds of thousands of tons of paddy rice each year [3]. Generally, rice blast leads to a 10–20% reduction in yield; in severe cases, it can decrease the yield by 40–50% or even result in total crop failure. Rice blast can be categorized into seedling blast, leaf blast, and neck blast according to where the disease manifests, and among these, leaf blast is responsible for the most significant losses [4, 5]. The disease occurs during the rice’s vegetative growth period and causes characteristic fusiform lesions on the leaves and necrotic lesions at the leaf collar [6]. If left unchecked, it not only severely impacts the yield and quality of rice but may also threaten food and economic security in China and potentially the world.

Traditional rice disease detection relies on field surveys conducted by plant protection experts [7], which are both costly and subjective. While Polymerase Chain Reaction (PCR)-based detection of leaf blast offers high accuracy, it requires specialized equipment and trained personnel, which limits its practical applicability [8]. These limitations render the methods above unsuitable for large-scale application, leading growers to resort to the extensive use of fungicides during the early stages of rice growth for disease control. However, using ecologically harmful fungicides can result in pesticide residues, leading to environmental issues. Consequently, there is an urgent need for an efficient and rapid detection method to facilitate timely diagnosis and precise control of leaf blast, ensuring a sustainable agricultural production process.

In recent years, machine learning techniques have made remarkable developments in rice disease detection. Prajapati et al. [9] extracted color, shape, and texture features from rice disease images using center-of-mass-based K-means clustering and identified three types of rice diseases (bacterial blight, brown spot, and leaf smut) using a Support Vector Machine (SVM), ultimately achieving accuracies of 83.80% and 88.57% under 5-fold and 10-fold cross-validation, respectively. Chung et al. [10] utilized digital image processing combined with machine learning for the non-destructive detection of rice bakanae disease. By selecting the essential features and optimal model parameters of the SVM classifier with a genetic algorithm, their method distinguished healthy from infected seedlings with 87.9% accuracy. Subsequently, Ghyar and Birajdar [11] extracted 21-dimensional feature vectors from rice leaf disease images using gray level co-occurrence matrix (GLCM) and color moments analysis techniques. They reduced redundant features to a 14-dimensional vector using a genetic algorithm and utilized an SVM classifier, achieving 92.5% accuracy in rice disease classification. Although these methods have made some progress in rice disease detection research, manual feature extraction is both time-consuming and prone to overlooking important features, leading to a decrease in detection accuracy. In addition, there are still limitations when dealing with large-scale data.

With the rapid development of deep learning, many advanced detection models have been widely applied in plant disease detection [12, 13, 14]. Lu et al. [15] developed a system based on convolutional neural networks to recognize ten common rice diseases, achieving 95.48% accuracy under a 10-fold cross-validation strategy. Zhou et al. [16] proposed a rice disease detection method combining FCM-KM and Faster R-CNN, achieving detection accuracies of 96.71% for rice blast, 97.53% for bacterial blight, and 98.26% for sheath blight on a dataset of 3010 images. Despite these methods’ high accuracies, the slow detection speed of two-stage algorithms makes real-time detection challenging in large-scale growing areas [17, 18]. In contrast, single-stage detectors like the YOLO series significantly improve detection speed while maintaining a certain level of detection accuracy, which makes them more suitable for large-scale applications. Kiratiratanapruk et al. [19] used 6330 images of rice diseases taken under natural conditions and found YOLOv3 to be the most effective detector, achieving a mAP of 79.19%. Deep learning-based models are also applicable to diagnosing diseases in other crops. Li et al. [20] developed a model named DAC-YOLOv4 for diagnosing strawberry powdery mildew, incorporating depthwise separable convolution into the backbone and embedding the CBAM module in the neck; it achieved a mAP of 72.7% and speeds of 43 FPS on a Jetson Xavier NX and 20 FPS on a Jetson Nano. Khan et al. [21] designed a two-stage apple disease recognition system: the first stage classifies apples as diseased or healthy, and the second detects specific diseases. Although Faster R-CNN achieved the highest detection accuracy with a mAP of 42.01%, YOLOv4 showed better overall performance with a mAP of 41.1% and a top speed of 47 FPS. However, YOLOv4 was less effective in detecting small-target diseases such as apple scab, for which even Faster R-CNN achieved a mAP of only 25.9%. To improve the detection accuracy of small targets, Li et al. [22] modified the YOLOv5 model by integrating a transformer encoder, resulting in a 9.8% increase in accuracy on a self-constructed cucumber disease dataset; however, the lack of diversity in their dataset could affect the model’s detection accuracy in practical applications. Zhang et al. [23] proposed an improved Yolov5-ECA-ASFF target detection algorithm to address the small target size and low localization accuracy in images of wheat scab fungus spores. This algorithm incorporates the ECA attention mechanism and the Adaptive Spatial Feature Fusion (ASFF) mechanism into the feature pyramid structure of YOLOv5, achieving a 6.8% improvement in mAP over the original YOLOv5s model. The above analysis indicates that deep learning-based detection models have made significant progress in crop disease detection. For small-target disease detection, deeply mining disease features with more complex network structures can improve detection accuracy to a certain extent, but it also sacrifices detection speed and increases the model’s parameters and computation. Meanwhile, detection techniques still face many challenges for crop diseases with unique physiological characteristics, such as rice leaf blast, whose spots are small and have unclear boundaries.

In response to the above challenges, this study proposes an improved Pyramid-YOLOv8 algorithm. The algorithm improves the feature pyramid structure and designs a multi-attention feature fusion structure based on an additional detection head, which can effectively extract the feature information of small targets. At the same time, we designed a lightweight C2F-Pyramid module for feature extraction, enabling rapid leaf blast detection while maintaining accuracy.

The main contributions of this study are as follows:

  1. (1)

    An additional detection head for small targets in the improved Pyramid-YOLOv8 algorithm plays a crucial role. This extra detection head utilizes higher-resolution feature maps to predict small targets accurately, effectively addressing the problem of information loss about small targets during the feature transfer process.

  2. (2)

    The convolutional block attention mechanism (CBAM) was employed to facilitate multi-attention feature fusion. In the improved Pyramid-YOLOv8 algorithm, the CBAM module is reasonably positioned at the junction between the model’s backbone and neck. This placement allows for the extraction of spatial and channel information from features at various scales, thereby significantly improving the efficiency of feature fusion.

  3. (3)

    The lightweight C2F-Pyramid module is vital in the feature extraction within the Pyramid-YOLOv8 algorithm. It ensures efficient information flow between channels, and its unique structure eliminates unnecessary and redundant computations, which can efficiently extract features and significantly reduce the model’s parameters and computational demands.

The rest of this paper is organized as follows: Sect. 2 introduces the image pre-processing methods, including data collection, augmentation, and partitioning, and the improved Pyramid-YOLOv8 algorithm, which incorporates a lightweight C2F-Pyramid module combined with the CBAM attention mechanism and builds a specialized detection head for small-scale object detection. Section 3 presents the related experiments, discusses the experimental results, and proposes directions for future work. Section 4 summarizes the research findings of this study.

Materials and methods

Image datasets

Overview of experimental design and conditions in rice fields

The experimental area is located in Haicheng City, Liaoning Province (122°43′33″E, 40°58′44″N), in the central part of the Liaodong Peninsula. The site lies in the temperate semi-humid continental monsoon climate zone, with an average annual precipitation of 710.2 mm, abundant rainfall, and four distinct seasons. From June to August, the average precipitation at the site is 57.4 mm, the average temperature is 25 °C, and rainfall occurs on 11 days on average. The experimental field spanned an area of 0.39 hectares. The local cultivar “Yanfeng 47” was selected for the experiment, with Mongolian rice planted as the disease-inducing variety. Planting occurred on May 25, 2022, with spacings of 30 cm between rows and 17 cm between plants, respectively. An overview of the experimental area is depicted in Fig. 1.

Fig. 1 Overview of the experimental area

Artificial inoculation was conducted on July 13 and July 15. To enhance the infection rate in rice, the spraying was carried out at 5:00 p.m., when a spore suspension with a concentration of 9 mg/mL was uniformly sprayed onto the leaves and the lower part of the rice canopy using spray cans. Apart from the disease control measures, field management was otherwise normal. Collection of experimental data commenced under the guidance of plant protection experts once visible spots appeared on the rice.

Image acquisition

Under the guidance of plant protection experts, a HUAWEI HONOR 9X cell phone was used to capture images of the disease under different weather conditions (sunny and cloudy) and at various growth stages (jointing, booting, and heading). A total of 2011 images were collected, each with a resolution of 4000 × 3000 pixels, and stored in JPEG format. Of these, 621 images were taken during the jointing stage, and the remaining 1390 images were captured during the booting and heading stages. As detailed in Table 1, the dataset was divided into training, validation, and test sets in an 8:1:1 ratio to facilitate the model’s training and validation. Furthermore, the data were analyzed based on MS COCO’s criteria for defining targets of different scales [24]. The dataset contains 16,383 targets, with small and medium-scale targets numbering 10,351, representing 63.2% of the total targets, as illustrated in Table 2.

Table 1 Data distribution
Table 2 Distribution of targets at different scales in the dataset

Data augmentation

In computer vision, the quality of a dataset is critical to a model’s performance and generalization ability [25], yet manually acquired data cannot encompass all scenarios in natural environments. Therefore, we processed the training set by adjusting the contrast, brightness, and saturation and by flipping or scaling the images. Data augmentation was performed using the OpenCV library: brightness and contrast were adjusted by changing the image’s pixel values; the image was then converted from the BGR color space to the HSV color space, the saturation channel was adjusted, and the image was converted back to BGR. The image was then flipped vertically or horizontally and scaled with a scaling factor ranging from 0.5 to 1.5 (a minimal code sketch of this pipeline is given after Fig. 3). Examples of the augmented data are shown in Fig. 2. After data augmentation, the number of images in the training set increased to 8048, containing 9684 small, 34,457 medium, and 20,623 large targets. The small and medium-scale targets account for 68.2% of all targets in the training set, as shown in Fig. 3.

Fig. 2 Sample images after data augmentation

Fig. 3 Distribution of targets at different scales in the training set before and after data augmentation
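For concreteness, the sketch below reproduces the augmentation pipeline described above using OpenCV. It is a minimal illustration: the exact brightness, contrast, and saturation ranges are assumed values (the study does not report them), and in practice the bounding-box annotations must be transformed consistently with the flip and scaling steps.

```python
import random

import cv2
import numpy as np


def augment(img):
    """Minimal sketch of the augmentation pipeline: brightness/contrast,
    HSV saturation, random flip, and random scaling in [0.5, 1.5]."""
    # Brightness and contrast: scale and shift the pixel values directly
    alpha = random.uniform(0.8, 1.2)   # contrast factor (assumed range)
    beta = random.uniform(-30, 30)     # brightness offset (assumed range)
    img = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

    # Saturation: adjust the S channel in HSV space, then convert back to BGR
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * random.uniform(0.7, 1.3), 0, 255)
    img = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

    # Random vertical (0) or horizontal (1) flip
    img = cv2.flip(img, random.choice([0, 1]))

    # Random scaling with a factor in [0.5, 1.5]
    s = random.uniform(0.5, 1.5)
    return cv2.resize(img, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
```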

The proposed Pyramid-YOLOv8 rice leaf blast detection model

As shown in Fig. 4, to achieve accurate and rapid identification and localization of leaf blast, we proposed the Pyramid-YOLOv8 detection algorithm. The algorithm primarily improves the neck and head of the model and introduces a lightweight C2F-Pyramid module to boost the efficiency of feature extraction. Specifically, the feature maps generated by the backbone feature extraction layer at different scales will first pass through the CBAM module to extract spatial and channel information, thereby enhancing the feature representation. The neck introduces higher-resolution feature maps for multi-scale feature fusion, thus compensating for the information loss during feature transmission. At the same time, an additional detection head is added to the Pyramid-YOLOv8 head, utilizing high-resolution feature maps for predictions, effectively enhancing the accuracy of small target detection. Finally, to further improve the detection efficiency of the model, we have introduced a lightweight C2F-Pyramid module, which maintains optimal performance while minimizing the model’s redundant computations.

Fig. 4 Pyramid-YOLOv8 model

Multi-attention feature fusion structure based on an additional detection head

The feature pyramid structure in target detection networks represents an efficient feature fusion strategy [26]. It enhances the detection of targets at different scales by fusing features with different semantics and resolutions. Deep features generally carry more semantic information, whereas shallow features have higher resolution and more spatial location information. Therefore, fusing deep and shallow features is beneficial for enhanced target identification and localization.

To further enhance the detection accuracy of small targets, we introduce a high-resolution feature map into the feature fusion structure and construct an additional detection head. The improved feature fusion structure is illustrated in Fig. 5. This new feature map, measuring \(160 \times 160\), is generated by the backbone feature extraction layer alongside three other feature maps of sizes \(80 \times 80\), \(40 \times 40\), and \(20 \times 20\). Deep, high-semantic information is first transmitted via upsampling and concatenation across features at varying scales; a bottom-up structure then supplements the spatial location information, thus efficiently utilizing the feature information from different layers to improve detection accuracy.
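As a minimal illustration of this top-down step, the snippet below upsamples a deeper feature map and concatenates it with the new \(160 \times 160\) map before further processing; the channel counts are placeholders rather than the exact widths of the YOLOv8x backbone.

```python
import torch
import torch.nn.functional as F

# Placeholder shapes: a shallow high-resolution map and a deeper,
# higher-semantic map from the backbone.
p2 = torch.randn(1, 160, 160, 160)  # (batch, channels, 160, 160)
p3 = torch.randn(1, 320, 80, 80)    # (batch, channels, 80, 80)

# Top-down fusion: upsample the deep map to 160x160 and concatenate;
# the fused map then feeds the C2F blocks and the extra detection head.
p3_up = F.interpolate(p3, scale_factor=2, mode="nearest")
fused = torch.cat([p2, p3_up], dim=1)  # shape: (1, 480, 160, 160)
```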

Fig. 5 The feature fusion structure of Pyramid-YOLOv8

Fig. 6 The CBAM structure

Next, we incorporate the CBAM to suppress background noise in the shallow features; its structure is depicted in Fig. 6. The CBAM, an attention mechanism, consists of a channel attention module (CAM) and a spatial attention module (SAM), significantly enhancing feature representation by selectively weighting channel and spatial information. As illustrated in Fig. 6 (a), for a given input feature map \(F \in \mathbb{R}^{h \times w \times c}\), the CAM first performs average pooling and maximum pooling, producing two \(1 \times 1 \times c\) feature maps, \(F_{Avg}^{1 \times 1 \times c}\) and \(F_{Max}^{1 \times 1 \times c}\). These maps are fed into a shared multilayer perceptron (MLP) with one hidden layer. After each channel feature is refined, two feature vectors of dimension \(1 \times 1 \times c\) are obtained. They are then fused via element-wise summation, yielding the final channel attention map \(M_c \in \mathbb{R}^{1 \times 1 \times c}\):

$$M_c\left( F \right) = \gamma \left( MLP\left( AvgPool\left( F \right) \right) + MLP\left( MaxPool\left( F \right) \right) \right) = \gamma \left( w_1\left( w_0\left( F_{Avg}^c \right) \right) + w_1\left( w_0\left( F_{Max}^c \right) \right) \right)$$
(1)

where \(\gamma\) represents the sigmoid function, and \(w_0\) and \(w_1\) are the weights of the shared MLP. The channel attention map \(M_c\) generated by the CAM contains weight coefficients for the different channels and emphasizes their relative importance by weighting the initial feature map.

The SAM focuses on capturing spatial relationships within feature maps to generate spatial attention maps \(M_s\left(F\right) \in \mathbb{R}^{h \times w}\). Unlike channel attention, the SAM concentrates on the positional information of features, as structured in Fig. 6 (b). Initially, average pooling and maximum pooling yield two two-dimensional maps: \(F_{Avg}^{s} \in \mathbb{R}^{h \times w}\) and \(F_{Max}^{s} \in \mathbb{R}^{h \times w}\). To fuse the spatial information, these two 2D maps are then concatenated and processed with a convolution kernel of size \(7 \times 7\). The spatial attention is computed as:

$$M_s\left( F \right) = \sigma \left( f^{7 \times 7}\left( \left[ AvgPool\left( F \right); MaxPool\left( F \right) \right] \right) \right) = \sigma \left( f^{7 \times 7}\left( \left[ F_{Avg}^s; F_{Max}^s \right] \right) \right)$$
(2)

where \(f^{7 \times 7}\) represents a convolution operation with a kernel size of \(7 \times 7\) and \(\sigma\) denotes the sigmoid function. Similar to \(M_c\), \(M_s\) emphasizes the importance of salient regions within the feature map.

As depicted in Fig. 4, the CBAM module is positioned before the feature fusion process. After these processed feature maps of varying scales are fused, the neck generates feature maps of sizes \(160 \times 160\), \(80 \times 80\), \(40 \times 40\), and \(20 \times 20\) for detecting targets across different scales.
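For reference, the following is a compact PyTorch sketch of CBAM following Eqs. (1) and (2). The channel-reduction ratio of the shared MLP is an assumed value (the text does not specify it), and the \(7 \times 7\) spatial convolution follows the description above.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Eq. (1): shared MLP over average- and max-pooled channel descriptors."""
    def __init__(self, c, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(c, c // reduction, 1, bias=False),  # w0
            nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, 1, bias=False),  # w1
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)  # M_c(F), shape (b, c, 1, 1)


class SpatialAttention(nn.Module):
    """Eq. (2): 7x7 convolution over channel-pooled maps."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s(F)


class CBAM(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.ca, self.sa = ChannelAttention(c), SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)     # channel weighting
        return x * self.sa(x)  # spatial weighting
```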

C2F-Pyramid module

In the Pyramid-YOLOv8 network structure, we replace the C2F (Cross Stage Partial Network Bottleneck with 2 Convolutions) module of YOLOv8 [27] with C2F-Pyramid; its placement within the network is illustrated in Fig. 4. Within C2F-Pyramid, the original Bottleneck module is substituted with our Pyramid-Bottleneck module. The original Bottleneck structure is illustrated in Fig. 7: the input feature map undergoes a compression stage and an expansion stage. In the compression stage, the network condenses the input information into a smaller channel dimension, thereby retaining the most crucial information; the expansion stage reconstructs this compressed information, and a residual connection is attached at the end to prevent gradient vanishing in deeper networks. The structure of the Pyramid-Bottleneck module is shown in Fig. 8; it is designed to eliminate the excessive redundant computation caused by the Bottleneck being repeated many times as the network deepens.

Fig. 7 Bottleneck structure in the C2F module

Fig. 8 The Pyramid-Bottleneck structure

Fig. 9 Receptive fields of combinations of different types of convolutions on the original feature map. One of the 5 × 5 convolutions is replaced by two regular 3 × 3 convolutions

A high degree of similarity exists among different channels within a feature map; the first or last \(cp\) consecutive channels can generally be computed as representatives of the whole feature map [28]. Therefore, in the Pyramid-Bottleneck module, we select two consecutive groups of \(cp\) channels as representatives of the entire feature map for feature mining. As illustrated in Fig. 8, when the feature map \(F \in \mathbb{R}^{h \times w \times c}\) is input into the Pyramid-Bottleneck module, it is divided along the channel dimension into three feature maps: \(f_1 \in \mathbb{R}^{h \times w \times cp}\), \(f_2 \in \mathbb{R}^{h \times w \times cp}\), and \(f_3 \in \mathbb{R}^{h \times w \times (c - 2 \times cp)}\). \(f_1\) passes through two convolutional layers, each with a kernel size of \(3 \times 3\) and a stride and padding of 1, ensuring that the output feature map size remains constant. Here, \(cp = r \times c\), where \(r\) represents the ratio of channels in the generated feature map to the channels in the input feature map. Similarly, \(f_2\) passes through a single \(3 \times 3\) convolutional layer with the same configuration. The advantage of this design is that it attains the same receptive field on \(f_1\) as the initial Bottleneck module while simultaneously incorporating feature information from \(f_2\), effectively avoiding the potential loss of feature information. The two resulting feature maps, \(f_1^{\prime} \in \mathbb{R}^{h \times w \times cp}\) and \(f_2^{\prime} \in \mathbb{R}^{h \times w \times cp}\), are concatenated with \(f_3\), and two Pointwise Convolutions (PWConvs) are appended at the end to maintain the flow of information between the channels. By integrating regular convolution with PWConv, the output of the Pyramid-Bottleneck module creates a pyramid-like distribution of receptive fields on the original feature map, as illustrated in Fig. 9. Within this structure, Batch Normalization (BatchNorm) is applied after the first PWConv layer, coupled with the SiLU (Sigmoid Linear Unit) activation function. The SiLU function is given in formula (3):

$$SiLU\left( x \right) = x \cdot \sigma \left( x \right) = \frac{x}{1 + e^{-x}}$$
(3)

Finally, a residual connection is incorporated at the end to prevent the issue of vanishing gradients. The feature map \(F^{\prime}\), obtained after processing through the Pyramid-Bottleneck structure, is calculated per formulas (4)–(7):

$$f_1, f_2, f_3 = split_{\left( cp,\, cp,\, c - 2 \times cp \right)}\left( F \right)$$
(4)
$$f_1^{\prime} = \sigma_{3,1,1}\left( \sigma_{3,1,1}\left( f_1 \right) \right)$$
(5)
$$f_2^{\prime} = \sigma_{3,1,1}\left( f_2 \right)$$
(6)
$$F^{\prime} = F + \tau_1\left( \tau_2\left( f_1^{\prime} \oplus f_2^{\prime} \oplus f_3 \right) \right)$$
(7)

where \(split_{(cp,\, cp,\, c - 2 \times cp)}\) denotes dividing the feature map along the channel dimension into three parts with \(cp\), \(cp\), and \(c - 2 \times cp\) channels, \(\sigma_{3,1,1}\) represents a convolution operation with a kernel size of \(3 \times 3\) and a stride and padding of 1, \(\tau_1\) represents the plain PWConv, \(\tau_2\) represents the PWConv followed by BatchNorm and SiLU, and \(\oplus\) represents the concatenation of feature maps.
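A minimal PyTorch sketch of the Pyramid-Bottleneck following Eqs. (4)–(7) is given below. Details not fixed by the text, such as bias settings, are assumptions; by design, only the two \(cp\)-channel groups pass through the \(3 \times 3\) convolutions, while \(f_3\) is forwarded unchanged to the concatenation.

```python
import torch
import torch.nn as nn


class PyramidBottleneck(nn.Module):
    """Sketch of Eqs. (4)-(7): channel split, two conv branches, an identity
    branch, two pointwise convolutions, and a residual connection."""
    def __init__(self, c, r=1 / 3):
        super().__init__()
        self.cp = int(c * r)
        # Branch_1: two 3x3 convs (stride 1, padding 1), Eq. (5)
        self.branch1 = nn.Sequential(
            nn.Conv2d(self.cp, self.cp, 3, 1, 1, bias=False),
            nn.Conv2d(self.cp, self.cp, 3, 1, 1, bias=False),
        )
        # Branch_2: one 3x3 conv, Eq. (6)
        self.branch2 = nn.Conv2d(self.cp, self.cp, 3, 1, 1, bias=False)
        # tau_2: PWConv followed by BatchNorm and SiLU
        self.tau2 = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.SiLU(inplace=True)
        )
        # tau_1: plain PWConv
        self.tau1 = nn.Conv2d(c, c, 1, bias=False)

    def forward(self, x):
        c = x.shape[1]
        # Eq. (4): split into f1, f2 (cp channels each) and f3 (the rest)
        f1, f2, f3 = torch.split(x, [self.cp, self.cp, c - 2 * self.cp], dim=1)
        y = torch.cat([self.branch1(f1), self.branch2(f2), f3], dim=1)
        return x + self.tau1(self.tau2(y))  # Eq. (7): residual connection
```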

In terms of computational complexity, for a feature map with an input size of \(h \times w \times c\), the FLOPs after processing through the original Bottleneck module are calculated as follows:

$$FLOPs_{base} = h \times w \times 3^2 \times c \times 0.5c + h \times w \times 3^2 \times 0.5c \times c + h \times w \times c$$
(8)

The FLOPs of the Pyramid-Bottleneck module are:

$$FLOPs_{pyramid} = 3 \times h \times w \times 3^2 \times c_p^2 + 2 \times h \times w \times 1^2 \times c^2 + h \times w \times c$$
(9)

When \(r = \frac{c_p}{c}\), the ratio of channels used in the calculation, is set to \(1/3\), the FLOPs of the Pyramid-Bottleneck module amount to \(48 \times h \times w \times c_p^2\), which is less than the original Bottleneck’s \(84 \times h \times w \times c_p^2\). Note that these calculations exclude the FLOPs of BatchNorm and SiLU: the original Bottleneck applies BatchNorm and SiLU after each convolutional layer, whereas the Pyramid-Bottleneck applies them only after the first PWConv, so this omission does not adversely affect the results of the analysis.

By incorporating the C2F-Pyramid module, the number of parameters and the computational effort required by the Pyramid-YOLOv8 can be significantly reduced. Its internal Pyramid-Bottleneck module more efficiently extracts features of small target diseases and ensures the coherence of information across feature map channels.

Experimental environment and parameters

In this study, the experiments are based on the PyTorch 1.13.1 framework and Python 3.8.18, and the model’s training and testing are carried out under a 64-bit Windows system. The server is equipped with an NVIDIA GeForce RTX 4080 16 GB GPU and 32 GB of RAM, with GPU acceleration provided by CUDA 11.6; the training parameters are shown in Table 3. To ensure a rigorous and fair evaluation of both the proposed and comparative methods, we standardized the experimental environment and parameters across all tests.

Table 3 Training parameters

Evaluation metrics

To comprehensively evaluate the performance of our model, we chose a series of standard metrics, including Precision, Recall, Average Precision (AP), and F1 score. These metrics are widely used in target detection tasks and enable model performance to be evaluated along multiple dimensions. The AP and F1 score are comprehensive metrics: the AP averages precision over different recall levels, while the F1 score is the harmonic mean of precision and recall, effectively balancing the two. Higher scores on these metrics indicate better detection performance. They are calculated as follows:

$$P = \frac{TP}{TP + FP} \times 100\%$$
(10)
$$R = \frac{TP}{TP + FN} \times 100\%$$
(11)
$$AP = \int_0^1 P\left( R \right) dR \times 100\%$$
(12)
$$F1 = \frac{2 \times P \times R}{P + R} \times 100\%$$
(13)

True positive (TP) represents the case where a positive sample is correctly categorized as a positive sample; false positive (FP) represents the case where a negative sample is misclassified as a positive sample; true negative (TN) represents the case where a negative sample is correctly categorized as a negative sample; false negative (FN) represents the case where a positive sample is incorrectly classified as a negative sample.
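As a simple illustration, the sketch below computes these metrics from detection counts and from a precision-recall curve. The AP integral of Eq. (12) is approximated numerically here; detection toolkits typically also apply a monotonic precision envelope before integrating, which this sketch omits.

```python
import numpy as np


def precision_recall_f1(tp, fp, fn):
    """Eqs. (10), (11), and (13) from raw detection counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)


def average_precision(precisions, recalls):
    """Eq. (12): area under the precision-recall curve (trapezoidal rule)."""
    order = np.argsort(recalls)
    r = np.asarray(recalls)[order]
    p = np.asarray(precisions)[order]
    return float(np.trapz(p, r))
```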

Results and discussion

Comparison of different models

In order to validate the superiority of the Pyramid-YOLOv8 algorithm in detecting rice diseases, seven other detection algorithms were selected for comparison in this study: Faster-RCNN [29], RT-DETR [30], YOLOv3-SPP [31], YOLOv5 [32], YOLOv7 [33], YOLOv9 [34], and YOLOv10 [35]. Table 4 presents the experimental results of these algorithms. In-depth analysis shows that the Pyramid-YOLOv8 model exhibits comprehensive performance advantages over the other models. It has the highest precision (81.0%), the highest mAP@0.5 (84.3%), and the highest F1 score (78.3%), indicating that Pyramid-YOLOv8 achieves the optimal balance between precision and recall. Moreover, despite being slightly slower in processing speed (FPS) than YOLOv7x, Pyramid-YOLOv8 has fewer parameters (42.0 M) and a model size of only 75.9 MB, offering advantages in resource efficiency and deployment.

Table 4 Comparison of experimental results for different detection models

During further analysis, we found that the one-stage detection models have completely surpassed the two-stage detector (Faster-RCNN) in both detection speed and accuracy. Specifically, YOLOv3-SPP, YOLOv5x, YOLOv7x, YOLOv9e, and YOLOv10x exhibit 2.5%, 3.8%, 8.4%, 6.2%, and 1.7% higher mAP@0.5 than Faster-RCNN, respectively. They also demonstrate considerably higher detection speeds, with YOLOv7x achieving the fastest rate at 84.7 FPS. However, there are limitations in computational cost: the FLOPs of YOLOv3-SPP are 110.5% higher than those of Faster-RCNN.

YOLOv5x, YOLOv7x, YOLOv9e, and YOLOv10x fuse the feature maps generated from \(8\times\), \(16\times\), and \(32\times\) downsampling, which helps the models detect targets of different sizes while maintaining the ability to detect small targets. However, the detection head of YOLOv5x adopts a coupled design in which the extracted features are simultaneously responsible for both the classification and regression tasks, putting YOLOv5x at risk of overfitting during training on the rice disease dataset. Although YOLOv7x achieved the fastest detection speed among all models, it consumed substantial resources during training, which is not conducive to application in practical scenarios. The programmable gradient information (PGI) training method introduced in YOLOv9e addresses the issue of information loss in deep neural networks; however, adjusting the gradient flow during backpropagation may lead the model to over-optimize for specific data types, resulting in poorer generalization to rice leaf blast data. YOLOv10x improves prediction efficiency by eliminating non-maximum suppression (NMS); although it achieves the best performance in terms of parameters and FLOPs, the highly compressed model structure reduces its accuracy. Moreover, the RT-DETR model, which applies the Transformer to object detection tasks, ranks between YOLOv5x and YOLOv7x in accuracy. However, because RT-DETR adopts the Transformer architecture, its prediction process is relatively complex and computation-intensive, giving it the slowest detection speed among the single-stage detectors.

In comparison, Pyramid-YOLOv8 introduces the lightweight C2F-Pyramid module in the feature extraction, effectively reducing the model’s parameters and computational load. Additionally, the additional detection head utilizes higher-resolution feature maps for small target detection, and the introduction of high-resolution feature maps in feature fusion effectively solves the problem of information loss of features in the transmission process. At the same time, under the synergistic effect of the CBAM module, the model can capture more detailed information about the target and effectively suppress background noise in the image, thereby enhancing the detection capability for small objects.

Fig. 10 Detection results of different algorithms

Figure 10 displays the detection results for leaf blast using the different algorithms, with misdetected and missed targets marked by blue arrows and yellow circles, respectively. Pyramid-YOLOv8 correctly identified all disease targets in the images, while each of the other object detection algorithms made more than one error. In Fig. 10 (a and b), Faster-RCNN and RT-DETR mistakenly identified rice spikes as disease spots. Faster-RCNN, YOLOv3-SPP, YOLOv5x, YOLOv7x, YOLOv9e, and YOLOv10x missed subtle disease spots to varying degrees in Fig. 10 (a, c, d, e, f, and g). In Fig. 10 (h), it is evident that our method can accurately identify and locate small disease spots, effectively distinguishing overlapping regions. These results indicate that fusing high-resolution feature maps can effectively enhance the model’s contextual understanding, and the additional detection head performs well in detecting small objects.

Fig. 11 Detection results of different algorithms for dense small targets

Finally, considering both accuracy and efficiency, Pyramid-YOLOv8 outperforms the other seven object detection algorithms. It is suitable for scenarios that demand high precision with reasonable computational resources, and its detection results agree more closely with real-world conditions. It is worth noting that the detection speed of Pyramid-YOLOv8 is slightly inferior to that of YOLOv7x, and its FLOPs and parameters are slightly inferior to those of YOLOv10x. We also found a problem common to the higher-accuracy models selected for comparison: all of them show poor detection accuracy on dense regions of small targets. Figure 11 shows that YOLOv7x suffers serious missed detections on dense small targets. Although YOLOv9e, RT-DETR, and Pyramid-YOLOv8 were able to detect most diseased areas, they struggled to accurately differentiate individual targets within those regions. This indicates that further improvements and optimizations are needed to enhance both real-time performance and the model’s ability to accurately detect dense, small targets.

Effect of different values of r on the C2F-Pyramid module

In the C2F-Pyramid module, the value of \(r\) (the proportion of channels used for computation within the internal Pyramid-Bottleneck structure, relative to the total number of feature map channels) determines how the full set of feature map channels is represented during feature extraction. Note that in the Pyramid-Bottleneck structure (as shown in Fig. 8), feature extraction is undertaken by Branch_1 and Branch_2, so when \(r\) is set to \(1/8\), \(1/4\) of the channels effectively act as representatives of the entire feature map. To optimize the performance of the C2F-Pyramid module, we integrated it into the YOLOv8x framework as a baseline and conducted experiments with \(r\) as a variable, as shown in Table 6.

Table 5 Test results of C2F-Pyramid module in different models

Table 6 illustrates that when \(r\) is set to \(1/8\), YOLOv8x_A achieves the lowest parameter count and computation volume, with reductions of 41.3% and 43.6%, respectively, compared to YOLOv8x. When \(r\) is increased to \(1/4\), YOLOv8x_B attains the highest Precision, marking a 1.9% increase over YOLOv8x. Notably, at \(r = 1/3\), YOLOv8x_C demonstrates superior detection accuracy, achieving the highest scores in Recall, mAP@0.5, and F1 score, with improvements of 1%, 1.5%, and 1.3%, respectively, compared to YOLOv8x. In terms of detection speed, YOLOv8x_C maintains the same FLOPs as YOLOv8x_B but reaches a higher FPS of 92.6, an increase of 22.6. Furthermore, a deeper analysis of the experimental results reveals that detection accuracy improves in stages as \(r\) increases. This structural modification is also memory-friendly: it adds no extra memory footprint. However, we refrained from testing higher \(r\) values because Precision began to decline at \(r = 1/3\), and setting \(r\) to \(1/2\) would mean omitting Branch_3 in Fig. 8, contradicting the design philosophy discussed in Sect. 2.2.2.

Table 6 The effect of the value of r on the C2F-Pyramid module

In summary, when \(\:r\) is set to \(\:1/3\), the C2F-Pyramid module exhibits the best overall performance. Therefore, in this study, we have set the \(\:r\) value in the C2F-Pyramid module to \(\:1/3\).

Performance of the C2F-Pyramid module in other models

To verify the effectiveness and efficiency of the C2F-Pyramid module, we integrated it into the YOLOv3-SPP, YOLOv5, YOLOv7, and YOLOv10 models, creating four new models: Pyramid-YOLOv3-SPP, Pyramid-YOLOv5x, Pyramid-YOLOv7x, and Pyramid-YOLOv10x. These models were trained and validated using the leaf blast dataset, with the results presented in Table 5.

In terms of accuracy, the improved Pyramid-YOLOv3-SPP and Pyramid-YOLOv5x models achieved increases of 2.3% and 1% in mAP@0.5 and improvements of 1.6% and 1.2% in F1 score, respectively. Regarding model parameters and computation, the parameters and FLOPs of Pyramid-YOLOv3-SPP decreased by 55.2% and 60.0%, respectively, while those of Pyramid-YOLOv5x decreased by 20.1% and 13.6%. As for detection speed, introducing the lightweight C2F-Pyramid module resulted in FPS increases of 14.9 and 7.3 for Pyramid-YOLOv3-SPP and Pyramid-YOLOv5x, respectively. For the modified Pyramid-YOLOv7x, while the lightweight C2F-Pyramid module led to a slight loss in accuracy, its parameters and FLOPs decreased by 54.5% and 52.5%, respectively, and it achieved the highest FPS of 114.9 among all tested models. Meanwhile, Pyramid-YOLOv10x further reduces the model’s parameters and computation while maintaining the original detection accuracy. These experimental results demonstrate that the C2F-Pyramid module exhibits exceptional performance in the Pyramid-YOLOv8 model and highlight its adaptability and efficiency in other mainstream algorithms.

Ablation experiments of different modules in Pyramid-YOLOv8

Pyramid-YOLOv8 represents an improvement of the YOLOv8x. In Sect. 3.2 and 3.3, we established the structure of the C2F-Pyramid module and its effectiveness in other models. To further validate the effectiveness of different improvements on model performance, we conducted ablation studies, the results of which are shown in Table 7. Compared to the original model, Pyramid-YOLOv8 exhibited a 6.1% increase in mAP@0.5 and a 3.7% increase in F1 score. Additionally, there was a significant reduction in parameters and FLOPs, by 38.3% and 23.8% respectively. The improved model demonstrated enhanced performance in detecting leaf blast disease.

Table 7 Results of ablation experiments

Detection head based on higher-resolution feature maps

As shown in Table 8, by incorporating a detection head based on higher-resolution feature maps, YOLOv8x_Head surpasses the original model on various metrics: precision (79.2% vs. 77.6%), recall (72.9% vs. 71.8%), mAP@0.5 (81.2% vs. 78.2%), and F1 score (75.9% vs. 74.6%). Figure 12 presents the detection results of YOLOv8x_Head compared to the original model (with missed targets indicated by black boxes in the images). It is evident, especially in Fig. 12 (c and d), that YOLOv8x_Head is more adept at detecting smaller disease targets, demonstrating that detection heads built on higher-resolution feature maps can significantly improve the accuracy of small-target detection. It is worth noting that the number of parameters does not increase after introducing the additional detection head, mainly because the detection heads share the same convolutional kernels across the different scale features, allowing part of the parameters to be shared. However, this performance enhancement comes at the cost of increased computation, which is not conducive to real-time detection.

Table 8 Model performance with an additional detection head
Fig. 12 Detection results of YOLOv8x_Head. (a) and (b) show the detection results of YOLOv8x; (c) and (d) show the detection results of YOLOv8x_Head

CBAM

Due to its complex pathological morphology and relatively small size, leaf blast poses challenges for feature extraction, as effective features are difficult to capture. Moreover, background noise in shallow features can adversely affect detection accuracy. To address this issue, we incorporated the CBAM attention mechanism into the model. In previous studies [36, 37], CBAM was added at different locations, enabling the model to focus adaptively on the target and thereby capture critical information and subtle features. In Pyramid-YOLOv8, the CBAM module is placed before feature fusion because the additional detection head integrates shallower features, which often contain significant background noise that impacts the model’s accuracy in detecting small targets. Table 9 presents the experimental results of the model after integrating the CBAM mechanism.

Table 9 Performance of the model after the introduction of CBAM

As shown in Table 9, following the integration of the CBAM module, YOLOv8x_CBAM demonstrated an increase of 0.3% in mAP@0.5 and 0.8% in F1 score. To facilitate a more effective comparison, we employed Grad-CAM for model visualization; this technique highlights key regions in feature maps by emphasizing important spatial locations [38]. Figure 13 displays the visualization results. From Fig. 13 (c and f), it is clear that YOLOv8x_CBAM outperforms the original model in focusing on the diseased areas of the image and more effectively suppresses background noise (a minimal Grad-CAM sketch is given after Fig. 13).

Fig. 13 Grad-CAM visualization results. (a) and (d) show the original images; (b) and (e) show the areas of interest of YOLOv8x; (c) and (f) show the areas of interest of YOLOv8x_CBAM
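For readers who wish to reproduce this visualization, the sketch below implements the basic Grad-CAM computation with forward and backward hooks. The choice of target layer and the scalar score used for backpropagation are assumptions; they depend on the model head, and detection models require selecting a suitable score (e.g., a box confidence) rather than a single class logit.

```python
import torch


def grad_cam(model, layer, image):
    """Minimal Grad-CAM sketch: weight a layer's activations by its pooled
    gradients, apply ReLU, and normalize to [0, 1]. `image` is a (1, 3, H, W)
    tensor; the scalar score below is a stand-in for a real target score."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = model(image).flatten().max()  # assumed scalar target score
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = torch.relu((weights * acts[0]).sum(dim=1))   # weighted activations
    return cam / (cam.max() + 1e-8)  # upsample and overlay for display
```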

C2F-Pyramid module

Table 7 shows that the number of parameters and computations increased significantly after adding the detection head and introducing the CBAM module. To balance model performance against this overhead, we introduce the improved C2F-Pyramid module into the model. As shown in Table 10, compared to the base model, YOLOv8x_C2F-Pyramid shows decreases of 41.3% and 40.0% in parameters and FLOPs, respectively, while mAP@0.5 and F1 score increase by 1.5% and 1.3%, respectively, and FPS increases by 13.5. This demonstrates that the new C2F-Pyramid module can significantly improve detection performance while reducing computational overhead.

Table 10 Performance of the model after the introduction of C2F-Pyramid

With the introduction of the new detection head, the CBAM module, and the C2F-Pyramid module, the detection efficiency of Pyramid-YOLOv8 has been improved. The additional detection head, which predicts from high-resolution feature maps, improves small-target detection accuracy. The CBAM module uses dual spatial and channel attention to guide the model’s focus toward the key information in the image, improving feature fusion efficiency. The C2F-Pyramid module removes redundant computations from the feature extraction process and significantly reduces the model’s parameters and computations. It is worth noting that despite its lower FLOPs, Pyramid-YOLOv8 exhibits a lower FPS than YOLOv8x_CBAM, which has higher FLOPs. This counterintuitive result is mainly due to the integration of the C2F-Pyramid module within the Pyramid-YOLOv8 architecture: the module enhances feature extraction through channel hierarchies, and although beneficial for feature richness, these operations increase latency because of the overhead of managing multiple data paths and the intensive memory access required. While this architectural choice improves accuracy and detection capability, it adversely affects inference speed, explaining the observed FPS discrepancy. Overall, these improvements significantly enhance the detection performance of the Pyramid-YOLOv8 model, allowing it to identify rice diseases more accurately and reliably.

Application of Pyramid-YOLOv8 in the detection of leaf blast

To bring Pyramid-YOLOv8 into practical use, we developed an application called “Pyramid-YOLOv8 for rice leaf blast detection” using the PyQt6 framework and the Python language. PyQt6 allows seamless operation across Windows, macOS, and Linux, enhancing user accessibility and adaptability, while Python was chosen for its simplicity and the rich ecosystem of libraries that facilitate rapid development and troubleshooting. Users can transfer captured images to a local machine and run detection with the locally deployed Pyramid-YOLOv8 model (a minimal sketch of such a front end is given after Fig. 14). As shown in Fig. 14, the detection results are displayed in the software, along with the specific distribution of the disease in the image. Researchers can use this information to assess the severity of rice leaf blast infection, which is beneficial for early warning of rice diseases. Notably, this is only a preliminary exploration. A more advanced approach would be a cloud platform in which users upload data through an image acquisition module and a cloud server performs the detection; however, this incurs higher costs. An alternative is to deploy the model on low-cost embedded devices with limited resources, which can process and analyze data locally, reducing transmission and processing delays and enabling real-time detection. Nevertheless, the computational power and memory limitations of embedded devices are not ideal for deploying complex models. Although the application currently targets desktop environments, future work may explore adaptation to mobile platforms, mainly Android, to expand its real-world applicability.

Fig. 14 The software interface, showing the disease detection results and the specific distribution of the disease in the image
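A minimal sketch of such a PyQt6 desktop front end is given below. The weights filename is hypothetical, and the Ultralytics YOLO interface is used here only as a stand-in for loading the locally deployed Pyramid-YOLOv8 model; the real application additionally reports the distribution of lesions.

```python
import os
import sys
import tempfile

import cv2
from PyQt6.QtGui import QAction, QPixmap
from PyQt6.QtWidgets import QApplication, QFileDialog, QLabel, QMainWindow
from ultralytics import YOLO  # stand-in loader for the deployed model


class DetectorWindow(QMainWindow):
    def __init__(self, weights="pyramid_yolov8.pt"):  # hypothetical weights file
        super().__init__()
        self.setWindowTitle("Pyramid-YOLOv8 for rice leaf blast detection")
        self.model = YOLO(weights)                 # locally deployed model
        self.label = QLabel("File > Open image to run detection")
        self.setCentralWidget(self.label)
        open_act = QAction("Open image...", self)
        open_act.triggered.connect(self.open_image)
        self.menuBar().addMenu("File").addAction(open_act)

    def open_image(self):
        path, _ = QFileDialog.getOpenFileName(
            self, "Select image", "", "Images (*.jpg *.jpeg *.png)"
        )
        if not path:
            return
        annotated = self.model(path)[0].plot()     # BGR array with boxes drawn
        out = os.path.join(tempfile.gettempdir(), "detection.jpg")
        cv2.imwrite(out, annotated)
        self.label.setPixmap(QPixmap(out))         # show the annotated result


if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = DetectorWindow()
    window.show()
    sys.exit(app.exec())
```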

Future research directions

Despite the achievements of this study in rice leaf blast detection, there is still significant room for improvement before moving to the practical application stage. Future work will focus on the following areas:

(1) Expand the coverage of rice disease detection to include more types of diseases. This will require training the Pyramid-YOLOv8 model with a dataset containing a wide range of rice disease types to improve its accuracy in recognizing and classifying diseases in natural agricultural environments.

(2) Optimize the framework of the Pyramid-YOLOv8 model. Pyramid-YOLOv8 still requires significant computing resources, which hinders its deployment on resource-constrained edge devices. Therefore, we will use more concise and efficient methods to optimize Pyramid-YOLOv8, including replacing the model backbone with a lighter one, such as the MobileNets [39, 40, 41].

(3) Improve the real-time performance of the model and address the difficulty of detecting dense small targets. Because the improved C2F-Pyramid module adds extra memory allocation operations for feature map splitting, the model’s FPS does not increase in step with the reduced computational load. The model also shows shortcomings in dense small-target detection. In the future, we will adopt an NMS-free optimization strategy to turn Pyramid-YOLOv8 into an end-to-end detection model, improving its real-time performance, and adopt more refined feature extraction and target localization strategies to improve accuracy on dense, small targets. We are committed to improving the effectiveness of disease management through research and development and to promoting more sustainable and efficient agricultural production.

Conclusion

In this study, we proposed a disease detection algorithm based on an improved Pyramid-YOLOv8, which achieves accurate detection of rice leaf blast by introducing an additional detection head, CBAM, and C2F-Pyramid. The experimental results showed that the algorithm outperforms current mainstream disease detection algorithms, achieving 84.3% mAP@0.5 on a self-constructed rice leaf blast dataset. Moreover, the model size was reduced from 130.3 MB to 75.9 MB, attaining a frame rate of 62.5 frames per second (FPS) on the RTX 4080. Finally, we developed an application for leaf blast detection, which will be an essential tool for field workers for early disease prevention.

Data availability

No datasets were generated or analysed during the current study.

References

  1. Savary S, et al. The global burden of pathogens and pests on major food crops. Nat Ecol Evol. 2019;3(3):430–9.


  2. Feng S, et al. A deep convolutional neural network-based wavelength selection method for spectral characteristics of rice blast disease. Comput Electron Agric. 2022;199:107199.


  3. Lu Q, et al. Analysis of the occurrence of major diseases in five major rice producing areas in China in recent years (in Chinese). China Plant Prot Guide. 2021;41(4):37–42.


  4. Lei F, et al. Research on grading method for rice leaf blight detection based on multispectral imaging (in Chinese). Spectrosc Spectr Anal. 2009;10:2730–3.


  5. Zhao D, et al. Study on the classification method of Rice Leaf Blast levels based on Fusion features and adaptive-weight Immune Particle Swarm optimization Extreme Learning Machine Algorithm. Front Plant Sci. 2022;13:879668.


  6. Kalia S, Rathour R. Current status on mapping of genes for resistance to leaf- and neck-blast disease in rice. 3 Biotech. 2019;9(6):209.


  7. Huang S, et al. A deep convolutional neural network-based method for rice spike blight detection (in Chinese). J Agricultural Eng. 2017;20:169–76.


  8. Deng X, et al. Detection of citrus huanglongbing based on multi-input neural network model of UAV hyperspectral remote sensing. Remote Sens. 2020;12(17):2678.


  9. Prajapati HB, Shah JP, Dabhi VK. Detection and classification of rice plant diseases. Intell Decis Technol. 2017;11(3):357–73.


  10. Chung C-L, et al. Detecting Bakanae disease in rice seedlings by machine vision. Comput Electron Agric. 2016;121:404–11.


  11. Ghyar BS, Birajdar GK. Computer vision based approach to detect rice leaf diseases using texture and color descriptors. In: 2017 International Conference on Inventive Computing and Informatics (ICICI). IEEE; 2017.

  12. Dargan S, et al. A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng. 2020;27:1071–92.


  13. Too EC, et al. A comparative study of fine-tuning deep learning models for plant disease identification. Comput Electron Agric. 2019;161:272–9.


  14. Jia S, Gao H, Xiao H. Research progress on image recognition technology of crop pests and diseases based on deep learning (in Chinese). 2019;50(B07):313–7.

  15. Lu Y, et al. Identification of rice diseases using deep convolutional neural networks. Neurocomputing. 2017;267:378–84.


  16. Zhou G, et al. Rapid detection of rice disease based on FCM-KM and faster R-CNN fusion. IEEE Access. 2019;7:143190–206.


  17. Tian L, et al. VMF-SSD: a novel v-space based multi-scale feature fusion SSD for apple leaf disease detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics; 2022.

  18. Li Y, et al. One-stage disease detection method for maize leaf based on multi-scale feature fusion. Appl Sci. 2022;12(16):7960.


  19. Kiratiratanapruk K, et al. Using deep learning techniques to detect rice diseases from images of rice fields. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer; 2020.

  20. Li Y, et al. Detection of powdery mildew on strawberry leaves based on DAC-YOLOv4 model. Comput Electron Agric. 2022;202:107418.


  21. Khan AI, et al. Deep diagnosis: a real-time apple leaf disease detection system based on deep learning. Comput Electron Agric. 2022;198:107093.


  22. Li J, et al. An improved YOLOv5-based vegetable disease detection method. Comput Electron Agric. 2022;202:107345.


  23. Zhang D-Y, et al. Detection of wheat scab fungus spores utilizing the Yolov5-ECA-ASFF network structure. Comput Electron Agric. 2023;210:107953.


  24. Lin T-Y et al. Microsoft coco: Common objects in context. in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13. 2014. Springer.

  25. Yang S, et al. Strawberry ripeness detection based on YOLOv8 algorithm fused with LW-Swin Transformer. Comput Electron Agric. 2023;215:108360.


  26. Lin T-Y et al. Feature pyramid networks for object detection. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

  27. Jocher G, Chaurasia A, Qiu J. Ultralytics YOLOv8. 2023.

  28. Chen J et al. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

  29. Ren S et al. Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst, 2015. 28.

  30. Lv W et al. Detrs beat yolos on real-time object detection. arXiv preprint arXiv:2304.08069, 2023.

  31. Redmon J. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.

  32. Jocher G. Ultralytics YOLOv5. 2020.

  33. Wang C-Y, Bochkovskiy A, Liao H-YM. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

  34. Wang C-Y, Yeh I-H, Liao H-YM. Yolov9: learning what you want to learn using programmable gradient information. arXiv Preprint arXiv:2402.13616, 2024.

  35. Wang A et al. Yolov10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458, 2024.

  36. Li K, et al. A fast and lightweight detection algorithm for passion fruit pests based on improved YOLOv5. Comput Electron Agric. 2023;204:107534.


  37. Huang S, Liang X et al. Tea impurity detection algorithm based on improved YOLOv5 (in Chinese). Transactions of the Chinese Society of Agricultural Engineering 38.17 (2022).

  38. Woo S, et al. CBAM: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018.

  39. Howard AG, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.

  40. Sandler M et al. Mobilenetv2: Inverted residuals and linear bottlenecks. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

  41. Howard A et al. Searching for mobilenetv3. in Proceedings of the IEEE/CVF international conference on computer vision. 2019.


Acknowledgements

We thank the editors and reviewers of the journal Plant Methods.

Funding

This work was supported by the Natural Science Foundation of Liaoning Province (2023JH2/101300120), the Liaoning Provincial Natural Science Foundation Joint Fund (2023-BSBA-282), and the Doctoral Research Foundation of Shenyang Agricultural University (880423038).

Author information


Contributions

Q. C.: Methodology, Validation, Formal analysis, Experiment, Visualization, Investigation, Data curation, Writing – original draft, Writing – review & editing. D. Z.: Experiment, Software, Visualization. J. L.: Experiment, Data curation, Validation, Formal analysis. J. L.: Experiment, Investigation, Conceptualization, Visualization. G. L.: Formal analysis, Investigation. S. F.: Formal analysis, Experiment, Data curation, Investigation, Funding acquisition, Project administration, Supervision. T. X.: Methodology, Validation, Investigation, Funding acquisition, Project administration, Supervision.

Corresponding author

Correspondence to Shuai Feng.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Cao, Q., Zhao, D., Li, J. et al. Pyramid-YOLOv8: a detection algorithm for precise detection of rice leaf blast. Plant Methods 20, 149 (2024). https://doi.org/10.1186/s13007-024-01275-3
