Exploring Dual Coupledness for Effective Pruning in Object Detection

Xiaohui, Guan; Wenzhuo, Huang; Yaguan, Qian; Xinxin, Sun

doi:10.1007/s11063-024-11697-8

Exploring Dual Coupledness for Effective Pruning in Object Detection

Open access
Published: 10 February 2025

Volume 57, article number 21, (2025)
Cite this article

Download PDF

You have full access to this open access article

Neural Processing Letters Aims and scope Submit manuscript

Exploring Dual Coupledness for Effective Pruning in Object Detection

Download PDF

Guan Xiaohui¹,
Huang Wenzhuo²,
Qian Yaguan² &
…
Sun Xinxin¹

686 Accesses
Explore all metrics

Abstract

Pruning offers an efficient approach to compressing models deployed on resource-constrained devices. In this paper, we introduce a novel method called Dual-Coupledness Object Detection Pruning (DCODP), specifically designed for object detection models. Taking into account the complexity of model coupling, our algorithm utilizes a depth-first search approach to identify interlayer coupling within the model. It then groups sublayers with the same parent layer together. Filters corresponding to feature maps with strong coupling are pruned within the layer, and the same pruning operation is applied to the corresponding indices in other coupled layers. In order to prove the validity of our method, extensive experiments are conducted on PASCAL VOC2007, PASCAL VOC2012 and MS COCO2017. The results show that our DCODP achieves a significant reduction of 50% in parameters and an average of more than 70% impressive score.

Pruning DETR: efficient end-to-end object detection with sparse structured pruning

Article 18 August 2023

Depthwise grouped convolution for object detection

Article 13 September 2021

Object detection network pruning with multi-task information fusion

Article 18 February 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Large models have demonstrated exceptional performance in object detection tasks. However, their deployment on resource-constrained mobile devices and embedded systems poses significant challenges. This has prompted the development of lightweight object detection models. Pruning has emerged as a highly effective approach for increasing model sparsity by eliminating redundant weights or filters. This results in lighter detectors that offer improved computational efficiency. In a prior study, Xie et al. [1] proposed an LCP algorithm by combining a pruning network with a localization-assisted perceptual network. However, this algorithm is only effective for region-based models and requires selected layers of pruning, limiting its broader applicability. On the other hand, Balasubramaniam et al. [2] introduced semi-structured pruning, which necessitates hardware library acceleration.

This paper focuses on enhancing the practicality of pruning algorithms for object detectors through structured pruning, i.e., filter pruning. Structured pruning poses unique challenges when applied to neural networks with complex internal structural coupling. Deep neural networks (DNNs) comprise interconnected modules, including convolution, activation, and normalization layers. These modules, whether parameterized or not, exhibit intricate connections. While classical pruning algorithms, as employed in VGG [3] and ResNet [4] models, have demonstrated satisfactory results, their effectiveness tends to diminish when applied to object detectors. This is because these algorithms only consider sparsity relationships within individual layers. For instance, the NetSlim [5] evaluates importance based on the scaling factor of the BN layer, but fails to account for scenarios where the use of the BN layer is restricted in object detectors. Similarly, He et al. [6] utilize Lasso regression to select pruned channels, but do not consider interlayer coupling, thereby disregarding the potential impact of removing filters in the current layer on subsequent channels.

Based on this observation, we propose Dual-Coupling Object Detection Pruning (DCODP), which leverages both interlayer and intralayer coupling relationships in object detectors. This algorithm applies to both regionalized detectors and integrated convolutional network detectors. First, we employ the Depth-First-Search (DFS) algorithm to calculate the interlayer coupling relationships within the model. This allows us to group sub-layers with the same parent layer into the same group. Furthermore, we compute the intralayer coupling by analyzing the linear expression relationship of the model’s deep features. The pruning process focuses on removing filters that exhibit high coupling within the layers. This involves performing the same pruning operation on filters that are indexed correspondingly in the coupled layers. Finally, fine-tuning is conducted to restore the model’s performance after pruning.

2 Related Work

2.1 Object Detection

Object detection is a fundamental and challenging task in computing and has been widely used in people’s lives, such as surveillance security [7, 8], autonomous driving [9, 10], robotic vision [11, 12], and medical decision-making [13, 14], etc., to simultaneously predict the location and class of the object in a given image. Existing object detection models fall into two main categories: two-stage detectors and one-stage detectors [15]. The main differences between these two are in the processing flow and performance. In terms of processing flow, the two-stage target detection algorithm first generates a series of candidate bounding boxes as samples and then classifies these samples using a neural network. Object Detector RCNN [16] is a pioneering work in object detection using deep learning. Due to the drawbacks of RCNN such as slow speed and large training space, Fast RCNN[17] was developed. The two improvements are that the whole image features are computed at once and then the candidate regions are extracted by ROI (Region of Interest) feature mapping, which does not need to be recalculated; and a fixed-size feature map is extracted from different-size candidate regions by ROI pooling. Different from RCNN and Fast RCNN, Faster RCNN [18] is an end-to-end learning framework introducing the Region Proposal Network (RPN). Faster RCNN is slower but has a very big improvement in accuracy. FPN [19] can detect objects with a wide variety of scales by building a feature pyramid to semantics at all scales.

The one-stage detection algorithms do not have a separate stage for candidate box generation but only need to regress the size, location, and class of the object directly through the convolutional network. Redmon et al.[20] developed a real-time object detector called YOLOv1, which has a much faster detection speed but its detection accuracy is often poor for small-sized targets. Subsequently the upgraded versions are being proposed, e.g. YOLOv2, YOLOv3, YOLOv4 etc.. YOLO series has been on the road to innovation and has had a wide-ranging impact on the computer vision field. By far the most effective is YOLOv8. SSD [21] was proposed in 2016, which improves the detection accuracy for small objects. In recent years the transformer is used in object detection and achieve state-of-the-art performance [22].

2.2 Pruning Techniques for Object Detection

Deep learning models generally have a large number of redundant parameters, eliminating the unimportant weights in the weight matrix can reduce the size and computational effort of the model. Pruning can be divided into unstructured pruning, structured pruning and semi-structured pruning basing on pruning targets and the resulting network structure [23]. Unstructured pruning remove specific parameters by applying a threshold to zero out parameters below it without considering its internal structure [24, 25]. Since it can remove weights anywhere, the irregular replacement of non-zero weights leads to actual acceleration requires the support of special software or hardware. Structured pruning removes entire filters, channels, neurons,or even layers. It can directly speed up networks and reduce the size of neural networks without the support of special hardware and software [26]. To improve the accuracy of structured pruning and achieve specific acceleration of unstructured pruning, semi-structured pruning is introduced in work [27]. Since the weights of network may vary for different input data, dynamic pruning is proposed by recoving weights at runtime to improve the representation of convolutional neural networks [28, 29].

Pruning techniques have been proven to be very effective in reducing the size or complexity of a object detection model [30]. Balasubramaniam et al. [2] introduced a new semi-structured pruning framework R-TOSS.It uses depth-first search to reduce the computational cost of iterative pruning to achieve model compression. Qi et al. [31] proposed a weighted pruning algorithm based on $L_1$ norm and mean difference, which enabled the lightweight YOLOv3-SPP model to have better performance in remote sensing object detection tasks. To promote the use of edge devices in water systems, Wang et al. [32] proposed a structured pruning algorithm based on the SSD model of the receptive field. Xie et al. [33] proposed a positioning-aware auxiliary network to find channels with key information for classification and regression, so that channel pruning of the object detection model can be directly performed. The above pruning method only targets specific object detection models, but its versatility will be greatly limited when it comes to various types of detection models.

3 Methods

3.1 Inter-layer Coupling

Classical pruning algorithms achieve satisfactory results on models like VGG [3] and ResNet [4], etc., but struggle with object detectors. This is due to their limited consideration of sparsity within layers, overlooking specific constraints imposed by the BN layer in object detectors. Our algorithm addresses this by incorporating interlayer coupling in complex object detection models. In Algorithm 1, we define interlayer coupling as sub-layers sharing the same parent layer, grouped together. Sub-layers without the same parent layer form a separate group. Layers within the same group are simultaneously removed from filters at the same index. Our algorithm utilizes the DFS algorithm to automatically identify coupling relationships between layers, improving efficiency and removing reliance on a specific layer like BN. A visual representation of the interlayer coupling process is illustrated in Fig. 1.

To improve the feature extraction capability of object detection models, researchers have proposed multi-branch structures. However, pruning such complex structures, as shown in Fig. 2a, has been challenging due to high coupling. Previous methods only pruned internal layers within residual blocks, resulting in reduced efficiency. By considering coupling, we can prune layers at both ends of residual or jump connections simultaneously. Figure 2b, c illustrate the ELAN and RepConv structures in YOLOv7. ELAN adjusts network depth without changing the width, controlling the shortest and longest gradient paths. By integrating convolutional blocks seamlessly, ELAN enables deeper learning and more efficient convergence. In our approach, we group sublayers with the same parent layer and prune filters with the same index across sublayers. The RepConv structure incorporates a 1$\times $1 convolutional branch and a constant mapping to the input 3$\times $3 convolution. It exhibits computational similarities to ELAN when analyzing coupling. Inspired by architectures like Inception and ResNet, RepConv enhances the network’s generalization capability and mitigates the issue of gradient vanishing.

Coupling is also observed in Feature Pyramid Networks (FPNs), as shown in Fig. 2d. Traditional neural networks like AlexNet and VGG use multiple downsampling operations, but they struggle to detect small objects. In object detection, models like SSD, RetinaNet, and YOLOv7 address this issue by employing FPNs [34,35,36]. FPNs excel at extracting features of different resolutions and scales, enabling accurate representations for objects of varying sizes. FPNs perform downsampling layer by layer, capturing detailed information in shallow layers and semantic information in deeper layers. Upsampling operations fuse multi-scale features and generate predictions for objects of different sizes. Analyzing the coupling within FPNs involves a layer-by-layer search to determine parent layers. Sublayers with the same parent layer exhibit coupling and should be grouped together.

3.2 Intra-layer Coupling

In filter pruning, a commonly adopted strategy is to identify important feature maps and retain the corresponding filters. Feature maps inherently reflect and capture crucial information and features from the input data and filters. Therefore, evaluating feature maps indirectly indicates the importance of the corresponding filters [37]. In our research, we investigate the significance of deep feature maps in object detection networks and define filter importance based on the linear correlation between different feature maps. To measure the importance of feature maps, we propose a pruning method called intralayer coupling. Within a convolutional layer, when a filter’s feature map can be linearly expressed by other feature maps, it indicates a strong coupling between them. In such cases, the information contained in the linearly expressed feature maps is considered redundant as it is already present in feature maps with higher information density. Removing the corresponding filters does not significantly impact the model’s representational power. Conversely, the feature maps generated by the retained filters should exhibit weak coupling with each other. The process of intralayer coupling pruning is illustrated in Fig. 3.

3.3 Filter Pruning for Object Detection Models

We formulate structured pruning, such as filter pruning, as:

$$\begin{aligned} \begin{aligned} \mathcal {L}(\mathcal {W};\mathcal {D})= \frac{1}{N} \sum _{i=1}^N \mathcal {L}(W;(x_i,y_i)), \end{aligned} \end{aligned}$$

(1)

where $\mathcal {L}(\cdot )$ is the loss function (e.g., cross-entropy loss) and $\mathcal {D}=\{(x_i,y_i)\}^N_{i=1}$ is the data set.

Based on Eq. (1), for the objective loss function that guides the filter and prunes by feature information, we define the loss function as follows:

$$\begin{aligned} \begin{aligned} \mathcal {L}_{prune}=\frac{1}{N} \sum _{i=1}^N \mathcal {L}(O;(x_i,y_i)), \end{aligned} \end{aligned}$$

(2)

where O is denoted as the set of all output feature maps of the model, N denotes the size of the dataset, the pruning ratio is set to p, $\frac{\Vert O\Vert }{N} \le p$, and the ratio of the total number of retained feature maps (filters) to the total number of feature maps is controlled to be within the range of p.

The object detection model has to realize two major functions, classification, and location regression, so we usually use $\mathcal {L}_{cls}$ to denote the classification loss and $\mathcal {L}_{reg}$ to denote the regression loss:

$$\begin{aligned} & \begin{aligned} \mathcal {L}_{cls}= \frac{1}{N^{\prime }} \sum _{i=1}^{N^{\prime }} \mathcal {L}(p_i;y_i), \end{aligned} \end{aligned}$$

(3)

$$\begin{aligned} & \begin{aligned} \mathcal {L}_{reg}= \frac{1}{N^{\prime }} \sum _{i=1}^{N^{\prime }} y_i \mathcal {L}(r_i;t_i), \end{aligned} \end{aligned}$$

(4)

where $N^{\prime }$ denotes the number of prediction frames, $p_i$ denotes the probability when the i-th prediction frame is predicted to be a positive sample, and $y_i$ denotes the label of the real object corresponding to the ith prediction frame, which is 1 for positive samples and 0 for negative samples. $r_i$ denotes the regression parameter of the prediction frame predicted by the ith prediction frame (e.g., the size of the prediction frame and the offset of the coordinates), $t _i$ denotes the size and coordinates of the real object corresponding to the ith prediction frame. Depending on the detector or scenario, the classification loss function usually uses the cross-entropy loss or the mean square error loss, and the regression loss usually uses the IoU loss or the GIoU loss.

Formulate the overall object detection filter pruning problem as minimizing the following joint loss function:

$$\begin{aligned} \begin{aligned} \mathcal {L}=\gamma (\mathcal {L}_{cls}+\mathcal {L}_{reg})+(1-\gamma )\mathcal {L}_{prune}, \end{aligned} \end{aligned}$$

(5)

where $\gamma $ plays a crucial role as a trade-off hyperparameter in balancing pruning accuracy and detection accuracy.

3.4 Coupling of Feature Maps

To find the correlation between feature maps, the first thing that comes to mind is the rank, which measures the correlation between row vectors/column vectors in a matrix. To facilitate the linear relationship analysis of the feature map, our idea is to flatten the feature matrix from a two-dimensional vector $h \times \omega $ to a one-dimensional vector $h*\omega $, and then traverse the linear correlations between different feature vectors one by one. First, the expression of the lth layer output feature map is given by $ O^{l} = \left\{ o_{1}^{l},o_{2}^{l},...,o_{N_{l}}^{l} \right\} \in \mathbb {R}^{N_{l} \times h_{l} \times \omega _{l}}$-vectorized to Eq. (6):

$$\begin{aligned} \begin{aligned} V^l=\left\{ v^l_1,v^l_2,...,v^l_{N^l} \right\} \in \mathbb {R}^{N_{l} \times h_{l} \omega _{l}}. \end{aligned} \end{aligned}$$

(6)

We then use the discreteness of the rank of the two matrices to define the coupling of the feature maps produced by the ith filter of the lth convolutional layer:

$$\begin{aligned} \begin{aligned} CP(v^l_i)=Rank(V^l)-Rank(M^l_i \odot V^l), \end{aligned} \end{aligned}$$

(7)

where Rank(.) denotes the rank of this matrix, $ \odot $ is the Hadamard product and $M^l_i$ is the row mask vector where the ith row element is 0 but the other elements are 1. The size of $CP(v^l_i)$ then reflects the strength of the coupling of the feature map produced by the ith filter to the other feature maps.

Since rank is non-convex and difficult to solve in optimization problems, we need to find a convex approximation to it, which is the kernel paradigm number. Suppose the kernel norm of a matrix A is $\Vert A^l \Vert _*$, then $\Vert A^l \Vert _*$ is the sum of the absolute values of the Singular Values of A:

$$\begin{aligned} \begin{aligned} \Vert A^l \Vert _*=tr(\sqrt{A^TA}), \end{aligned} \end{aligned}$$

(8)

Let $A=U \sum V^T$, then the singular value decomposition is expressed as:

$$\begin{aligned} \begin{aligned} {tr( \sqrt{A^TA})}&{= tr\left( \sqrt{\left( {U\Sigma V^{T}} \right) ^{T}U\Sigma V^{T}} \right) }. \\ \end{aligned} \end{aligned}$$

(9)

Let $f_x(A) = \Vert Ax \Vert _p (p\ge 1)$, then $f_x$ is convex, so $\Vert A \Vert _{p} = \mathop {sup}_{\Vert x \Vert _p = 1}f_{x}(A)$ is convex, and in particular, $\Vert A \Vert _2$ convex. Since $\Vert A \Vert _2$ and $\Vert A \Vert _*$ are dyadic paradigms, $\Vert A \Vert _* = \mathop {sup}_{\Vert x \Vert _2 = 1}tr(A^TX)$ and $\Vert A \Vert _*$ is convex. Therefore the kernel paradigm of A is convex.

Next, we borrowed the idea of the kernel paradigm number [38]. We used the distance between the kernel paradigm numbers of two matrices as a measure of the dissimilarity or similarity between them. If the distance between the kernel paradigms of two matrices is small, it means that they somehow have similar low-rank structures or their main features are similar. Conversely, if the distance between the kernel paradigms is large, it indicates that the two matrices are more different in terms of their low-rank structures. Therefore, the coupling (Coupling) of the feature maps produced by the ith filter of the lth convolutional layer is again defined:

$$\begin{aligned} \begin{aligned} CP(v^l_i)=\Vert V^l \Vert _*-\Vert M^l_i \odot V^l \Vert _*, \end{aligned} \end{aligned}$$

(10)

where $\Vert . \Vert _*$ is the kernel paradigm number. Similarly, a smaller $CP(v^l_i)$ indicates that there is not much variability in the model after pruning, which implies that the feature map $o^l_i$ is not important, i.e., it is more coupled to other feature maps, and thus the corresponding filter can be safely removed. On the contrary, the feature map $v^l_i$ is weakly coupled to other feature maps, which contain higher density information, and the corresponding filter should be retained.

Equation (10) only measures the coupling of a single feature map, and traversing all feature maps is more complex and difficult to manipulate, so we use first calculate the coupling of each feature map via Eq. (10), and then select the smallest m (feature map) filters as the final ones to be removed:

$$\begin{aligned} \begin{aligned} CP\left( \left\{ v^l_{n} \right\} ^{m}_{n=1}\right) =\Vert V^l \Vert _*-\Vert M^l_{1,...,m} \odot V^l \Vert _*, \end{aligned} \end{aligned}$$

(11)

where $M^l_{1,...,m}$ is a multi-row mask matrix where the 1, ..., m row vectors are 0 and the other vectors are 1. The intralayer coupling pruning algorithm is shown in Algorithm 2.

4 Experiment

4.1 Experimental Settings

Our experiments are implemented with PyTorch1.4, and conducted on a server with four 2080ti GPUs running Ubuntu16.04.6 LTS.

Datasets and Models: In this section, we execute the DCODP algorithm on representative integrated convolutional neural network detection models YOLOv7 and YOLOv8, as well as the two-stage detection model Faster RCNN, and evaluate the pruning models using the validation sets of MS COCO2007, PASCAL VOC2017, and PASCAL VOC2012. textbfTraining Settings: Local pruning was used as the pruning strategy and three representative pruning ratios of 20%, 50%, and 70% were used. On MS COCO2017, we trained for 150 cycles and fine-tuned for 20 cycles. On PASCAL VOC2017 and PASCAL VOC2012, we train 100 cycles and fine-tune 20 cycles. For the training, validation, and testing phases, the image size was uniformly 640 640. For the validation phase, the confidence threshold was 0.001 and the IoU threshold was 0.6. for the testing phase, the confidence threshold was 0.25 and the IoU threshold was 0.45. using AdamW as the optimizer, the momentum value was 0.937, the weight decay value was 0.0005, and the learning rate was 0.001. In particular, for YOLOv8, the classification loss coefficient is 0.5, the regression loss coefficient box is 7.5, and dfl is 1.5.

Performance Metrics: We present the highest accuracy achieved by the pruned model on the MS COCO2007 and PASCAL VOC2017 datasets. The evaluation metrics used for object detection include $\text {AP}_{50}$, which measures the average accuracy of the model when the Intersection over Union (IoU) is greater than or equal to 0.5 for various object classes. Similarly, $\text {AP}_{75}$ is a more stringent metric that requires the average accuracy of the model when IoU is greater than or equal to 0.75. The $\text {AP}$ metric represents the average accuracy across different IoU thresholds, ranging from 0.5 to 0.95 in increments of 0.05. A higher AP indicates better performance of the model at different IoU thresholds. Additionally, we provide separate average accuracies, namely $\text {AP}_{S}$, $\text {AP}_{M}$, and $\text {AP}_{L}$, for small, medium, and large-sized objects, respectively.

4.2 Pruning Results for PASCAL VOC2007

Based on the results presented in Table 1, we conclude the evaluation conducted on the PASCAL VOC2007 validation set. In comparison to the pre-trained models, the three models exhibit minimal accuracy degradation at pruning ratios of 20% and 50%. Moreover, even at the higher pruning ratio of 70%, the accuracy remains within an acceptable range, thereby demonstrating the effectiveness of the DCODP framework. After pruning, YOLOv7 performs well across most categories, with only a slight drop in recognition accuracy for categories such as tables, chairs, and airplanes. On the other hand, the pruned YOLOv8 struggles in recognizing categories such as birds, tables, chairs, and airplanes. Similarly, Faster RCNN proves to be ineffective in accurately detecting bottles and airplanes. Comparing the three models after pruning, it is evident that YOLOv7 is less sensitive to the reduction in the number of parameters compared to YOLOv8 and Faster RCNN. This observation is supported by Fig. 4, where the detection performance of YOLOv7 after pruning remains consistently good and, in most cases, superior to that of YOLOv8 and Faster RCNN.

Table 1 YOLOv7, YOLOv8, and Faster RCNN based on PASCAL VOC2007 at different pruning ratios $\text {AP}_{50}$

Full size table

4.3 Pruning Results for PASCAL VOC2012

On the higher complexity PASCAL VOC2012 dataset, our pruning algorithm still achieves the desired results, which are shown in Table 2. However, the object detection model itself does not have high detection accuracy for small objects, such as birds, bottles, airplanes, and other categories. After the algorithm of DCODP, at lower pruning ratios, such as 20% and 50%, the detection accuracy for small objects is not weakened to a great extent, while at a high pruning ratio of 70%, a high degree of degradation occurs. Figure 5 demonstrates the observation. This also motivates us to pay more attention to the recognition accuracy of small objects in the pruning algorithm at a later stage. Overall, the DCODP algorithm has been validated on the PASCAL VOC dataset with great success.

Table 2 YOLOv7, YOLOv8, and Faster RCNN based on PASCAL VOC2012 at different pruning ratios $\text {AP}_{50}$

Full size table

4.4 Pruning Results for MS COCO2017

In this section, we present an evaluation of the pruned detection model using the MS COCO2017 dataset. The results, as shown in Table 3, demonstrate that our method achieves superior performance, further confirming the effectiveness of our approach. It is noteworthy that the DCODP algorithm yields greater improvements for medium-sized and large-sized objects. Figure 6 demonstrates the observation. Additionally, the higher the IoU threshold, the more significant the enhancement achieved by our method. This observation highlights the fact that our pruning method enables the model to learn intricate features even when confronted with a large dataset and complex architecture and parameters inherent in the detection model.

Table 3 YOLOv7, YOLOv8, and Faster RCNN based on MS COCO2017 at different pruning ratios $\text {AP}_{50}$

Full size table

4.5 Ablation Experiment

Intralayer coupling and interlayer coupling are two crucial factors in the DCODP algorithm. To demonstrate the validity of Intralayer and interlayer coupling, we incorporated them into different pruning methods, such as Slimming, ThiNet, and HRank. Table 4 shows the result of comparing with DCODP. From the table, we can deduce that Our approach not only reduces the number of parameters and enhances computational speed but also significantly contributes to detection accuracy. When comparing the algorithm solely based on intralayer coupling without considering interlayer coupling with DCODP, it is evident that DCODP achieves the best mAP and speed. This finding validates the effectiveness of the DCODP algorithm.

Table 4 YOLOv7, YOLOv8, and Faster RCNN based on MS COCO2017 at different pruning ratios $\text {AP}_{50}$

Full size table

4.6 Hyperparametric Experiment

The pruning ratio is a crucial hyperparameter in the DCODP algorithm. We investigate the corresponding $\text {mAP}_{50}$ and $\text {mAP}_{50-95}$ for YOLOv7, YOLOv8, and Faster RCNN models at different pruning ratios. Here, $\text {mAP}_{50}$ represents the mean average precision at an IoU threshold of 0.5, while $\text {mAP}_{50-95}$ represents the mean average precision across a range of IoU thresholds from 0.5 to 0.95. As illustrated in Fig. 7, the trends of $\text {mAP}_{50}$ and $\text {mAP}_{50-95}$ remain consistent. Between the low pruning ratios of 10% and 50%, the detection performance of the model improves or remains stable. This is because, after removing redundant parameters, the model focuses more on efficiently expressing the remaining filters, leading to an improved fitting effect. However, between the high pruning ratios of 50% and 70%, the accuracy of the model decreases. This is due to the removal of a significant number of parameters, resulting in a loss of fitting effect while maintaining high compression. Nevertheless, the overall detection performance of the model still falls within an acceptable range.

5 Conclusion

This paper focuses on lightweight object detection models and real-time detection. We propose a dual-coupling object detection pruning algorithm that considers inter/intralayer coupling relations. Our method improves the efficiency of the pruning framework by leveraging interlayer coupling. Extensive experiments validate its effectiveness. In the future, we aim to enhance our method further by optimizing the pruning algorithm and exploring additional techniques. Our goal is to improve the efficiency and accuracy of object detection models, contributing to the advancement of lightweight and real-time detection.

References

Xie Z, Zhu L, Zhao L, Tao B, Liu L, Tao W (2020) Localization-aware channel pruning for object detection. Neurocomputing 403:400–408
Article MATH Google Scholar
Balasubramaniam A, Sunny F, Pasricha S (2023) R-toss: a framework for real-time object detection using semi-structured pruning. In: 2023 60th ACM/IEEE design automation conference (DAC), pp 1–6 . IEEE
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition
Liu Z, Li J, Shen Z, Huang G, Yan S, Zhang C (2017) Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE international conference on computer vision, pp 2736–2744
He Y, Zhang X, Sun J (2017) Channel pruning for accelerating very deep neural networks
Fan Q, Brown L , Smith J (2016) A closer look at faster r-cnn for vehicle detection. In: 2016 IEEE intelligent vehicles symposium (IV), pp 124–129 . IEEE
Fu Z, Chen Y, Yong H, Jiang R, Zhang L, Hua X-S (2019) Foreground gating and background refining network for surveillance object detection. IEEE Trans. Image Process. 28(12):6077–6090
Article MathSciNet Google Scholar
Geiger A, Lenzp UR (2012) Are we ready for autonomous driving. In: Proceedings of 2012 IEEE conference on computer vision and pattern recognition
Dai X (2019) Hybridnet: a fast vehicle detection system for autonomous driving. Signal Process. Image Commun. 70:79–88
Article MATH Google Scholar
Rad M, Lepetit V (2017) Bb8: a scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: Proceedings of the IEEE international conference on computer vision, pp 3828–3836
Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) Ssd-6d: making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE international conference on computer vision, pp 1521–1529
Jaeger PF, Kohl SA, Bickelhaupt S, Isensee F, Kuder TA, Schlemmer H-P, Maier-Hein KH (2020) Retina u-net: embarrassingly simple exploitation of segmentation supervision for medical object detection. In: Machine learning for health workshop, pp 171–183. PMLR
Lee S-g, Bae JS, Kim H, Kim JH, Yoon S (2018) Liver lesion detection from weakly-labeled multi-phase ct volumes with a grouped single shot multibox detector. In: medical image computing and computer assisted intervention–MICCAI 2018: 21st international conference, Granada, Spain, September 16-20, Proceedings, Part II 11, pp 693–701 (2018). Springer
Zou Z, Chen K, Shi Z, Guo Y, Ye J (2023) Object detection in 20 years: a survey
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural inf Process Syst 28:1137
MATH Google Scholar
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision, pp 21–37. Springer
Carion N, Massa F (2020) End-to-end object detection with transformers. In: Proceedings of the 16th European conference on computer vision, pp 213–229
Zhu X, Li J, Liu Y, Ma C, Wang W (2023) A survey on model compression for large language models
Lee N, Ajanthan T, Torr PHS (2018) Snip: single-shot network pruning based on connection sensitivity
Wang C, Zhang G, Grosse R (2020) Picking winning tickets before training by preserving gradient flow
Huang Z, Wang N (2017) Data-driven sparse structure selection for deep neural networks
Ma X, Guo F-M, Niu W, Lin X, Tang J, Ma K, Ren B, Wang Y (2020) PCONV: the missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices
Gao X, Zhao Y, Dudziak L, Mullins R, Xu CZ (2018) Dynamic channel pruning: feature boosting and suppression
Lin J, Rao Y, Lu J, Zhou J (2017) Runtime neural pruning. In: Neural information processing systems
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets
Qi C. Zhao, Chen L (2023) Object detection compression model of remote sensing image based on yolov3-spp. J Signal Process 39(9):2193
MathSciNet MATH Google Scholar
Wang G (2023) Research on compression method of object detection model for smart water
Xie Z, Zhu L, Zhao L, Tao B, Liu L, Tao W (2020) Localization-aware channel pruning for object detection. Neurocomputing 403:400–408
Article MATH Google Scholar
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: Computer vision – ECCV 2016, pp 21–37
Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: 2017 IEEE international conference on computer vision (ICCV), pp 2999–3007
Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Lin M, Ji R, Wang Y, Zhang Y, Zhang B, Tian Y, Shao L (2020) HRank: filter pruning using high-rank feature map
Sui Y, Yin M, Xie Y, Phan H, Zonouz S, Yuan B (2022) CHIP: channel independence-based pruning for compact neural networks

Download references

Acknowledgements

This work was supported by the Key Program of Zhejiang Provincial Natural Science Foundation of China (LZ22F020007), the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China (LZJWZ23E090001).

Author information

Authors and Affiliations

School of Computer Science and Technology, Zhejiang University of Water Resources and Electric Power, Hangzhou, 310018, Zhejiang, China
Guan Xiaohui & Sun Xinxin
School of Science, Zhejiang University of Science and Technology, Hangzhou, 310023, Zhejiang, China
Huang Wenzhuo & Qian Yaguan

Authors

Guan Xiaohui
View author publications
Search author on:PubMed Google Scholar
Huang Wenzhuo
View author publications
Search author on:PubMed Google Scholar
Qian Yaguan
View author publications
Search author on:PubMed Google Scholar
Sun Xinxin
View author publications
Search author on:PubMed Google Scholar

Corresponding author

Correspondence to Qian Yaguan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Xiaohui, G., Wenzhuo, H., Yaguan, Q. et al. Exploring Dual Coupledness for Effective Pruning in Object Detection. Neural Process Lett 57, 21 (2025). https://doi.org/10.1007/s11063-024-11697-8

Download citation

Accepted: 09 September 2024
Published: 10 February 2025
DOI: https://doi.org/10.1007/s11063-024-11697-8

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Exploring Dual Coupledness for Effective Pruning in Object Detection

Abstract

Similar content being viewed by others

Pruning DETR: efficient end-to-end object detection with sparse structured pruning

Depthwise grouped convolution for object detection

Object detection network pruning with multi-task information fusion

Explore related subjects

1 Introduction

2 Related Work

2.1 Object Detection

2.2 Pruning Techniques for Object Detection

3 Methods

3.1 Inter-layer Coupling

3.2 Intra-layer Coupling

3.3 Filter Pruning for Object Detection Models

3.4 Coupling of Feature Maps

4 Experiment

4.1 Experimental Settings

4.2 Pruning Results for PASCAL VOC2007

4.3 Pruning Results for PASCAL VOC2012

4.4 Pruning Results for MS COCO2017

4.5 Ablation Experiment

4.6 Hyperparametric Experiment

5 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords