Abstract
Deploying accurate face recognition systems on resource-constrained devices remains challenging due to the high computational demands of many state-of-the-art models. This paper proposes GhostFaceNet++, an improved lightweight framework that enhances the GhostFaceNet architecture to achieve both efficiency and accuracy. The framework incorporates a Cross Stage Partial (CSP) structure into the GhostNet bottleneck to enrich feature representation and reduce computational redundancy, while an Efficient Channel Attention (ECA) mechanism is integrated into the classification head to enhance discriminative feature learning. Extensive experiments on standard benchmarks (LFW, CFP-FP, AgeDB-30, CP-LFW) demonstrate that GhostFaceNet++ consistently outperforms the original GhostFaceNet and achieves competitive performance with state-of-the-art lightweight models. For example, the ECA-CSP GhostFaceNet V1-2 variant improves AgeDB-30 accuracy from 89.80% to 90.67% when trained on CASIA-WebFace, while reducing the computational cost from 60.3 to 51.6 MFLOPs and the model size from 8.13 to 7.51 MB. GhostFaceNet++ operates in the highly efficient 51–62 MFLOPs range, below the original GhostFaceNet's 60–275 MFLOPs and far below MobileFaceNet's 439.7 MFLOPs. These results confirm that the integration of CSP and ECA enables GhostFaceNet++ to strike a favorable balance between compactness and performance, advancing the design of lightweight face recognition systems for real-world deployment on resource-constrained devices.
1 Introduction
Face recognition has become a core technology in numerous real-world applications, including biometric authentication on mobile devices, surveillance in embedded systems, human–computer interaction, and social media. Recent advances in deep learning, particularly with deep convolutional networks and transformer-based models, have significantly improved face recognition accuracy. However, these models are typically computationally intensive and memory-heavy, limiting their deployment on resource-constrained platforms such as smartphones, IoT devices, and edge surveillance systems. This drive towards creating efficient, lightweight deep learning models is a critical research trend not only in face recognition but across a diverse range of computer vision applications, from industrial instrumentation and cultural heritage preservation to other biometric modalities like finger vein recognition [21,22,23].
Face recognition systems face several challenges, including variations in pose, illumination, expression, occlusion, and aging, which can significantly degrade performance, particularly in unconstrained environments. Consequently, there is a growing need for lightweight models that balance accuracy and efficiency to enable real-time face recognition on edge devices. While several lightweight architectures have been proposed to reduce computational costs, they still exhibit notable limitations. Specifically, Light CNN has a relatively large number of parameters, limiting efficiency on resource-constrained devices; MobileFaceNet suffers from high FLOPs, increasing computational demands; and the original GhostFaceNet provides insufficient feature representation for challenging scenarios. Addressing these limitations requires careful architectural design, optimized activations, and mechanisms that enhance feature representation while maintaining low computational cost.
In this paper, we propose a lightweight and efficient face recognition model that improves upon the GhostFaceNet architecture. Specifically, we redesign the GhostNet bottleneck using the Cross Stage Partial (CSP) structure to enhance gradient flow and reduce redundant computation. In addition, we integrate the Efficient Channel Attention (ECA) mechanism into the network’s head to selectively emphasize informative features. These enhancements enable our model to achieve competitive accuracy with fewer parameters and a smaller memory footprint, making it suitable for real-time applications on resource-constrained devices.
In summary, the main contributions of this work are as follows:
1. A novel lightweight face recognition architecture is proposed, in which the GhostNet bottleneck module is redesigned using a Cross Stage Partial (CSP) structure to enhance feature diversity and representation efficiency.
2. The Efficient Channel Attention (ECA) mechanism is integrated into the head of the network to adaptively emphasize informative channels and improve the extraction of discriminative facial features.
3. An improved trade-off between recognition accuracy and computational efficiency is achieved, with the proposed model outperforming the original GhostFaceNet in terms of accuracy, FLOPs, and model size.
4. A rigorous evaluation is conducted on standard face recognition benchmarks, showcasing the proposed model's potential to advance the state-of-the-art in efficient face recognition.
5. Real-time inference capability is demonstrated by benchmarking the proposed models on an ARM-based Android device, confirming their suitability for deployment in resource-constrained environments.
The remaining sections of the paper are organized as follows: In Sect. 2, we review previous lightweight models and explore the integration of attention mechanisms within face recognition models. Section 3 details our proposed method, outlining each module comprehensively. Following this, Sect. 4 presents and interprets the results of our experiments. Finally, Sect. 5 provides a summary of the key conclusions derived from our study.
2 Related works
2.1 Evolution of lightweight face recognition techniques
The development of lightweight face recognition models has been driven by the need to deploy deep learning on resource-constrained devices. Early efforts focused on creating efficient CNN architectures that could reduce computational complexity while preserving accuracy.
An initial approach by Wu et al. [29] introduced the Light CNN family, which utilized the Max-Feature-Map (MFM) activation function to simultaneously perform non-linear transformation and feature selection. While effective, the heavier variants remained demanding, with the largest model, Light CNN-29, containing approximately 12.6M parameters and requiring 3.9 GFLOPs. Concurrently, architectures explicitly tailored for mobile use emerged. MobileFaceNet [5], based on the MobileNetV2 backbone [19], employed inverted residual bottlenecks to achieve a lean profile of approximately 1M parameters and 439.7 MFLOPs. Inspired by similar principles, ShuffleFaceNet [16] adopted the group convolutions and channel shuffling from ShuffleNetV2 [15], offering scalable variants with computational complexity ranging from 66.9 MFLOPs to 1.05 GFLOPs.
To further push the boundaries of efficiency, subsequent research explored architectural modifications and novel building blocks. Xu et al. [31] proposed Fast-FaceNet, which replaced the complex GoogLeNet backbone of FaceNet with the more efficient MobileNet, resulting in variants with computational costs between 392M and 684M FLOPs. More recently, Hoo et al. [8] presented ConvFaceNeXt, which introduced an Enhanced ConvNeXt (ECN) block to significantly reduce FLOPs by repositioning the depthwise convolution. Its variants operate in the 390–410 MFLOPs range.
Despite these architectural advances, a fundamental challenge remained: lightweight models often struggled to capture sufficiently discriminative features for challenging real-world conditions. This limitation motivated the integration of attention mechanisms, which enhance the representational capacity of models by guiding them to focus on the most salient facial regions without adding significant computational overhead.
The integration of attention marked the next phase in the evolution of lightweight models. Li et al. [12] introduced AirFace, which improved upon MobileFaceNet by embedding the Convolutional Block Attention Module (CBAM) into each bottleneck, with an estimated complexity of around 1000 MFLOPs. Similarly, Yan et al. [32] presented VarGFaceNet, which combined Squeeze-and-Excitation (SE) blocks with variable group convolutions and knowledge distillation, resulting in a model with approximately 5M parameters and 1,022 MFLOPs.
A highly efficient architecture in this domain is GhostFaceNet [1], introduced in 2023 and derived from the GhostNet family [7, 24]. It reduces feature map redundancy via inexpensive linear operations and embeds SE blocks within its bottlenecks, achieving a state-of-the-art efficiency range of 60–275 MFLOPs. More recently, Khalifa et al. [11] proposed RobFaceNet, which strategically integrates both Channel Attention (CA) and SE modules at different depths within a MobileFaceNet-inspired framework, achieving a compact 1.90M parameters and 337.3 MFLOPs.
While these methods have advanced the field, persistent gaps remain. Many models are still too computationally expensive for practical deployment; for example, Light CNN-29 (12.6M parameters, 3.9 GFLOPs) is too large for embedded devices, while MobileFaceNet (439.7 MFLOPs) carries a significant computational load. Even the highly efficient GhostFaceNet (60–275 MFLOPs) may suffer from architectural limitations, as its bottleneck design can produce redundant features and constrain discriminative capacity.
2.2 Attention mechanisms in lightweight models
The evolution of attention in lightweight models began with the widespread adoption of the SE block [9], which adaptively recalibrates channel-wise feature responses. This approach is central to many effective models for both face recognition (e.g., VarGFaceNet [32], GhostFaceNet [1]) and the related task of facial expression recognition (e.g., the recent RS-Xception [14]). These works collectively demonstrate that even simple CA can yield significant performance gains. Building upon SE, the CBAM [28] introduced an additional spatial dimension of attention, enabling networks such as AirFace [12] to capture both channel and spatial dependencies. More recently, this hybrid design has been extended further in works like PH-CBAM [13], which employs parallel feature-extraction branches alongside CBAM to enrich facial representations.
However, these advanced attention mechanisms often come at the cost of increased architectural complexity and computational overhead. To address these challenges, we adopt the ECA module [26]. Unlike SE, ECA avoids dimensionality reduction and instead captures local cross-channel interactions using a lightweight 1D convolution, introducing only a handful of additional parameters. This design offers a compelling balance between accuracy and computational efficiency, aligning directly with the objectives of our GhostFaceNet++ framework. By enhancing channel-wise feature discrimination with minimal cost, ECA represents an ideal choice for achieving state-of-the-art performance in real-time, embedded face recognition.
The preceding analysis of the state-of-the-art highlights several key challenges: reducing computational overhead, eliminating architectural redundancies, and enhancing feature discrimination without compromising efficiency. Our work directly addresses these issues through two targeted innovations. First, to mitigate redundant computation and the representational bottleneck identified in GhostFaceNet, we introduce a CSP structure into the bottleneck, which enriches feature flow and minimizes computational waste. Second, to improve channel-wise feature discrimination with ultra-low overhead, we incorporate the ECA mechanism into the classification head. By systematically addressing these gaps in the literature, the proposed GhostFaceNet++ framework achieves a superior balance between accuracy and efficiency, enabling practical deployment on resource-constrained devices.
3 Proposed approach
This section details our enhanced GhostFaceNet architectures, which improve face recognition performance through two key modifications: a redesigned Ghost bottleneck module employing the Cross Stage Partial (CSP) strategy, and a classification head enhanced with ECA. We first review the original GhostFaceNet architectures, and then introduce our contributions that address their limitations in computational efficiency and discriminative feature learning.
3.1 GhostFaceNet architectures
GhostFaceNet architectures are built upon the GhostNetV1 and GhostNetV2 architectures, which utilize Ghost modules to efficiently generate feature maps. GhostNetV1 employs Ghost modules, shown in Fig. 1c, to reduce feature map redundancy by generating a portion of feature maps with low-cost linear operations, specifically depthwise convolution (DWConv). This method reduces both the parameters and the computational complexity compared to the classic convolutional layers. GhostNetV2 expands on GhostNetV1 by incorporating an attention mechanism, the DFC attention illustrated in Fig. 1d, to capture long-range dependencies, further enhancing its representational capacity.
Moreover, in GhostNetV1, the bottleneck is composed of Ghost modules and SE blocks, followed by a depthwise convolution. In contrast, the GhostNetV2 bottleneck multiplies the output of the Ghost modules by a DFC attention map through element-wise multiplication, followed by a depthwise convolution and a depthwise shortcut connection to enhance feature representation, as shown in Fig. 2a and c for V1 and V2, respectively. The original GhostFaceNet further modifies the GhostNet architectures by replacing ReLU with PReLU as the activation function and by replacing the conventional FC layers in the SE modules with convolution layers, as depicted in Fig. 1b.
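To make the Ghost module concrete, the following is a minimal Keras sketch of the idea described above. The primary-to-ghost ratio (here 2), the PReLU activations, and the kernel sizes are illustrative assumptions rather than the exact GhostFaceNet configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def ghost_module(x, out_channels):
    """Ghost module sketch (ratio = 2): half the output maps come from an
    ordinary pointwise convolution, the other half from a cheap depthwise
    convolution applied to those primary maps."""
    primary_channels = out_channels // 2
    # Primary path: costly convolution on a reduced number of channels
    primary = layers.Conv2D(primary_channels, 1, padding="same", use_bias=False)(x)
    primary = layers.BatchNormalization()(primary)
    primary = layers.PReLU(shared_axes=[1, 2])(primary)
    # Cheap path: a depthwise convolution synthesizes the "ghost" maps
    ghost = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(primary)
    ghost = layers.BatchNormalization()(ghost)
    ghost = layers.PReLU(shared_axes=[1, 2])(ghost)
    return layers.Concatenate()([primary, ghost])
```

Because the depthwise convolution processes each channel independently, the ghost maps cost only a small fraction of what a full convolution producing the same number of channels would.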
3.2 Cross Stage Partial strategy
The Cross Stage Partial (CSP) approach, originally proposed in Ref. [25] for backbone networks, improves learning efficiency by addressing vanishing gradients and redundant computations through feature map partitioning and cross stage merging. This strategy diversifies gradient flow while reducing memory consumption and computational cost. Unlike previous implementations in ResNet or DenseNet architectures, we uniquely adapt CSP principles to Ghost bottlenecks, a novel contribution that enhances feature propagation while maintaining computational efficiency. This targeted adaptation improves gradient propagation through Ghost bottlenecks while retaining the model’s overall lightweight characteristics, making it particularly suitable for resource-constrained deployment.
3.3 Architecture improvements: CSP-Ghost bottleneck and ECA-enhanced head design
To improve the efficiency and representation power of the Ghost bottleneck, we first incorporate the CSP technique. Specifically, the input feature maps are divided into two parts: one part is processed through a series of Ghost modules, i.e., the Ghost path, while the other part passes through a lightweight transition path consisting of \(1 \times 1\) convolutions. This split-and-merge strategy enhances the diversity of feature representations and reduces redundant gradient flow, in line with the objectives of CSPNet. After feature fusion via concatenation, a final \(1 \times 1\) convolution is employed to unify the output channels. In addition, a depthwise separable shortcut connection is used to preserve identity mapping when the input and output dimensions differ or when spatial downsampling is required.

Complementing the bottleneck enhancements, we strengthen the overall model by integrating the ECA mechanism into the head of the network. Placing ECA at the final stage enables adaptive channel-wise feature calibration with negligible computational overhead. This ultra-low cost stems from ECA's fundamental design: unlike the widely used SE block, which employs two fully connected layers whose parameter count scales with the number of channels, ECA captures local cross-channel interactions via a lightweight 1D convolution with a small, fixed kernel size (e.g., \(k=3\)). Its cost is therefore constant and independent of the channel dimension, enabling effective channel attention with only a handful of additional parameters.
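As a concrete illustration, a minimal Keras sketch of an ECA block follows. The adaptive kernel-size heuristic is taken from ECA-Net [26]; the function name and the choice between the adaptive rule and a fixed \(k=3\) are illustrative assumptions:

```python
import math
from tensorflow.keras import layers

def eca_block(x, gamma=2, b=1):
    """ECA sketch: global average pooling -> 1D conv across channels ->
    sigmoid gate, with no channel dimensionality reduction."""
    channels = x.shape[-1]
    # Adaptive odd kernel size from the channel count (ECA-Net heuristic);
    # a small fixed kernel such as k = 3 works equally well in practice.
    t = int(abs((math.log2(channels) + b) / gamma))
    k = t if t % 2 else t + 1
    # Squeeze: per-channel descriptor via global average pooling
    y = layers.GlobalAveragePooling2D()(x)              # (B, C)
    y = layers.Reshape((channels, 1))(y)                # (B, C, 1) for Conv1D
    # Local cross-channel interaction via a single bias-free 1D convolution
    y = layers.Conv1D(1, kernel_size=k, padding="same", use_bias=False)(y)
    y = layers.Activation("sigmoid")(y)
    y = layers.Reshape((1, 1, channels))(y)
    return layers.Multiply()([x, y])                    # channel recalibration
```

Note that the only learnable weights are the \(k\) coefficients of the 1D convolution, which is why the block's parameter cost does not grow with the channel dimension.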
The overall architecture of the proposed CSP-Ghost bottleneck for both V1 and V2 improvements is illustrated in Fig. 2b and d, respectively, while the ECA-integrated head design is shown in Fig. 3. This design achieves a strong balance between computational efficiency and representational power, making it well-suited for lightweight face recognition tasks.
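To make the split-and-merge flow concrete, the CSP-Ghost bottleneck can be sketched as follows, reusing the ghost_module from Sect. 3.1. The split ratio, the number of Ghost modules in the Ghost path, and the exact placement of the stride are illustrative assumptions; Fig. 2b and d remain the authoritative reference:

```python
from tensorflow.keras import layers

def csp_ghost_bottleneck(x, out_channels, csp_ratio=0.5, stride=1):
    """CSP-Ghost bottleneck sketch: split channels, route one part through
    Ghost modules and the other through a cheap 1x1 transition, then merge."""
    in_channels = x.shape[-1]
    split = int(in_channels * csp_ratio)
    ghost_in, trans_in = x[..., :split], x[..., split:]
    # Ghost path: stacked Ghost modules (with optional downsampling)
    g = ghost_module(ghost_in, out_channels // 2)
    if stride > 1:
        g = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(g)
        g = layers.BatchNormalization()(g)
    g = ghost_module(g, out_channels // 2)
    # Transition path: lightweight 1x1 convolution
    t = layers.Conv2D(out_channels // 2, 1, strides=stride, use_bias=False)(trans_in)
    t = layers.BatchNormalization()(t)
    # Merge and unify the output channels with a final 1x1 convolution
    y = layers.Concatenate()([g, t])
    y = layers.Conv2D(out_channels, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Depthwise-separable shortcut when spatial or channel shapes differ
    if stride > 1 or in_channels != out_channels:
        s = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
        s = layers.BatchNormalization()(s)
        s = layers.Conv2D(out_channels, 1, use_bias=False)(s)
        s = layers.BatchNormalization()(s)
    else:
        s = x
    return layers.Add()([y, s])
```

The csp_ratio argument corresponds to the split ratios (0.25, 0.5, 0.75) evaluated in Sect. 4.3.1.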
4 Experiments and results
4.1 Datasets
In this study, the proposed lightweight face recognition model is trained using the CASIA-WebFace and UMDFaces datasets, and its performance is evaluated on a wide range of challenging benchmarks. The chosen datasets reflect diverse, unconstrained conditions to ensure a rigorous assessment of generalization across pose, age, illumination, and occlusion variations. Specifically, CASIA-WebFace consists of Internet-collected images gathered under highly unconstrained conditions, including both indoor and outdoor scenes. It exhibits significant variability in pose, facial expression, and illumination, making it a standard large-scale dataset for training deep face recognition models. Similarly, UMDFaces contains web-scraped face images captured in diverse real-world environments, with extensive pose variation, natural lighting changes, and frequent occlusions, further contributing to robust model training.
For evaluation, we selected complementary benchmarks targeting different challenges. LFW provides general “in-the-wild” verification with diverse but relatively controlled difficulty. To assess robustness to aging and pose, we use AgeDB-30, CA-LFW, CP-LFW, and CFP-FP, which emphasize cross-age and cross-pose verification scenarios. Finally, IJB-B and IJB-C represent the most challenging real-world benchmarks, consisting of photographs and video frames collected under mixed indoor and outdoor conditions, with large variations in pose, illumination (e.g., bright sunlight, low-light, shadowed), and occlusion. These datasets closely simulate realistic deployment environments for face recognition systems.
Table 1 provides an overview of all training and evaluation datasets, including their scale and the primary challenges they represent.
4.2 Implementation details
All experiments were conducted on a workstation running Windows 10, equipped with an Intel Core i5-10400F CPU and an NVIDIA GeForce RTX 3060 GPU. The models were implemented using Python 3.10 with the Keras framework. Training was performed for 50 epochs.
To enhance discriminative feature learning, the models were trained using the ArcFace loss function [6]. Optimization was carried out using the Stochastic Gradient Descent (SGD) optimizer with an initial learning rate of 0.1. To prevent overfitting, we employed random horizontal flipping for data augmentation and applied \(L_2\) regularization, which adds a penalty term proportional to the squared magnitude of the weights:

$$L = L_{\text{ArcFace}} + \lambda \sum_i \Vert w_i \Vert_2^2,$$

where \(L_{\text{ArcFace}}\) is the ArcFace loss, \(w_i\) denotes the model parameters, and \(\lambda\) is the regularization coefficient. The batch size was set to 256, and the detailed network hyperparameters are summarized in Table 2.
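For reference, a hedged Keras sketch of this objective is shown below. The scale s and margin m are common ArcFace defaults from [6] and may differ from the exact values used in our experiments:

```python
import tensorflow as tf

def arcface_loss(num_classes, s=64.0, m=0.5):
    """ArcFace loss sketch; s and m are typical defaults from [6]."""
    def loss_fn(labels, cos_theta):
        # cos_theta: cosine similarity between the L2-normalized embedding
        # and each L2-normalized class weight, shape (batch, num_classes)
        theta = tf.acos(tf.clip_by_value(cos_theta, -1.0 + 1e-7, 1.0 - 1e-7))
        margin = m * tf.one_hot(tf.cast(labels, tf.int32), num_classes)
        logits = s * tf.cos(theta + margin)  # additive angular margin
        return tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True)
    return loss_fn

# The L2 penalty enters through per-layer kernel regularizers, e.g.
#   layers.Conv2D(..., kernel_regularizer=tf.keras.regularizers.l2(lam)),
# so Keras adds lam * sum_i ||w_i||^2 to the ArcFace loss automatically.
```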
4.3 Results
4.3.1 Performance on CASIA-WebFace and UMDFaces datasets
Table 3 presents the evaluation results of the proposed models trained on the CASIA-WebFace dataset. For a fair comparison, our baseline consists of GhostFaceNet (V1-2 and V2-2 variants), which were trained under identical conditions as the proposed models.
We first introduced the CSP strategy into the GhostFaceNet bottleneck. Three CSP ratios (0.25, 0.5, and 0.75) were evaluated to study the trade-off between efficiency and accuracy. The model with a CSP ratio of 0.5 achieved the best balance, offering a noticeable reduction in computational cost while slightly improving performance on most benchmarks. For instance, compared to the baseline V1-2, the CSP 0.5 model reduced FLOPs by 15% and model size by 7.6%, while improving LFW accuracy from 98.75 to 98.87% and CFP-FP from 88.41 to 88.56%.
Next, we enhanced the CSP-based architecture by incorporating ECA in the classification head. The resulting ECA-CSP GhostFaceNetV1-2 maintained the same parameter count and model size as the CSP-only version, due to the small size of the attention module, while further improving generalization on pose and age benchmarks. Notably, the ECA-CSP model reached 90.67% on AgeDB-30 and 88.87% on CFP-FP.
In the second part of our experiments, we evaluated larger models using the GhostFaceNet V2-2 configuration. The baseline GhostFaceNetV2-2 achieved strong accuracy but incurred significantly higher computational cost (13.68 MB, 76.5 MFLOPs). Introducing CSP reduced the FLOPs by 19% and size by 15%, with negligible or positive impact on performance. Again, ECA further improved results, especially on challenging benchmarks such as CFP-FP (from 89.29 to 90.10%).
Overall, these experiments confirm that our proposed ECA-CSP GhostFaceNet models achieve better or comparable accuracy with significantly reduced complexity, making them more suitable for real-time or resource-constrained deployment.
To further validate the generalization of our proposed models, we conducted additional experiments on the UMDFaces dataset. Table 4 summarizes the performance. The CSP GhostFaceNetV1-2 model slightly improved on the baseline for LFW (98.82% vs. 98.77%) and CFP-FP (86.93% vs. 86.59%), while maintaining comparable performance across the remaining benchmarks. Introducing ECA attention led to further gains, notably on AgeDB-30 (90.98%) and CFP-FP (87.08%). For the V2-2 configuration, both the CSP and ECA-CSP models consistently outperformed the baseline. The ECA-CSP GhostFaceNetV2-2 achieved the highest accuracy on most benchmarks, including LFW (99.08%), CA-LFW (92.77%), CFP-FP (87.60%), and CFP-FF (98.71%). These improvements highlight the effectiveness of combining CSP for structural efficiency with ECA for channel-wise attention in boosting model generalization, even when trained on a different dataset.
We evaluate the proposed models on the IJB-B and IJB-C benchmarks using ROC curves, as illustrated in Fig. 4. Across both datasets, the ECA-CSP-GhostNetV2 consistently achieves the highest AUC scores, indicating superior discriminative capability. Architectures that integrate the ECA mechanism with the CSP strategy significantly outperform the GhostNetV2 baseline, demonstrating the benefits of this hybrid design. Moreover, models trained on the more diverse UMDFaces dataset exhibit improved performance, particularly at low false positive rates, highlighting that dataset's contribution to better generalization under unconstrained conditions. These findings validate the robustness and effectiveness of the proposed architectural enhancements in advancing face recognition accuracy.
4.3.2 Qualitative analysis of bottleneck features via Grad-CAM
To qualitatively validate the performance improvements introduced by our CSP design and to illustrate its impact on feature diversity, we employed Grad-CAM visualizations to examine the model’s attention. As shown in Fig. 5, we compare the activation heatmaps of the CSP-enhanced bottlenecks with those of the original GhostFaceNet baselines. The results clearly demonstrate that our CSP models focus more sharply on discriminative facial regions, such as the eyes, nose, and mouth, while the original bottlenecks often produce more diffuse activations extending into non-informative background areas. These visualizations indicate that the CSP architecture promotes richer and more diverse feature representations, enabling the model to selectively leverage salient information, which complements the quantitative accuracy improvements reported in our experiments.
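The visualization procedure is standard Grad-CAM; a minimal sketch is given below, where model and the chosen layer_name (e.g., the output of the last CSP-Ghost bottleneck) are assumptions for illustration:

```python
import tensorflow as tf

def grad_cam(model, image, layer_name, class_index):
    """Grad-CAM sketch: class-gradient-weighted sum of feature maps."""
    grad_model = tf.keras.Model(
        model.input, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        fmaps, preds = grad_model(image[None, ...])  # add batch dimension
        score = preds[:, class_index]
    grads = tape.gradient(score, fmaps)              # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))     # global-average-pooled grads
    cam = tf.einsum("bijc,bc->bij", fmaps, weights)  # weighted feature-map sum
    cam = tf.nn.relu(cam)[0]                         # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

The resulting normalized map is upsampled to the input resolution and overlaid on the face image to produce heatmaps such as those in Fig. 5.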
4.3.3 Comparison with state-of-the-art methods
Table 5 presents a comparative evaluation of our proposed models, CSP GhostFaceNet and ECA-CSP GhostFaceNet, against state-of-the-art lightweight face recognition architectures across multiple benchmark datasets. Despite being trained on relatively small-scale datasets (CASIA-WebFace and UMDFaces, containing 0.49M and 0.37M images, respectively), our models achieve competitive recognition performance while offering notable improvements in computational efficiency.
Notably, our models demonstrate a compelling balance between computational efficiency and recognition accuracy. When trained on the CASIA-WebFace and UMDFaces datasets, both CSP GhostFaceNet V2-2 and ECA-CSP GhostFaceNet V2-2 achieve LFW accuracies exceeding 99%, while maintaining significantly lower computational cost (\(\approx\) 62 MFLOPs) and a compact model size (11.66 MB). In addition, the V1-2 variants of CSP GhostFaceNet and ECA-CSP GhostFaceNet further underscore the efficiency of our design. With as little as 51.64 MFLOPs and a model size of just 7.51 MB, these models achieve LFW accuracies of 98.87% and 98.82% on CASIA-WebFace and UMDFaces, respectively. These results surpass several existing models, such as Fast-FaceNet and MobileFaceNets, which either exhibit lower accuracy or demand higher computational resources. The consistent performance of our architectures across datasets highlights their robustness and suitability for deployment in resource-constrained environments.
4.3.4 End-to-end performance on a real-world edge device
To assess the practical deployment viability of our proposed models, we conducted a comprehensive end-to-end performance analysis on a real-world edge device. This evaluation considers the entire pipeline, from initial image preprocessing to final model inference, providing a realistic measure of throughput for on-device applications. The pipeline consists of two main stages. First, a preprocessing stage performs face detection and alignment; for this, we employed the highly optimized MediaPipe BlazeFace detector, a standard choice for efficient on-device applications. The preprocessing latency was benchmarked using a high-resolution HD (1280×720) input image to simulate a realistic camera feed. Second, the inference stage processes the aligned 112×112 face patch. The inference latency of our models was benchmarked directly on an ARM-based Android device using the official TensorFlow Lite benchmarking tool. All models were converted to float16 format and tested with a single CPU thread over 100 timed runs after 10 warm-up iterations.

The complete performance results, including a breakdown of the latency for each stage, are summarized in Table 6. The MediaPipe preprocessing step adds a consistent overhead of approximately 3.54 ms. When combined with the model's inference time, the total end-to-end latency of our ECA-CSP GhostFaceNetV1-2 model is 21.16 ms, corresponding to a real-time throughput of over 47 FPS; the ECA-CSP GhostFaceNetV2-2 variant achieves 36 FPS. This analysis demonstrates that, even when accounting for the preprocessing overhead of a real-world pipeline, our architectural enhancements provide a significant advantage: the final models achieve higher accuracy while maintaining lower total latency and smaller model size than their respective baselines. These results confirm that the proposed GhostFaceNet++ framework is exceptionally well-suited for deployment in practical, real-time face recognition systems on resource-constrained edge devices.
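For reproducibility, the float16 export follows the standard TensorFlow Lite conversion workflow; a sketch is shown below, with the model file names as placeholders:

```python
import tensorflow as tf

# Load a trained Keras model and export a float16-quantized TFLite model
# (file names here are placeholders).
model = tf.keras.models.load_model("eca_csp_ghostfacenet_v1_2.keras")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable quantization
converter.target_spec.supported_types = [tf.float16]   # float16 weights
tflite_model = converter.convert()
with open("eca_csp_ghostfacenet_v1_2_fp16.tflite", "wb") as f:
    f.write(tflite_model)

# On-device timing then uses the official TFLite benchmark tool, e.g.:
#   adb shell /data/local/tmp/benchmark_model \
#       --graph=/data/local/tmp/model.tflite --num_threads=1 \
#       --warmup_runs=10 --num_runs=100
```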
5 Conclusion
This paper introduced an efficient and compact face recognition architecture based on enhancements to the GhostFaceNet framework. By integrating CSP connections within the GhostNet bottlenecks and incorporating ECA into the classification head, the proposed models achieved a favorable balance between accuracy and computational efficiency. Experiments on CASIA-WebFace and UMDFaces datasets confirmed that the CSP and ECA-CSP variants outperformed the original GhostFaceNet baselines in terms of FLOPs, parameter count, and model size, while also improving recognition accuracy across several benchmarks. For example, the ECA-CSP GhostFaceNet V2-2 variant, when trained on the CASIA-WebFace dataset, reduces the FLOPs from 76.51M to 62M and the model size from 13.68 MB to 11.66 MB, while simultaneously improving CFP-FP accuracy from 89.29% to 90.10%. To further assess deployment viability, inference latency was evaluated on a real ARM-based Android device. Results demonstrated that the proposed models are well-suited for real-time face recognition in resource-constrained environments.
Future work will aim to scale training to larger and more diverse datasets, which was limited in this study due to computational constraints. In addition, we plan to enhance spatial feature representation by integrating more advanced attention mechanisms and lightweight transformer modules, which may further improve robustness under challenging conditions such as pose variation, occlusion, and low resolution.
Data availability
No datasets were generated or analyzed during the current study. The source code is available at: https://github.com/randanachet/ECA_CSP_GhostFaceNet
References
Alansari, M., Hay, O.A., Javed, S., Shoufan, A., Zweiri, Y., Werghi, N.: GhostFaceNets: lightweight face recognition model from cheap operations. IEEE Access 11, 35429–35446 (2023)
Bansal, A., Nanduri, A., Castillo, C.D., Ranjan, R., Chellappa, R.: UMDFaces: an annotated face dataset for training deep networks. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 464–473. IEEE (2017)
Boutros, F., Damer, N., Fang, M., Kirchbuchner, F., Kuijper, A.: MixFaceNets: extremely efficient face recognition networks. In: 2021 IEEE International Joint Conference on Biometrics (IJCB), pp. 1–8. IEEE (2021)
Boutros, F., Siebke, P., Klemt, M., Damer, N., Kirchbuchner, F., Kuijper, A.: PocketNet: extreme lightweight face recognition network using neural architecture search and multistep knowledge distillation. IEEE Access 10, 46823–46833 (2022)
Chen, S., Liu, Y., Gao, X., Han, Z.: MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices. In: Chinese conference on biometric recognition, pp. 428–438. Springer (2018)
Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019)
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: GhostNet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020)
Hoo, S.C., Ibrahim, H., Suandi, S.A.: ConvFaceNeXt: lightweight networks for face recognition. Mathematics 10(19), 3592 (2022)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141 (2018)
Huang, G.B., Mattar, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Workshop on faces in ’Real-Life’ Images: Detection, Alignment, and Recognition (2008)
Khalifa, A., Abdelrahman, A.A., Hempel, T., Al-Hamadi, A.: Towards efficient and robust face recognition through attention-integrated multi-level CNN. Multimed. Tools Appl. 84, 12715–12737 (2024)
Li, X., Wang, F., Hu, Q., Leng, C.: AirFace: lightweight and efficient model for face recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
Liao, L., Wu, S., Song, C., Fu, J.: PH-CBAM: a parallel hybrid CBAM network with multi-feature extraction for facial expression recognition. Electronics 13(16), 3149 (2024)
Liao, L., Wu, S., Song, C., Fu, J.: RS-Xception: a lightweight network for facial expression recognition. Electronics 13(16), 3217 (2024)
Ma, N., Zhang, X., Zheng, H.T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
Martínez-Díaz, Y., Luevano, L.S., Méndez-Vázquez, H., Nicolás-Díaz, M., Chang, L., González-Mendoza, M.: ShuffleFaceNet: a lightweight face architecture for efficient and highly-accurate face recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
Maze, B., Adams, J., Duncan, J.A., Kalka, N., Miller, T., Otto, C., Jain, A.K., Niggel, W.T., Anderson, J., Cheney, J., et al.: IARPA Janus Benchmark-C: face dataset and protocol. In: 2018 International Conference on Biometrics (ICB), pp. 158–165. IEEE (2018)
Moschoglou, S., Papaioannou, A., Sagonas, C., Deng, J., Kotsia, I., Zafeiriou, S.: AgeDB: the first manually collected, in-the-wild age database. In: proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 51–59 (2017)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Sengupta, S., Chen, J.C., Castillo, C., Patel, V.M., Chellappa, R., Jacobs, D.W.: Frontal to profile face verification in the wild. In: 2016 IEEE winter conference on applications of computer vision (WACV), pp. 1–9. IEEE (2016)
Shen, J., Liu, N., Sun, H., Li, D., Zhang, Y.: An instrument indication acquisition algorithm based on lightweight deep convolutional neural network and hybrid attention fine-grained features. IEEE Trans. Instrum. Meas. 73, 1–16 (2024)
Shen, J., Liu, N., Sun, H., Li, D., Zhang, Y., Han, L.: An algorithm based on lightweight semantic features for ancient mural element object detection. npj Herit. Sci. 13(1), 70 (2025)
Shen, J., Liu, N., Xu, C., Sun, H., Xiao, Y., Li, D., Zhang, Y.: Finger vein recognition algorithm based on lightweight deep convolutional neural network. IEEE Trans. Instrum. Meas. 71, 1–13 (2021)
Tang, Y., Han, K., Guo, J., Xu, C., Xu, C., Wang, Y.: GhostNetV2: enhance cheap operation with long-range attention. Adv. Neural. Inf. Process. Syst. 35, 9969–9982 (2022)
Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., Yeh, I.H.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390–391 (2020)
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531–11539 (2020)
Whitelam, C., Taborsky, E., Blanton, A., Maze, B., Adams, J., Miller, T., Kalka, N., Jain, A.K., Duncan, J.A., Allen, K., et al.: IARPA Janus benchmark-B face dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 90–98 (2017)
Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Wu, X., He, R., Sun, Z., Tan, T.: A light CNN for deep face representation with noisy labels. IEEE Trans. Inf. Forensics Secur. 13(11), 2884–2896 (2018)
Xiao, J., Jiang, G., Liu, H.: A lightweight face recognition model based on MobileFaceNet for limited computation environment. EAI Endorsed Trans. Internet Things 7(27), 1–9 (2022)
Xu, X., Du, M., Guo, H., Chang, J., Zhao, X.: Lightweight FaceNet based on MobileNet. Int. J. Intell. Sci. 11(1), 1–16 (2020)
Yan, M., Zhao, M., Xu, Z., Zhang, Q., Wang, G., Su, Z.: VarGFaceNet: an efficient variable group convolutional neural network for lightweight face recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch. arXiv preprint arXiv:1411.7923 (2014)
Zheng, T., Deng, W.: Cross-pose LFW: a database for studying cross-pose face recognition in unconstrained environments. Tech. Rep. 18-01, Beijing University of Posts and Telecommunications (2018)
Zheng, T., Deng, W., Hu, J.: Cross-age LFW: a database for studying cross-age face recognition in unconstrained environments. arXiv preprint arXiv:1708.08197 (2017)
Acknowledgements
This work was partially supported by the European Commission (EPISTEAM, 101129655), the Spanish Agencia Estatal de Investigación (PID2023-149753OB-C22), and Fundación Séneca–Agencia de Ciencia y Tecnología de la Región de Murcia (22633/PI/24).
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Author information
Contributions
Randa Nachet was responsible for the conceptualization, methodology, software implementation, and original draft preparation. Javier Garrigós and Tarik Boudghene Stambouli contributed to the review and editing of the manuscript and provided supervision throughout the work. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.