1 Introduction

In the digital age, cybersecurity is increasingly critical. Traditional manual penetration testing is costly and inefficient as cyberspace grows. Automated penetration testing (AutoPT) has emerged to improve efficiency [1], but it still faces limitations, particularly in the intelligence of its decision-making. Reinforcement learning offers solutions to these challenges [2]. Most AutoPT research assumes a fully observable environment modeled as a Markov Decision Process (MDP), which does not reflect the partial information available in practice. The Partially Observable Markov Decision Process (POMDP) model better suits real-world scenarios, as shown by Zhang et al. [3] and Schwartz et al. [4], who incorporate scanning actions and defender strategies. AI in cybersecurity is a promising field, and its application to penetration testing enhances efficiency [5] [6]. Recent advances include frameworks such as GAIL-PT [7], which combines expert knowledge with AI, and the use of large language models [8]. These developments position AutoPT to improve vulnerability identification and provide more effective security solutions.

Based on an analysis of existing methods, this paper first proposes a text embedding method based on raw scanning information, addressing the challenge of incomplete information during penetration testing. Next, an integrated decision-making framework spanning perception to decision-making is introduced. Finally, the effectiveness of the proposed method is verified with the LSTM-PPO algorithm. The remainder of this paper is organized as follows: Sect. 2 reviews and briefly discusses related work. Section 3 describes our proposed model. Section 4 details the proposed raw scanning text embedding approach and the improved LSTM-PPO approach. Section 5 presents experimental results and analysis. Section 6 concludes the paper and discusses future work.

2 Related Works

This section reviews previous studies that have utilized the end-to-end (E2E) approach in relevant research fields and discusses several approaches to improving AutoPT.

The E2E research approach has achieved significant breakthroughs across various fields. In natural language processing, E2E approaches process raw text inputs directly to perform tasks such as machine translation without manual feature extraction [9] [10]. In computer vision, E2E approaches are widely applied to challenges such as object identification, scene detection, and image segmentation, producing outputs such as image classifications and object detections directly from raw images or videos and eliminating the need for manual feature design [11]. For speech recognition, E2E approaches convert speech signals directly into text, again bypassing hand-crafted features [12]. Autonomous driving also benefits from E2E approaches, which link sensor inputs directly to vehicle control outputs [13] [14]. In penetration testing, E2E reinforcement learning models streamline the entire process from scanning information to action execution and enhance the efficiency of security assessments [15] [16]. Thus, there is significant potential for the development of E2E automated penetration testing.

Challenges persist in the practical application of reinforcement-learning-based penetration testing. For example, agents struggle with convergence and decision efficiency because of high-dimensional discrete action spaces. To address these challenges, Yang et al. [17] modeled penetration testing as an MDP and introduced a coverage-based masking mechanism on top of the Proximal Policy Optimization (PPO) algorithm, steering agents toward future exploration and away from previously selected actions. Guo et al. [18] modeled Advanced Persistent Threats (APTs) as POMDPs and proposed the PLAPT framework based on the PPO algorithm, which successfully reduced the dimensionality of large action spaces. Despite this progress in improving the efficiency of penetration testing, these approaches do not yet fully represent the agent’s historical states or consider the realism of training environments, which remain directions for future research.

3 Reinforcement Learning Model

3.1 POMDP of AutoPT

Although reinforcement learning assists agents in decision-making by optimizing reward functions, adjusting environmental states, and defining action spaces, in real-world penetration testing scenarios agents often face constraints due to partial observability. Their access is typically confined to limited and possibly unreliable data sources, such as network traffic and system logs from specific nodes, which limits their ability to fully comprehend the target system. The POMDP, as a widely adopted formalism, offers an approach to decision-making for agents that cannot fully observe the environment state. Therefore, to tackle the obstacles presented by environments with incomplete information, we model the AutoPT problem as a POMDP and employ high-performance reinforcement learning frameworks to handle penetration testing tasks in partially observable environments, thereby overcoming the obstacles encountered by conventional penetration testing approaches. A POMDP is composed of a tuple \(\langle S, A, T, O, R, \gamma \rangle \), where \(S\) represents the state space, \(A\) the action space, \(T\) the transition probability function, \(O\) the observation space, \(R\) the reward function, and \(\gamma \) the discount factor. The decision process of the agent based on the POMDP is shown in Fig. 1. The agent begins in an initial state \(s_0\) containing essential information about the target network. At each time step \(t\), it selects an action \(a_t\) based on the observation \(o_t\) using a policy function \(\pi (o_t, \theta )\). The environment responds with a reward \(r_t\) and updates the state to \(s_{t+1}\).

Fig. 1. Agent decision process based on dynamic fusion of observation information.

This process continues until the agent either exhausts its steps or achieves its goal, completing the penetration test. Through this iterative interaction, the agent improves its ability to identify vulnerabilities, enhancing the efficiency of penetration testing.
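To make this interaction loop concrete, the following Python sketch mirrors the decision process described above under an assumed interface: the `env` object (with `reset`/`step`) and the `policy` object (with `initial_hidden`/`act`) are hypothetical placeholders, not the authors’ implementation.

```python
# Minimal sketch of the POMDP interaction loop described above. The `env` and
# `policy` objects and their methods are hypothetical placeholders.
def run_episode(env, policy, max_steps=100):
    obs = env.reset()                    # initial observation derived from s_0
    hidden = policy.initial_hidden()     # recurrent memory, empty at episode start
    total_reward = 0.0
    for t in range(max_steps):
        action, hidden = policy.act(obs, hidden)    # a_t ~ pi(o_t, theta)
        obs, reward, done, info = env.step(action)  # environment returns r_t and o_{t+1}
        total_reward += reward
        if done:                         # goal reached: penetration test completed
            break
    return total_reward
```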

3.2 Scan Information Embedding

Based on the POMDP model we constructed, \(O = \{o_1, o_2, \ldots , o_n\}\) represents the observation space, i.e., the set of system states observable by the penetration tester for the target hosts. The information regarding the target hosts can be represented as \(H = \{h_1, h_2, \ldots , h_n\}\), for example:

$${h_1} = \left\{ \begin{array}{l} {\textbf {IP:}}\ \text {192.168.1.32}\\ {\textbf {Port:}}\ \text {22, 8000}\\ {\textbf {Services:}}\ \text {ssh OpenSSH 9.1p1 Debian 1, http Werkzeug httpd 1.0.1}\\ {\textbf {OS:}}\ \text {Linux}\\ {\textbf {Vulnerability:}}\ \text {CVE-2017-8291}\\ {\textbf {Web fingerprint:}}\ \text {HTML5 HTTPServer[Werkzeug/1.0.1 Python/3.5.3]} \end{array} \right. $$

To efficiently encode scanning data into the intelligent penetration testing model, we utilize a denoising autoencoder built upon the Transformer architecture, referred to as TSDAE [19], for training action embedding vectors. This approach converts scanning information into vector representations and integrates them into the model’s input, as shown in Fig. 2. Here, “[CLS]” denotes the start of a textual sequence, while “[EOS]” marks its end. After encoding with the TSDAE model, a vector representation of the scanned textual information is obtained. The coding process comprises collecting scanning data, tokenization, generating word embeddings, and passing them through the encoder; by following it, we convert the raw scan data into a structured vector representation for the agent’s subsequent decision-making.

Fig. 2. Information encoding process based on TSDAE.
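As an illustration of how a scan record such as \(h_1\) could be turned into an observation vector, the sketch below uses the sentence-transformers library with a TSDAE-trained encoder; the checkpoint path and the flattening of the record into a single string are assumptions, not the authors’ exact pipeline.

```python
# Sketch: turning one host's scan record into a fixed-size observation vector
# with a TSDAE-trained SentenceTransformer. The checkpoint path is a placeholder.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("output/tsdae-scan-encoder")  # assumed local TSDAE checkpoint

scan_text = (
    "IP: 192.168.1.32; Port: 22,8000; "
    "Services: ssh OpenSSH 9.1p1 Debian 1, http Werkzeug httpd 1.0.1; "
    "OS: Linux; Vulnerability: CVE-2017-8291; "
    "Web fingerprint: HTML5 HTTPServer[Werkzeug/1.0.1 Python/3.5.3]"
)

obs_vector = encoder.encode(scan_text)  # e.g. a 768-dimensional numpy array
print(obs_vector.shape)
```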

3.3 Historical Observation Information Fusion

The fusion of historical information maximizes the utilization of previously observed data to enhance the environment understanding and decision-making capabilities of intelligent penetration testing models. Integrating historical information provides contextual cues, improving the agent’s comprehension of past state information. This allows the agent to better account for previous observations, leading to more rational inference and planning during decision-making. We denote the sequence of historical observations as \(\{o_1, o_2, \ldots , o_t\}\), as illustrated in Fig. 3, where \(o_t\) represents the environmental state observed by the agent at time step \(t\). In the intelligent penetration testing model, we introduce an additional dimension that represents historical observation information as a vector \(h_o \in \mathbb {R}^d\). We then concatenate it with the representation vector \(s_t\) of the current environmental state to form the model input. This approach integrates historical observational data into the model, enhancing its understanding of past states. To achieve this fusion, we introduce an enhanced approach called LSTM-PPO, which is detailed in the following sections.

Fig. 3. Implicit dynamic fusion mechanism by LSTM.
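A minimal PyTorch sketch of this fusion step follows, assuming illustrative dimensions: an LSTM summarizes the past observations into \(h_o\), which is concatenated with the current state \(s_t\). It is a conceptual sketch, not the exact LSTM-PPO architecture detailed in Sect. 4.

```python
# Sketch of the fusion in Sect. 3.3: an LSTM summarizes past observations
# o_1..o_{t-1} into a vector h_o, which is concatenated with the current state s_t.
# Dimensions are illustrative.
import torch
import torch.nn as nn

d_obs, d_hidden = 768, 256
memory = nn.LSTM(input_size=d_obs, hidden_size=d_hidden, batch_first=True)

past_obs = torch.randn(1, 5, d_obs)          # o_1 ... o_5 (one trajectory)
s_t = torch.randn(1, d_obs)                  # current observation/state embedding

_, (h_n, _) = memory(past_obs)               # h_n: (num_layers, batch, d_hidden)
h_o = h_n[-1]                                # summary of historical observations
fused_input = torch.cat([h_o, s_t], dim=-1)  # input to the decision model
```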

4 Our Approach

To address the limitations of current penetration testing decision-making methods, we propose the LSTM-PPO approach, which leverages raw scan data for text embedding. Unlike traditional rule-based methods, this approach employs LSTM networks to capture temporal dependencies in historical data and uses the PPO algorithm to optimize the agent’s policy, enhancing decision-making in the testing environment. As illustrated in Fig. 4, we propose an algorithmic enhancement framework based on historical information fusion, comprising four main modules: the information scanning module, the text embedding module, the LSTM-based policy module (Actor), and the evaluation module (Critic).

Fig. 4. Algorithm improvement based on historical information fusion.

Initially, the agent scans vulnerability data from the Vulhub [20] digital environment and encodes the scan details using the TSDAE model. These encoded vectors are input into the reinforcement learning algorithm to inform action selection. The environment then returns the reward to the agent. In our framework, both the Actor policy network and Critic value network incorporate LSTM networks to capture historical state information. The LSTM processes the agent’s state at each time step, producing encoded memory and prediction data, which are then used by the Actor for action decisions and the Critic for evaluating and refining the policy based on the received rewards.

4.1 TSDAE for Embedding

In our constructed penetration testing environment, after the agent retrieves scanning information from the digital environment, it first encodes the text information. To obtain an embedded representation of the agent’s action space, we utilize a denoising autoencoder based on the Transformer structure to train action embedding vectors, and we use TSDAE to transform vulnerability description text into fixed-size vectors. The training process of TSDAE is outlined in Algorithm 1.

Algorithm 1. Training Procedure of TSDAE for Scanning Information

We begin by tokenizing the text and then use a model pre-trained with the TSDAE objective to obtain its encoded representation. As shown in Fig. 5, encoding with TSDAE yields vector representations of the text, where each element holds the value of the corresponding feature. These features are learned autonomously during the TSDAE pre-training phase, with the goal of better capturing the semantic information in the text. The resulting vector representations allow the model to grasp the textual meaning with greater precision.

Fig. 5. Text embedding procedure.
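The sketch below shows one way Algorithm 1 could be realized with the TSDAE recipe from the sentence-transformers library; the base model, the toy corpus, and the hyperparameters are illustrative assumptions rather than the exact settings used in our experiments.

```python
# Sketch of Algorithm 1 via the sentence-transformers TSDAE recipe. Base model,
# corpus, and hyperparameters are illustrative placeholders.
import nltk
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

nltk.download("punkt")  # used by the default denoising (token deletion) function

model_name = "bert-base-uncased"                       # assumed base encoder
word_embedding = models.Transformer(model_name)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding, pooling])

train_sentences = [                                    # scan-information corpus (toy sample)
    "IP: 192.168.1.32; Port: 22,8000; Services: ssh OpenSSH 9.1p1; OS: Linux; Vulnerability: CVE-2017-8291",
    "IP: 192.168.1.25; Port: 22,2222; Services: ssh OpenSSH 9.1p1; OS: Linux; Vulnerability: CVE-2018-10933",
]
train_data = datasets.DenoisingAutoEncoderDataset(train_sentences)  # adds noise to the inputs
loader = DataLoader(train_data, batch_size=2, shuffle=True)
loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name,
                                       tie_encoder_decoder=True)

model.fit(train_objectives=[(loader, loss)], epochs=1,
          scheduler="constantlr", optimizer_params={"lr": 3e-5},
          weight_decay=0, show_progress_bar=True)
model.save("output/tsdae-scan-encoder")                # checkpoint used in the Sect. 3.2 sketch
```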

4.2 LSTM-PPO Approach

The PPO algorithm is a reinforcement learning technique that updates policies while constraining the step size of each update [21]. Incorporating LSTM networks into the PPO algorithm leverages their ability to handle sequential data and mitigate long-term dependency issues. The inputs to the LSTM-based Actor and Critic modules include the agent’s environmental state at each time step, while the outputs consist of encoded memory and prediction information along with the agent’s current action. The operation of the prediction module can be represented as:

$$\begin{aligned} \tilde{o}_{t} = \mathrm {LSTM}(s_{at-1}, h_{t-1}, c_{t-1}), \end{aligned}$$
(1)

where \(h_{t-1}\) and \(c_{t-1}\) denote the hidden and cell states of the LSTM network after \(t-1\) decision steps, and \(s_{at-1}\) denotes the vector composed of the agent’s state \(s_{t-1}\) and action \(a_{t-1}\). The output memory and prediction information \(\tilde{o}_{t}\) is concatenated with the agent’s state \(s_{t}\) and fed into the policy and value networks, substantially enriching the input information available for decision-making. In terms of network structure, \(\tilde{o}_{t}\) serves as an intermediate variable connecting the LSTM output layer with the input layers of the value and policy networks, making the LSTM a shared prefix network. By feeding the LSTM’s encoding of historical state information into the next decision, the agent fully accounts for its history, supporting more globally informed optimization.
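A minimal PyTorch sketch of Eq. (1) and the shared-prefix wiring follows; the action encoding and all layer sizes are illustrative assumptions, not the exact implementation.

```python
# Sketch of Eq. (1): the shared LSTM prefix consumes the previous state-action
# vector together with its carried hidden/cell state (h_{t-1}, c_{t-1}) and
# emits the memory/prediction vector o_tilde_t. Sizes are illustrative.
import torch
import torch.nn as nn

d_state, d_action, d_hidden = 768, 32, 256
shared_lstm = nn.LSTM(input_size=d_state + d_action, hidden_size=d_hidden, batch_first=True)

s_prev = torch.randn(1, 1, d_state)           # s_{t-1}
a_prev = torch.randn(1, 1, d_action)          # a_{t-1} (e.g. an embedded action)
sa_prev = torch.cat([s_prev, a_prev], dim=-1) # s_{a,t-1}

h_prev = torch.zeros(1, 1, d_hidden)          # h_{t-1}
c_prev = torch.zeros(1, 1, d_hidden)          # c_{t-1}

o_tilde, (h_t, c_t) = shared_lstm(sa_prev, (h_prev, c_prev))
s_t = torch.randn(1, 1, d_state)              # current state s_t
actor_critic_input = torch.cat([o_tilde, s_t], dim=-1)  # shared input to policy and value heads
```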

LSTM-PPO effectively addresses the issue of insufficient observation in partially observable environments, offering a new way to tackle such reinforcement learning challenges. The training procedure of LSTM-PPO is shown in Algorithm 2.

Algorithm 2. LSTM-PPO Training Procedure

The relationship between the Actor, the Critic, the environment, and the reward is shown in Fig. 6, where we add an LSTM to the Actor architecture to better perceive historical state information. In the Actor network, we stack LSTM layers and append a fully connected layer to the final LSTM layer to generate actions. Each LSTM layer feeds its output to the next layer, up to the final fully connected layer.

Fig. 6. Actor-Critic architecture based on LSTM network.
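The following PyTorch sketch illustrates such an Actor head: stacked LSTM layers followed by a fully connected layer and a softmax over actions, in the spirit of Eqs. (2) and (3) below. The layer sizes, number of actions, and sampling step are illustrative assumptions.

```python
# Sketch of the Actor head in Fig. 6: stacked LSTM layers followed by a fully
# connected layer whose softmax output gives the action distribution.
# Layer sizes and the action count are illustrative.
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, obs_dim=768, hidden_dim=256, num_layers=2, num_actions=50, dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_actions)   # appended to the final LSTM layer

    def forward(self, obs_seq, hidden=None):
        out, hidden = self.lstm(obs_seq, hidden)       # each LSTM layer feeds the next
        logits = self.fc(out[:, -1, :])                # hidden state h_T at the final time step
        return torch.softmax(logits, dim=-1), hidden   # action probabilities

actor = LSTMActor()
probs, hidden = actor(torch.randn(1, 5, 768))
action = torch.distributions.Categorical(probs).sample()
```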

The output of the LSTM, denoted as output, is used for further computation of action probabilities and value estimation.

$$\begin{aligned} output = f_{c}(h_{T}), \end{aligned}$$
(2)

where \(h_{T}\) signifies the hidden state at the final time step, and \(f_{c}\) indicates the fully connected layer. To transform the output of the fully connected layer into a probability distribution for actions, we utilize the softmax function:

$$\begin{aligned} P(A_t=a \mid S_t;\theta ) = \frac{e^{Q(S_t,a;\theta )}}{\sum _{a'} e^{Q(S_t,a';\theta )}}, \end{aligned}$$
(3)

where \(Q(S_t, a;\theta )\) represents the estimated value for the state-action pair \((S_{t},a)\) and \(\theta \) denotes the neural network parameters. We use the mean squared error (MSE) loss to update the Critic network parameters:

$$\begin{aligned} L^{Critic}(\theta _{v})=\frac{1}{N}\sum _{i}(V(S_{i};\theta _{v})-V^{target}(S_{i}))^2, \end{aligned}$$
(4)

where \(V^{target}(S_{i})\) represents the target value. The agent processes observation information through the LSTM network architecture and computes reward values and success rates during interactions with the environment. The cumulative reward is denoted as eval_rewards and the success rate as eval_success_rate.

$$\begin{aligned} eval\_rewards = \sum _{i=1}^{N} \sum _{j=1}^{M_{i}} r_{ij}, \end{aligned}$$
(5)

where \(N\) represents the number of targets in the target list, i.e., \(\text {len(target}\_\text {list)}\), and \(M_i\) denotes the total number of steps taken for the \(i\)-th target.

$$\begin{aligned} eval\_success\_rate = \frac{len(success\_list)}{len(target\_list)}, \end{aligned}$$
(6)

where \(\text {len(success}\_\text {list)}\) indicates the number of completed targets and \(\text {len(target}\_\text {list)}\) the total number of targets in the target list. In the LSTM-PPO approach, we leverage scanning information from a Vulhub-based digital environment. By integrating the PPO algorithm with an LSTM network, the method enhances the agent’s environmental perception and decision-making capabilities, yielding more precise penetration testing outcomes.
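For concreteness, the evaluation metrics of Eqs. (5) and (6) can be computed from per-target reward logs as in the short sketch below; the reward values and target list are illustrative placeholders.

```python
# Sketch of Eqs. (5)-(6): eval_rewards sums per-step rewards over all targets,
# and eval_success_rate is the fraction of targets that were compromised.
# The per-target reward lists are illustrative.
step_rewards = {                      # r_ij for each target i and step j
    "192.168.1.25": [0, 0, 10, 100],
    "192.168.1.32": [0, 5, -1],
}
success_list = ["192.168.1.25"]       # targets reached within the step budget
target_list = list(step_rewards)

eval_rewards = sum(sum(r) for r in step_rewards.values())   # Eq. (5)
eval_success_rate = len(success_list) / len(target_list)    # Eq. (6)
print(eval_rewards, eval_success_rate)
```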

5 Experiments

5.1 Dataset

CyberBattleSim [22] and NASim [23] are tools for simulating network attacks and defenses that provide rich simulated network scenarios and attack models. However, they mainly focus on attack and defense behaviors in specific scenarios, lack comprehensive coverage of real system vulnerabilities, and cannot offer a realistic and diverse vulnerability environment. In contrast, Vulhub significantly reduces the complexity of environment setup through pre-configured vulnerability images and simple deployment commands. More importantly, Vulhub is built from real vulnerability environments and therefore reflects actual attack scenarios more accurately, so we choose Vulhub as the experimental scenario for this paper. After setting up the experimental environment, we used the Nmap scanning tool to perform comprehensive port and service scans on each virtual machine, collecting its IP address, open ports, running services, operating system information, and vulnerability information. We convert the results into a standardized JSON format, where each JSON object describes one virtual machine. For example, a virtual machine with the CVE-2018-10933 vulnerability is described as {“ip”: “192.168.1.25”, “port”: [“22”, “2222”], “services”: [“ssh OpenSSH 9.1p1”], “os”: “Linux”, “vulnerability”: [“CVE-2018-10933”]}.
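A short sketch of building and serializing this standardized record for the CVE-2018-10933 example follows; parsing the raw Nmap output into these fields is omitted, and the field names simply mirror the description above.

```python
# Sketch of the standardized JSON record for one Vulhub virtual machine
# (CVE-2018-10933 example from Sect. 5.1). Nmap output parsing is omitted.
import json

host_record = {
    "ip": "192.168.1.25",
    "port": ["22", "2222"],
    "services": ["ssh OpenSSH 9.1p1"],
    "os": "Linux",
    "vulnerability": ["CVE-2018-10933"],
}
print(json.dumps(host_record, indent=2))
```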

5.2 Experimental Results

The number of layers and the dropout rate of the LSTM network play a crucial role in the overall network architecture. To more accurately enhance the agent’s ability to perceive historical information, we conducted ablation experiments on these two parameters and compared how the cumulative reward and success rate evolve with the training step. As shown in Fig. 7, we vary the number of layers from 1 to 5; as shown in Fig. 8, we vary the dropout rate from 0 to 0.5.

Fig. 7. Effect of different layer values in different network scales.

Fig. 8. Effect of different dropout values in different network scales.
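For reference, the swept configuration space can be expressed as in the sketch below, assuming a PyTorch LSTM backbone; the input and hidden sizes are placeholders, and the per-configuration training and logging loop is omitted.

```python
# Sketch of the ablation grid in Sect. 5.2: the number of stacked LSTM layers
# and the dropout rate are the two swept hyperparameters. Sizes are placeholders.
import torch.nn as nn

for num_layers in range(1, 6):                        # 1 to 5 layers (Fig. 7)
    for dropout in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):    # dropout sweep (Fig. 8)
        lstm = nn.LSTM(input_size=768, hidden_size=256,
                       num_layers=num_layers,
                       dropout=dropout if num_layers > 1 else 0.0,
                       batch_first=True)
        # train the LSTM-PPO agent with this configuration and log the
        # cumulative reward and success rate per step (omitted)
```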

To evaluate the effectiveness of our approach, we conducted experiments in a chained network simulation environment with 20, 30, and 40 target hosts, and in a large-scale network simulation environment.

Through meticulous logging of cumulative reward values after each training iteration, we plotted the relationship between training progress and cumulative reward, and between training progress and success rate. As shown in Fig. 9, the cumulative reward increases steadily as training progresses, reflecting the model’s continuous learning and adaptation to the complexity of the task environment. Compared to the PPO, DQN, and random baselines, our model demonstrates an advantage from the initial stages of training, and this advantage is further consolidated and expanded throughout subsequent training iterations.

Fig. 9. Effect of different algorithms in different network scales.

The experimental results in Fig. 9 show that with 20 hosts, the convergence reward of the LSTM-PPO approach increases by 18.86% compared to the PPO algorithm and by 392.06% compared to the DQN algorithm. With 30 hosts, it increases by 19.32% over PPO and by 620.98% over DQN. With 40 hosts, it increases by 10.29% over PPO and by 1762.10% over DQN. In the large-scale network scenario, it increases by 29.47% over PPO and by 383.43% over DQN. Therefore, when the number of hosts is 20 or more, the LSTM-PPO approach outperforms the other algorithms in both cumulative reward and maximum success rate in chain networks and complex networks. This is because, once the network reaches a certain scale, the LSTM architecture can better exploit historical states to assist the agent’s decisions. Consequently, the LSTM-PPO approach achieves faster convergence and higher utility in practical network applications.

6 Conclusion

This paper introduces an E2E AutoPT approach that emphasizes raw scanning information embedding and historical observation fusion. By applying the TSDAE algorithm to represent raw scanning data, we retain relevant information without omitting details. An innovative reinforcement learning approach combining PPO with LSTM is presented to guide the decision-making process by dynamically capturing historical observations and updating internal states. LSTM-PPO thus emerges as an effective AutoPT approach that adapts to dynamic changes in the environment and makes more accurate and robust decisions. Experimental validation demonstrates the superiority of the LSTM-PPO approach across different network scales, providing more effective and efficient solutions for AutoPT agents.

In the future, we intend to investigate more comprehensive PT information embedding approaches that incorporate the topologies of target networks. Furthermore, we are interested in hierarchical organizations of deep reinforcement learning algorithms to facilitate fine-grained decomposition of actions in AutoPT processes. Lastly, we aim to build a fully autonomous AutoPT agent capable of analyzing diverse systems protected by advanced protection approaches, such as dynamic defense mechanisms.