AI-Pentesting

Posted Jan 15, 2025 Updated Jan 15, 2025

By 0xfl0k1

Applicability of Artificial Intelligence in Penetration Testing: A Systematic Mapping

Authors

Lucas Ferreira Chagas Júnior
Graduate Program in Computer Science (PPGCC)
Federal University of Technology - Paraná (UTFPR)
Ponta Grossa, Brazil
lucasjunior@alunos.utfpr.edu.br

Lourival Aparecido de Góis
Department of Computing
Federal University of Technology - Paraná (UTFPR)
Ponta Grossa, Brazil
gois@utfpr.edu.br

Abstract

This paper presents a systematic mapping addressing the applicability of Artificial Intelligence (AI) in penetration testing. The objective is to identify the phases and tasks of pentesting where AI can be useful, following the methodology of the Penetration Testing Execution Standard (PTES). Additionally, the study highlights how researchers validated the efficiency and effectiveness of AI in penetration testing. The review included studies published in the past five years in scientific repositories. The results demonstrate that AI can be successfully applied in penetration tests; however, there is room for improvement and optimization of AI techniques in this field.

Keywords

Artificial Intelligence
Penetration Testing
Cybersecurity

Introduction

The increasing digitalization of data transmission processes on the internet has made information security a top priority in the digital environment. With the constant advancement of technology, it is essential to maintain protection and ensure the integrity of data. In this context, penetration testing plays a key role in proactively mitigating vulnerabilities that could be exploited by cybercriminals (Weidman, 2014).

Security is never absolute. An important lesson is that a determined attacker will always have an advantage, and, with few exceptions, the larger the company, the more vulnerable it becomes. Larger companies have more systems to monitor, more points of entry and exit, blurred boundaries between business units, and a higher number of users. This does not mean that hope should be lost, but the concept of “security through compliance” alone is insufficient to ensure protection (Allsopp, 2017).

To address security issues, Weidman (2014) defines penetration tests, or pentesting, as simulations of real attacks to assess the risks associated with potential security breaches. During a penetration test, pentesters not only identify vulnerabilities that attackers could exploit but also exploit these vulnerabilities whenever possible to assess what attackers could achieve after a successful compromise.

The Penetration Testing Execution Standard (PTES) is a methodology recognized in the information security community, used for conducting penetration tests. It provides a set of guidelines covering all stages of intrusion testing, from initial planning to post-exploitation and reporting.

Because these tests require broad technical knowledge across several areas of IT, it is worth highlighting that they have been enhanced by Artificial Intelligence techniques, as shown in studies by Confido et al. (2022) and Hilário et al. (2024).

The objective of this paper is to map how AI tools can be applied to pentesting phases based on the PTES methodology and to demonstrate how researchers validated their efficiency and effectiveness. This mapping includes studies from the past five years found in repositories such as IEEE Xplore, ACM Digital Library, Scopus, and ScienceDirect, following the methodology proposed by Kitchenham and Charters (2007).

This paper is structured as follows: Section II presents the organizational structure of penetration tests. Section III details a literature mapping of AI approaches in penetration testing, discussing key advancements. Section IV describes the methodology used for systematic mapping, including inclusion and exclusion criteria and search procedures in the selected repositories. Section V discusses the results obtained, answering the research questions of this paper. Finally, Section VI presents the study’s conclusions, emphasizing the effectiveness of AI techniques in improving penetration tests and suggesting future research directions in this area.

Penetration Testing

Penetration testing is a legal and authorized attempt to locate and successfully exploit vulnerabilities in computer systems to improve their security. This process includes identifying vulnerabilities and performing proof-of-concept attacks to demonstrate their validity. Proper penetration testing concludes with specific recommendations to address and resolve identified issues. The main goal is to uncover security problems using the same tools and techniques as an attacker, so that the problems found can be mitigated before a real attacker exploits them (Engebretson, 2013).

As defined by Engebretson (2013), penetration testing is also known as Pentesting, PT, Hacking, Ethical hacking, White hat hacking, Offensive security, and Red teaming. Additionally, Weidman (2014) categorizes penetration testing into three primary types:

Black-box: The professional has no prior knowledge of the target system.
White-box: The professional has complete access to the system’s internal structure.
Gray-box: The professional has limited access, knowing only some parts of the system.

Conducting a penetration test involves several tasks, typically following the PTES (Penetration Testing Execution Standard) methodology. Below is a brief summary of PTES phases:

Pre-engagement Interactions: Establish expectations, including objectives, scope, and rules of engagement between the client and testing team.
Information Gathering: Collect initial information about the target using various techniques.
Threat Modeling: Identify and categorize potential threats based on gathered information to guide the test.
Vulnerability Analysis: Evaluate collected information to identify specific vulnerabilities in target systems and networks.
Exploitation: Attempt to exploit identified vulnerabilities to assess their severity and potential impact.
Post-Exploitation: Activities to consolidate access, assess the impact of exploitation, and identify methods for maintaining persistent access.
Reporting: Document the findings, including vulnerabilities exploited, methods used, and recommendations for mitigation and security improvement.

With the structure of penetration testing and its PTES methodology outlined, the next section reviews the applicability of AI in penetration testing.

Artificial Intelligence in Penetration Testing

Artificial Intelligence (AI) has emerged as a significant tool in enhancing penetration testing, particularly through techniques such as Machine Learning (ML) and Reinforcement Learning (RL). These techniques are integrated into penetration testing phases, automating and optimizing vulnerability detection and exploitation.

Advancements in AI for penetration testing have shown promising results across various areas:

Confido et al. (2022): Highlighted the integration of ML techniques, especially RL, in pentesting frameworks. RL automates the identification of optimal attack paths and prioritizes vulnerabilities based on criticality.
Hilário et al. (2024): Focused on Generative AI (GenAI) for creating new data and patterns. The study demonstrated how tools like Shell GPT, which integrates the ChatGPT API, enhance penetration testing tasks.

AI’s impact extends to improving efficiency and reducing the need for human intervention.

Kasim et al. (2020): Demonstrated that adversarial AI competitors automate both attack and defense aspects of penetration testing, significantly minimizing human involvement.
McKinnel et al. (2019): Showed that combining RL with traditional approaches improves test efficiency, saving time and resources by automating repetitive and complex actions.

Continuous adaptation to technological advancements is essential to maintain the effectiveness of penetration testing.

Saber et al. (2023): Observed that while AI has proven significantly useful in penetration testing, keeping pace with technological evolution is necessary to enhance these tools further.

Research Methodology

To investigate the application of Artificial Intelligence in Penetration Testing, the Systematic Mapping Methodology proposed by Kitchenham and Charters (2007) was adopted. This study focuses on the past five years to include the most recent and relevant articles.

Research Questions and Search Repositories

The study aims to address two key research questions:

Q1: How are the phases of penetration testing assisted by AI?
Q2: What is the impact of AI application in penetration testing in terms of efficiency and effectiveness?

To answer these questions, the primary repositories selected for cybersecurity and artificial intelligence research were:

IEEE Xplore
ACM Digital Library
Scopus
ScienceDirect

Search Strategy and Inclusion/Exclusion Criteria

Specific keywords were used to construct effective search strings in the mentioned repositories. The keywords included:

“Artificial Intelligence”
“AI”
“Penetration Testing”
“Invasion Tests”
“Cyber Security”

The search string structure was as follows:
("Artificial Intelligence" OR "AI") AND ("Penetration Testing" OR "Invasion Tests") AND "Cyber Security"
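As a minimal sketch, the search string can be assembled from the keyword groups listed above. The helper names `any_of` and `build_query` are hypothetical and used only for illustration. Each repository may require its own advanced-search syntax, so the output is best treated as a template rather than a query guaranteed to run unchanged everywhere.

```python
def any_of(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(ai_terms, pentest_terms, context_term):
    """Combine the two OR-groups and the context term with AND."""
    parts = [any_of(ai_terms), any_of(pentest_terms), f'"{context_term}"']
    return " AND ".join(parts)


# Keyword groups taken from the list of keywords in this section.
query = build_query(
    ["Artificial Intelligence", "AI"],
    ["Penetration Testing", "Invasion Tests"],
    "Cyber Security",
)
print(query)
```

Keeping the synonyms in separate lists makes it easy to add or drop a term and regenerate the string consistently for every repository.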

This string was applied across all repositories for articles published between 2019 and 2024. The preliminary results are summarized in Table 1.

Database	Number of Articles
IEEE Xplore	30
ACM Digital Library	79
Scopus	22
ScienceDirect	173
Total	304

These results are preliminary, as the articles were later evaluated using specific inclusion and exclusion criteria.

Inclusion and Exclusion Criteria

To ensure relevance, the following criteria were applied:

Inclusion	Exclusion
Articles combining “Penetration Testing” and “Artificial Intelligence” in the title.	Articles not directly related to AI in penetration testing.
Articles aligned with the primary focus: AI in penetration testing.	Duplicate articles.
Articles freely available or accessible through university subscriptions.

Study Selection

After applying the inclusion and exclusion criteria, six articles were found relevant, representing approximately 1.97% of the 304 articles collected during the initial search. During the review of primary studies, one article was excluded due to its lack of alignment with the research questions, leaving five articles for detailed analysis. The selection process is illustrated in Figure 1.
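The selection funnel can be recomputed from the per-repository counts reported in the search results table. The short sketch below only restates numbers given in this section; the variable names are illustrative.

```python
# Article counts per repository, copied from the search results table.
hits = {
    "IEEE Xplore": 30,
    "ACM Digital Library": 79,
    "Scopus": 22,
    "ScienceDirect": 173,
}

total = sum(hits.values())          # articles returned by the initial search
after_criteria = 6                  # kept after the inclusion/exclusion criteria
after_review = after_criteria - 1   # one article dropped during the review

# Share of the initial results that survived the criteria, in percent.
share = 100 * after_criteria / total
print(f"total={total}, kept={after_criteria} ({share:.2f}%), analyzed={after_review}")
```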

Comparison with Similar Works

This systematic mapping aims to complement existing reviews and mappings, such as those by McKinnel et al. (2019) and Saber et al. (2023). These studies provide a broad perspective on cybersecurity, encompassing various methodologies and contexts. In contrast, this mapping seeks to deepen the understanding of AI integration into penetration testing processes.

The contributions of previous works inspire further exploration of AI’s potential to enhance penetration testing workflows. This research provides a complementary perspective, enriching ongoing discussions about emerging technologies in cybersecurity.

Results

After selecting the studies, five articles were analyzed, as listed in the table below.

Author(s)	Title	Year
McKinnel et al.	A systematic literature review on AI in pentesting	2019
Kasim et al.	Cybersecurity as a Tic-Tac-Toe Game	2020
Confido et al.	Reinforcing Penetration Testing Using AI	2022
Saber et al.	Automated Penetration Testing: A Systematic Review	2023
Hilário et al.	Generative AI for Pentesting	2024

Q1. How are penetration testing phases enhanced with AI?

To answer this question, the five selected studies were analyzed; however, only two were directly relevant: Confido et al. (2022) and Hilário et al. (2024).

The study by Confido et al. (2022) focuses on applying Reinforcement Learning (RL) techniques to improve the efficiency and effectiveness of automated penetration testing using the PenBox framework developed by the European Space Agency (ESA). The main goal is to automate and optimize pentesting processes, enabling an RL agent to autonomously learn and execute cyberattacks, reducing the need for human intervention.

The script keyboard agent.py imitates a real pentester’s actions, with the next step chosen manually; the RL agent is trained to make that choice on its own. PenBox automates the following phases:

Information Gathering: Utilizing scanning tools to collect detailed information about the target, with RL aiding in decision-making.
Vulnerability Analysis: Automated assessment with tools like Nmap and Wireshark for discovery, while RL prioritizes and validates vulnerabilities based on criticality.
Exploitation: Optimizing action sequences to exploit vulnerabilities effectively, using tools like Metasploit Framework.
Post-Exploitation: Evaluating system compromise levels and maintaining access using techniques like data exfiltration and persistence methods.

Confido et al. (2022) concluded that optimal results are achieved by extensive training and minimizing exploratory steps, balancing the exploration of new actions with refining proven ones.
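The trade-off between exploring new actions and refining proven ones can be illustrated with a generic epsilon-greedy learner. This is a textbook sketch, not the algorithm used in PenBox or by Confido et al. (2022): the action names, the success probabilities, and the decay schedule are all made-up assumptions for a toy setting.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))                    # explore
    return max(range(len(values)), key=values.__getitem__)  # exploit


def train(success_prob, episodes=5000, epsilon=1.0, decay=0.999, seed=0):
    """Estimate each action's value while epsilon decays toward exploitation."""
    rng = random.Random(seed)
    values = [0.0] * len(success_prob)   # running mean reward per action
    counts = [0] * len(success_prob)     # how often each action was tried
    for _ in range(episodes):
        a = epsilon_greedy(values, epsilon, rng)
        reward = 1.0 if rng.random() < success_prob[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]   # incremental mean update
        epsilon = max(0.05, epsilon * decay)            # fewer exploratory steps over time
    return values, counts


# Hypothetical actions with invented success probabilities (toy values only).
actions = ["port_scan", "service_fingerprint", "exploit_attempt"]
values, counts = train([0.2, 0.5, 0.8])
best = actions[max(range(len(values)), key=values.__getitem__)]
print(best, [round(v, 2) for v in values], counts)
```

Early episodes are mostly random, which gives every action some samples. As epsilon decays, the learner spends most of its steps on the action with the highest estimated value, which mirrors the idea of long training with few exploratory steps.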

Hilário et al. (2024) explored Generative AI (GenAI), which focuses on creating new data and patterns using techniques like deep learning and natural language processing (NLP). Popular tools include GPT by OpenAI, now in its fourth iteration.

In this study, Shell GPT, a Python-based CLI tool that integrates the ChatGPT API, was used for tasks like:

Information Gathering: Combining Shell GPT with tools like Nmap to identify IP addresses and active hosts.
Vulnerability Analysis: Assisting with tasks like FTP anonymous login, analyzing application source code, and reading critical files.
Exploitation: Automating scans of WordPress services and directory enumeration using Gobuster to identify vulnerabilities and user accounts.
Post-Exploitation: Facilitating tasks like decoding messages, configuring OpenSSH private keys, and executing root privilege escalations.
Reporting: Generating precise and tailored reports with LLMs, identifying errors and inconsistencies.
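To make the Information Gathering step more concrete, the sketch below extracts active hosts from scan output and turns them into a plain-text question for an LLM assistant. It is an illustration under stated assumptions, not the workflow of Hilário et al. (2024): the sample text is invented in the style of Nmap's greppable (`-oG`) output, the helper names are hypothetical, and no call to Shell GPT or any API is made.

```python
import re

# Made-up sample in the style of Nmap's greppable (-oG) output for a host scan.
SAMPLE_SCAN = """\
# Nmap scan initiated (sample text, not a real scan)
Host: 192.168.56.1 ()\tStatus: Up
Host: 192.168.56.101 ()\tStatus: Up
Host: 192.168.56.102 ()\tStatus: Down
# Nmap done
"""

# Matches lines such as "Host: <address> (<name>)<whitespace>Status: <state>".
HOST_LINE = re.compile(r"^Host:\s+(\S+)\s+\(.*?\)\s+Status:\s+(\w+)", re.MULTILINE)


def active_hosts(scan_text):
    """Return the addresses whose status line reports 'Up'."""
    return [ip for ip, status in HOST_LINE.findall(scan_text) if status == "Up"]


def build_prompt(hosts):
    """Compose a plain-text question that could be handed to an LLM assistant."""
    listing = "\n".join(f"- {h}" for h in hosts)
    return (
        "These hosts responded during an authorized scan:\n"
        f"{listing}\n"
        "Suggest the next enumeration steps."
    )


hosts = active_hosts(SAMPLE_SCAN)
print(build_prompt(hosts))
```

Parsing the tool output first keeps the prompt short and limits what is sent to the model to the facts that matter for the next step.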

These studies show that penetration testing phases can be technically supported by diverse AI approaches. Table 2 summarizes AI applications in PTES phases based on Confido et al. (2022) and Hilário et al. (2024).

Pentesting Phase	Confido et al. (2022)	Hilário et al. (2024)
Pre-engagement Interactions	Not Relevant	Not Relevant
Information Gathering	RL with PenBox	Shell GPT and pentesting tools
Threat Modeling	Not Relevant	Not Relevant
Vulnerability Analysis	Automation with RL and tools	Shell GPT and pentesting tools
Exploitation	RL for optimized action sequences	Shell GPT and pentesting tools
Post-Exploitation	RL for maintaining access and looting	Shell GPT and pentesting tools
Reporting	Not performed	LLM-assisted report generation
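Table 2 can also be read as a simple mapping from each PTES phase to the approach reported by each study. The sketch below transcribes the table and derives which phases are covered; `None` stands for the "Not Relevant" and "Not performed" cells, and the variable names are illustrative.

```python
# Table 2 transcribed as: phase -> (Confido et al. 2022, Hilário et al. 2024).
# None marks a "Not Relevant" or "Not performed" cell.
TABLE_2 = {
    "Pre-engagement Interactions": (None, None),
    "Information Gathering": ("RL with PenBox", "Shell GPT and pentesting tools"),
    "Threat Modeling": (None, None),
    "Vulnerability Analysis": ("Automation with RL and tools", "Shell GPT and pentesting tools"),
    "Exploitation": ("RL for optimized action sequences", "Shell GPT and pentesting tools"),
    "Post-Exploitation": ("RL for maintaining access and looting", "Shell GPT and pentesting tools"),
    "Reporting": (None, "LLM-assisted report generation"),
}

# Phases supported by both studies, by at least one study, and by neither.
covered_by_both = [p for p, (rl, gen) in TABLE_2.items() if rl and gen]
covered_by_any = [p for p, (rl, gen) in TABLE_2.items() if rl or gen]
uncovered = [p for p in TABLE_2 if p not in covered_by_any]

print("both:", covered_by_both)
print("any:", covered_by_any)
print("none:", uncovered)
```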

Q2. What is the impact of AI in penetration testing regarding efficiency and effectiveness?

This question was addressed using insights from all five studies: Saber et al. (2023), Kasim et al. (2020), Confido et al. (2022), McKinnel et al. (2019), and Hilário et al. (2024).

Kasim et al. (2020) demonstrated the effectiveness of well-trained adversarial AI systems in automating attack and defense processes in controlled environments.
Confido et al. (2022) validated the reduction in human interaction and optimization of actions through RL-based approaches.
McKinnel et al. (2019) emphasized the necessity of evolving AI to adapt to adversarial techniques, ensuring detection and prevention.
Hilário et al. (2024) showed that using GenAI significantly enhances penetration testing efficiency by analyzing large datasets and generating test scenarios quickly.

Table 3 summarizes the validation methods and results from these studies.

Study	Validation Method	Results
McKinnel et al. (2019)	Meta-analysis of empirical studies	Improved attack planning with PDDL-based algorithms; scalability issues.
Kasim et al. (2020)	Trained adversarial AI system	Highly effective AI agents in exploring vulnerabilities.
Confido et al. (2022)	RL applied to simulated network attacks	Significant reduction in required human actions.
Saber et al. (2023)	Systematic review of automation techniques	Improved efficiency but challenges in specific scenarios.
Hilário et al. (2024)	GenAI in penetration testing	Faster, more precise detection and improved report quality.

Conclusion

Information security is a critical field within information technology, and penetration testing techniques can be employed to find and fix weaknesses before malicious attackers exploit them. Artificial Intelligence (AI) can serve as a valuable ally in mitigating these threats.

The analysis of the studies indicates that applying AI to the phases of penetration testing significantly enhances the efficiency and effectiveness of these tests. Technologies such as Reinforcement Learning (RL) and Generative AI (GenAI), as shown in the answers to the research questions, are effective in automating critical processes across the pentest phases of Information Gathering, Vulnerability Analysis, Exploitation, and Post-Exploitation, with GenAI also assisting Reporting. This approach reduces human intervention and optimizes the actions needed to compromise vulnerable systems.

Future research should focus on exploring new AI techniques to further improve and optimize processes aimed at mitigating emerging vulnerabilities. It is essential to develop tools and algorithms that enable AI to support professionals addressing these cybersecurity challenges. Furthermore, the applicability of AI should be assessed across various scenarios, considering the diversity of new technologies and the complexity of data transmission.

References

G. Weidman, Testes de Invasão. No Starch Press, 2014.
W. Allsopp, Advanced Penetration Testing: Hacking the World’s Most Secure Networks. Wiley, 1st ed., 2017.
P. Engebretson, The Basics of Hacking and Penetration Testing, 2nd ed. Elsevier, 2013.
A. Confido, E. V. Ntagiou, M. Wallum, “Reinforcing Penetration Testing Using AI,” in Proc. IEEE, 2022.
E. Hilario, S. Azam, J. Sundaram, K. I. Mohammed, B. Shanmugam, “Generative AI for Pentesting: The Good, The Bad, The Ugly,” in Proc. Scopus, 2024.
B. Kitchenham, S. Charters, “Guidelines for Performing Systematic Literature Reviews in Software Engineering,” Keele University and Durham University Joint Report, Tech. Rep. EBSE-2007-01, 2007.
S. Kasim, S. S. Samadi, N. Valliani, L. Watkins, N. K. Wong, A. Rubin, “Cybersecurity as a Tic-Tac-Toe Game Using Autonomous Forwards (Attacking) And Backwards (Defending) Penetration Testing in a Cyber Adversarial Artificial Intelligence System,” in IEEE Xplore, 2020.
V. Saber, D. ElSayad, A. M. Bahaa-Eldin, Z. T. Fayed, “Automated Penetration Testing: A Systematic Review,” in IEEE Xplore, 2023.
D. R. McKinnel, T. Dargahi, A. Dehghantanha, K. K. R. Choo, “A Systematic Literature Review and Meta-Analysis on Artificial Intelligence in Penetration Testing and Vulnerability Assessment,” ScienceDirect, 2019.
V. Sharma et al., “GPT-3: Advancing AI Capabilities in Natural Language Processing for Cybersecurity Threat Intelligence,” in IEEE Access, 2021.
L. Xiao et al., “A Survey on Reinforcement Learning for Cybersecurity,” in IEEE Transactions on Cybernetics, 2022.

To access the full article, check the links below:

Official publication page at the ENCOM 2024 event:
National Conference on Communications, Networks, and Information Security (ENCOM)
Applicability of Artificial Intelligence in Penetration Testing: A Systematic Mapping

This post is licensed under CC BY 4.0 by the author.