AI Cyber Agents Advance Vulnerability Discovery
by Reflare Research Team on Oct 7, 2025 3:22:28 PM
Once thought impossible, AI agents are now finding zero-day vulnerabilities in real software, exposing critical flaws that traditional tools have missed and pushing security from reactive fixes to proactive defence.
Zero-day, special delivery.
How AI Agents Think Differently
Traditional cybersecurity tools follow predetermined rules and patterns. They scan for known signatures, run static analysis against predefined vulnerability patterns, or fuzz applications with random inputs in the hope of triggering crashes. These approaches have served the industry well, but they're fundamentally limited by human knowledge and imagination.
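To see why, consider a minimal sketch of the traditional approach: a coverage-blind fuzzer that randomly mutates bytes and waits for a crash. The `mutate` and `fuzz` names are ours, and real fuzzers such as AFL or libFuzzer add coverage feedback, but the core loop looks roughly like this:

```python
import random

def mutate(seed: bytes, n_flips: int = 4) -> bytes:
    """Randomly corrupt a few bytes of a (non-empty) seed input."""
    data = bytearray(seed)
    for _ in range(n_flips):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

def fuzz(target, seed: bytes, iterations: int = 100_000) -> None:
    """Throw mutated inputs at a target callable and report the first crash."""
    for i in range(iterations):
        candidate = mutate(seed)
        try:
            target(candidate)  # e.g. a parser under test
        except Exception as exc:
            print(f"crash after {i} iterations: {exc!r}")
            return
```

The loop has no model of the code it is testing; it can only stumble into bugs that random byte-flipping happens to reach.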
AI cyber agents operate on an entirely different principle. They can reason about code, understand complex relationships across software ecosystems, and adapt their strategies based on what they discover. When an AI agent analyses a codebase, it's not just looking for patterns that humans have previously identified as dangerous. It's actually thinking through how different components interact, where edge cases might exist, and how an attacker might chain together seemingly harmless functions to create exploitable conditions.
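As a hypothetical illustration of that kind of chaining, consider two functions that each look defensible in isolation but are exploitable in combination: a filename sanitiser that strips "../" sequences in a single pass, and a file reader that trusts the sanitised result. This example is ours, not one from the research discussed here, but it is the shape of flaw an agent can find by reasoning about how functions compose:

```python
import os

BASE_DIR = "/srv/app/uploads"

def sanitise(name: str) -> str:
    """Looks safe: strips parent-directory sequences (but only in one pass)."""
    return name.replace("../", "")

def read_upload(name: str) -> bytes:
    """Looks safe: only serves files under BASE_DIR, given a clean name."""
    path = os.path.join(BASE_DIR, sanitise(name))
    with open(path, "rb") as fh:
        return fh.read()

# Chained flaw: sanitise("....//....//....//etc/passwd") returns
# "../../../etc/passwd", so read_upload() escapes BASE_DIR entirely --
# a path traversal that neither function exhibits on its own.
```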
This capability has already produced remarkable results. Google's AI-enhanced OSS-Fuzz identified a vulnerability in OpenSSL that had existed for two decades. The flaw, CVE-2024-9143, is an out-of-bounds memory write that can crash applications and, in the worst case, may enable remote code execution.
What makes this discovery particularly significant is that it "wouldn't have been discoverable with existing fuzz targets written by humans". Traditional fuzzing approaches had been running against OpenSSL for years without finding this bug, but the AI agent spotted it by understanding the code in ways that human-written tests simply couldn't.
Some organisations are pushing these capabilities even further. ZeroPath has been combining deep program analysis with adversarial AI agents, an approach that has uncovered numerous critical vulnerabilities in production systems that traditional Static Application Security Testing tools couldn't identify. They use Monte Carlo Tree Search techniques adapted for cybersecurity, allowing the AI to navigate decision spaces intelligently rather than trying inputs at random; the system learns from each attempt and focuses on the most promising areas for vulnerability discovery.
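To illustrate the principle (and only the principle; this is not ZeroPath's implementation), here is a minimal sketch of UCB1 selection, the rule at the heart of Monte Carlo Tree Search. Each candidate action's observed reward is balanced against how rarely it has been tried, so the search keeps exploiting promising code paths while still exploring neglected ones. All names are hypothetical:

```python
import math

class Node:
    """One candidate action, e.g. 'mutate this field' or 'probe this endpoint'."""
    def __init__(self, action: str):
        self.action = action
        self.visits = 0
        self.reward = 0.0  # cumulative signal, e.g. new coverage or crashes

def ucb1(node: Node, total_visits: int, c: float = 1.4) -> float:
    """Upper Confidence Bound: exploitation term plus exploration bonus."""
    if node.visits == 0:
        return float("inf")  # always try untested actions first
    return node.reward / node.visits + c * math.sqrt(
        math.log(total_visits) / node.visits
    )

def search(children: list[Node], rollouts: int, evaluate) -> Node:
    """Repeatedly pick the highest-UCB action, evaluate it, record the reward."""
    for t in range(1, rollouts + 1):
        best = max(children, key=lambda n: ucb1(n, t))
        best.visits += 1
        best.reward += evaluate(best.action)  # e.g. one fuzzing attempt
    return max(children, key=lambda n: n.visits)
```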
Protect AI, meanwhile, has developed specialised agents that analyse functions, classes, and code snippets to build comprehensive pictures of potential vulnerabilities, providing confidence scores for their findings and detailed explanations of potential exploitation paths.
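A finding from such an agent might be reported as structured output along these lines; the fields are illustrative guesses at what "confidence score plus explanation" looks like in practice, not Protect AI's actual schema:

```python
finding = {
    "rule": "path-traversal",
    "location": "app/files.py:read_upload",
    "confidence": 0.87,  # the agent's belief that this is exploitable
    "explanation": (
        "sanitise() strips '../' in a single pass, so '....//' sequences "
        "survive and read_upload() can be driven outside BASE_DIR"
    ),
    "exploitation_path": ["sanitise", "os.path.join", "open"],
}
```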
The level of autonomy these systems can achieve is also remarkable. Consider what happened when researchers discovered a vulnerability in wolfSSL. The entire discovery process required no manual intervention beyond setting up the project and typing "cifuzz spark" into the command line. The AI agent handled everything else: identifying potential targets, generating test cases, running them, analysing results, and confirming the vulnerability. This kind of end-to-end autonomous discovery was unimaginable just a few years ago.
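The workflow such a command kicks off can be pictured as a loop like the one below. This is a schematic of the stages described above, not cifuzz's internals; each stage is injected as a callable precisely because real implementations differ:

```python
def autonomous_discovery(project, identify_targets, generate_harness,
                         run_fuzzer, analyse_crash, confirm):
    """Schematic end-to-end loop from target selection through triage."""
    confirmed = []
    for target in identify_targets(project):   # LLM picks promising entry points
        harness = generate_harness(target)     # LLM writes the fuzz driver
        for crash in run_fuzzer(harness):      # conventional fuzzing engine
            analysis = analyse_crash(crash)    # LLM triages the stack trace
            if confirm(analysis):              # reproduce and deduplicate
                confirmed.append(analysis)
    return confirmed
```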
The Scale of Discovery
The impact of AI-powered vulnerability discovery becomes clear when looking at recent results. Overall, Google's AI-enhanced OSS-Fuzz has identified 26 new vulnerabilities in open-source projects, including the OpenSSL vulnerability we mentioned above. While OSS-Fuzz as a whole has helped identify thousands of vulnerabilities since 2016 using traditional methods, the recent integration of AI capabilities is finding vulnerabilities that previous approaches missed entirely.
The sophistication of these discoveries is increasing as well. AI agents are moving beyond simple buffer overflows and memory corruption bugs to identify complex logic flaws, race conditions, and subtle authentication bypasses that require a deep understanding of application behaviour.
Technical Challenges and Limitations
Despite their impressive capabilities, current AI cyber agents face several important limitations. Context window restrictions in large language models cap the amount of information an agent can process when analysing code. This limits the scope of analysis possible in a single pass, sometimes requiring complex codebases to be analysed in smaller chunks.
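In practice, "smaller chunks" often means splitting source files into overlapping windows so that nothing is lost at a boundary. A minimal sketch; the sizes are arbitrary, and real systems tend to split on function or class boundaries rather than raw lines:

```python
def chunk_source(source: str, max_lines: int = 300, overlap: int = 30):
    """Yield overlapping line-based windows that fit a model's context budget."""
    lines = source.splitlines()
    step = max_lines - overlap
    for start in range(0, max(len(lines) - overlap, 1), step):
        yield "\n".join(lines[start:start + max_lines])
```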
There's also the challenge of consistency. As researchers at Protect AI noted, "because LLMs aren't deterministic, one can run the tool multiple times on the exact same project and get different results". This variability can complicate validation and repeatability requirements, particularly in enterprise environments where consistent results are critical for compliance and audit purposes.
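A common mitigation is to run the tool several times and keep only the findings that recur. The threshold below is an assumption for illustration, not a documented feature of any of the tools mentioned here:

```python
from collections import Counter

def stable_findings(runs: list[set[str]], min_fraction: float = 0.5) -> set[str]:
    """Keep findings (keyed, say, by rule + location) seen in most runs."""
    counts = Counter(f for run in runs for f in run)
    needed = len(runs) * min_fraction
    return {f for f, n in counts.items() if n >= needed}

# stable_findings([{"a", "b"}, {"a"}, {"a", "c"}]) -> {"a"}
```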
Current AI agents also tend to be specialised rather than generalised. They are typically trained to identify specific classes of flaw and cannot detect additional vulnerability types without retraining. While this specialisation can make detection of particular vulnerability classes very effective, it means organisations often need multiple AI agents with different capabilities to achieve comprehensive coverage.
Responsible Development
As AI cyber agents become more powerful and widespread, the cybersecurity community has placed increasing emphasis on responsible development and deployment practices. All major research initiatives prioritise responsible disclosure of discovered vulnerabilities, ensuring that defensive capabilities are strengthened without creating additional risks for the broader ecosystem.
This responsibility extends to the development of the AI systems themselves. The industry is developing frameworks for AI security assessment and validation, recognising that the tools used to secure systems must themselves be secure and trustworthy. This includes establishing standards for AI agent behaviour, implementing checks against misuse, and ensuring the technology primarily benefits defenders rather than attackers.
Google's approach with Big Sleep exemplifies this responsible development philosophy. When their AI agent discovered a zero-day vulnerability in SQLite, they immediately disclosed it to the SQLite team, which patched the issue the same day. This collaborative approach ensures that AI-powered vulnerability discovery strengthens overall security rather than creating new risks.
The Road Ahead
The shift we're witnessing is not an incremental improvement in cybersecurity tooling; it is the most significant transformation since the introduction of automated scanning. The question is no longer whether AI will transform cybersecurity, but how quickly organisations can adapt to this new reality while maintaining appropriate governance and oversight.