7 votes

Project Zero: Using large language models to catch vulnerabilities in real-world code

2 comments

  1. skybrian
    Link

    From the article:

    Today, we're excited to share the first real-world vulnerability discovered by the Big Sleep agent: an exploitable stack buffer underflow in SQLite, a widely used open source database engine. We discovered the vulnerability and reported it to the developers in early October, who fixed it on the same day. Fortunately, we found this issue before it appeared in an official release, so SQLite users were not impacted.

    We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software. Earlier this year at the DARPA AIxCC event, Team Atlanta discovered a null-pointer dereference in SQLite, which inspired us to use it for our testing to see if we could find a more serious vulnerability.
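
    For anyone unfamiliar with the bug class: a stack buffer underflow is a read or write that lands before the start of a stack-allocated buffer, typically via a negative index. A minimal hypothetical C illustration (this is not the actual SQLite bug, just the general shape of the problem):

        /* Hypothetical illustration of the bug class, NOT the actual
         * SQLite bug: a bounds check that forgets the lower bound lets
         * a negative index write below the start of a stack buffer. */
        static void record_value(int user_index, char value) {
            char buf[16];
            if (user_index < 16) {        /* missing check: user_index >= 0 */
                buf[user_index] = value;  /* user_index == -1 underflows buf */
            }
        }

        int main(void) {
            record_value(-1, 'A');  /* corrupts stack memory below buf */
            return 0;
        }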

    The approach they use is to give the LLM a previously-patched security vulnerability and ask it to look for variants:

    By providing a starting point – such as the details of a previously fixed vulnerability – we remove a lot of ambiguity from vulnerability research, and start from a concrete, well-founded theory: "This was a previous bug; there is probably another similar one somewhere".
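
    As a hedged sketch of what that starting point might look like in practice (the file names and prompt wording here are my assumptions, not Big Sleep's actual tooling), a small C program could stitch the old patch and the target code into a single prompt for the model:

        #include <stdio.h>
        #include <stdlib.h>

        /* Read a whole file into a NUL-terminated heap buffer. */
        static char *read_file(const char *path) {
            FILE *f = fopen(path, "rb");
            if (!f) { perror(path); exit(1); }
            fseek(f, 0, SEEK_END);
            long n = ftell(f);
            rewind(f);
            char *buf = malloc((size_t)n + 1);
            if (!buf || fread(buf, 1, (size_t)n, f) != (size_t)n) exit(1);
            buf[n] = '\0';
            fclose(f);
            return buf;
        }

        int main(void) {
            /* "fixed_bug.diff" and "target.c" are hypothetical inputs. */
            char *patch  = read_file("fixed_bug.diff"); /* the known, fixed vuln */
            char *target = read_file("target.c");       /* code to audit */
            printf("Here is the patch for a previously fixed vulnerability:\n\n%s\n"
                   "Look for similar, still-unfixed variants in this code:\n\n%s\n",
                   patch, target);
            free(patch);
            free(target);
            return 0;
        }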

    And they are just getting started:

    Our project is still in the research stage, and we are currently using small programs with known vulnerabilities to evaluate progress. Recently, we decided to put our models and tooling to the test by running our first extensive, real-world variant analysis experiment on SQLite.

    7 votes
  2. pete_the_paper_boat
    Link

    I'm curious how this affects bug bounties.

    2 votes