Artificial Intelligence and Malware Detection

The term 'artificial intelligence' is no longer a mere buzzword in the antivirus industry. But does it mean it is the holy grail for malware detection or the perfect weapon against malware developers? Not quite.

First Published 14th December 2021  |  Latest Update 16th January 2022

When it comes to AI-powered antivirus, is 'good' good enough? 

5 min read  |  Reflare Research Team

History lesson

In 1971, Bob Thomas, an employee of Bolt Beranek and Newman Inc. (BBN), wrote what is generally accepted as the first known self-replicating program. The program, known as Creeper, was not created to be harmful, however. It was an experimental program designed to move between DEC PDP-10 mainframe computers running the TENEX operating system via ARPANET, and it would output the message “I’M THE CREEPER; CATCH ME IF YOU CAN” to the teletype of the infected machine.

To catch Creeper, another self-replicating program called Reaper was created by another BBN employee, Ray Tomlinson. Like Creeper, it was not harmful, and its only purpose was to delete Creeper from infected computers. Although Reaper was itself what we would today call a computer virus, some consider it to be the first antivirus program.

The term computer 'virus' was not used to describe self-replicating programs like Creeper until 1983 when Fred Cohen, an American computer scientist, coined that term in his PhD thesis to describe programs that “affect other computer programs by modifying them in such a way as to include a (possibly evolved) copy of itself”.

While it is unclear who actually created the first antivirus product, 1987 was the year when G-Data, McAfee, and NOD32 introduced their antivirus software to the public. The G-Data antivirus program was developed for the Atari ST platform, while the McAfee and NOD32 products were created for the Microsoft DOS operating system.

Antivirus in the early days

The first-generation antivirus technologies were quite simple. Virus scanners would scan programs for known signatures: sequences of bytes that indicate infection. If a program matched a signature, the antivirus software would take appropriate action, such as deleting the infected file.

Signature-based detection methods rely on all copies of a virus being identical. Therefore, even the slightest change to the original virus could cause the malicious program to go undetected by the antivirus software.
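The following is a minimal sketch of the idea, not the implementation of any real product. The signature names and byte patterns in SIGNATURES are made up for illustration, and the scanner simply checks whether each pattern occurs anywhere in the data.

```python
from __future__ import annotations

# Hypothetical signature database (names and patterns are invented for this sketch).
SIGNATURES = {
    "Example.Creeper": b"I'M THE CREEPER",
    "Example.Dropper": bytes.fromhex("deadbeef4d5a9000"),
}


def scan(data: bytes) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


# A "file" that contains one of the known byte sequences.
original = b"\x90\x90" + b"I'M THE CREEPER; CATCH ME IF YOU CAN" + b"\x00"

# A variant that differs by a single byte inside the matched region.
variant = original.replace(b"CREEPER", b"CREEPeR")

print(scan(original))  # ['Example.Creeper']
print(scan(variant))   # []
```

Changing one byte is enough to make the second scan come back empty, which is the weakness described above.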

The rise of heuristic and behavioural-based detection

Unfortunately, most modern malware is either self-modifying or has mechanisms that make signature-based scanners useless against it. Signature-based solutions also scale poorly in the internet age, where hundreds if not thousands of new malware samples are distributed every day, because the signature database has to be constantly updated with new entries. This means the database would not only consume a lot of storage space on users' computers, but the products would also take longer to scan for viruses.

To stand a chance against self-modifying malware, new antivirus engines utilising different detection strategies were born. Unlike first-generation antivirus scanners that only look for static patterns associated with malware in files or memory, these detection engines would analyse the programs’ code and their run-time behaviours to decide whether they are malicious or harmless.

These decisions are made based on a rulebook created by human experts. In other words, these antivirus engines are expert systems that mimic malware analysts' decision-making process.
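As a rough sketch of such a rulebook, assume each program has already been analysed into a report of observed properties. Every rule name, weight, report key, and the threshold below is hypothetical and chosen only to illustrate the approach; real engines use far larger and more nuanced rule sets.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Rule:
    name: str
    weight: int
    matches: Callable[[Dict[str, Any]], bool]


# Hypothetical rulebook written by a human analyst (all values are assumptions).
RULES = [
    Rule("writes_to_other_executables", 5, lambda r: bool(r.get("modifies_executables"))),
    Rule("adds_autorun_entry", 3, lambda r: bool(r.get("adds_autorun"))),
    Rule("high_entropy_section", 2, lambda r: float(r.get("max_section_entropy", 0.0)) > 7.2),
    Rule("disables_security_tools", 4, lambda r: bool(r.get("kills_av_process"))),
]
THRESHOLD = 6  # Total weight at or above which a program is flagged.


def evaluate(report: Dict[str, Any]) -> Tuple[str, int, List[str]]:
    """Apply every rule to the report and return (verdict, score, fired rule names)."""
    fired = [rule for rule in RULES if rule.matches(report)]
    score = sum(rule.weight for rule in fired)
    verdict = "malicious" if score >= THRESHOLD else "benign"
    return verdict, score, [rule.name for rule in fired]


suspicious = {"modifies_executables": True, "max_section_entropy": 7.8}
installer = {"adds_autorun": True, "max_section_entropy": 6.1}

print(evaluate(suspicious))  # ('malicious', 7, ['writes_to_other_executables', 'high_entropy_section'])
print(evaluate(installer))   # ('benign', 3, ['adds_autorun_entry'])
```

Because the verdict depends on behaviour rather than exact bytes, a modified variant that behaves the same way still trips the same rules. The rules and weights, however, all have to be written and maintained by people.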

While heuristic-based antivirus products solve many of the problems associated with signature-based solutions, they are far from perfect. For one thing, they rely on experts to first analyse malware and come up with rules that would detect the malware and its variants. But today, antivirus companies receive hundreds if not thousands of unique malware samples every day, and it would take thousands of man-hours to analyse them all.

Let us not forget that malware writers also constantly look for new ways to bypass antivirus products, including reverse engineering the detection engines to work out the heuristic rules they implement in order to defeat them.

For this reason, antivirus companies are always on the move looking for a way to stay ahead of their adversaries.

The golden age of artificial intelligence

“AI is the new electricity” - Andrew Ng, former Director of the Stanford Artificial Intelligence Lab (SAIL).

In 2012, a new antivirus company founded by a former employee of McAfee emerged on the scene. The company, Cylance Inc., was not just another antivirus company, though. It claimed to be able to stop unknown and previously undetectable malware by utilising sophisticated AI and mathematics. Its antivirus product did not rely on human experts to first analyse the malware and come up with heuristic rules. Instead, Cylance fed its machine learning algorithm terabytes of data so that it would be able to tell whether a program is benign or malicious.
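To make the contrast with hand-written rules concrete, here is a deliberately tiny sketch of learning a classifier from labelled examples. It is not Cylance's model, whose internals are not described here. It uses a nearest-centroid classifier as a stand-in, and the three features (normalised entropy, share of suspicious imports, whether the file is signed) and all sample values are invented for illustration.

```python
from math import dist
from statistics import fmean

# Hypothetical training data: (features, label).
# Features are (normalised entropy, share of suspicious imports, is_signed).
TRAINING = [
    ((0.55, 0.1, 1.0), "benign"),
    ((0.60, 0.0, 1.0), "benign"),
    ((0.50, 0.2, 0.0), "benign"),
    ((0.95, 0.8, 0.0), "malicious"),
    ((0.90, 0.6, 0.0), "malicious"),
    ((0.85, 0.9, 1.0), "malicious"),
]


def train(samples):
    """Learn one centroid (mean feature vector) per label from the samples."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given feature vector."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING)

# Two samples the model has never seen before.
print(predict(model, (0.92, 0.7, 0.0)))   # malicious
print(predict(model, (0.58, 0.05, 1.0)))  # benign
```

No analyst wrote a rule here; the decision boundary comes entirely from the data. The trained model is also just two short vectors, however many samples it was trained on, which hints at why a learned model can stay small even when the training set is very large.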

The most outrageous claim made by Cylance, however, was that its antivirus product would not require ongoing updates like traditional antivirus products. This is especially interesting considering that the machine learning model used by its product was less than 40MB in size and could easily fit into memory.

If your first thought was that this sounds like nothing more than a marketing gimmick, you wouldn’t be alone. Cylance did face a lot of scepticism in the beginning. However, it was far from being a company selling snake oil. As a matter of fact, the company had many happy customers and was already making over a hundred million in revenue before it was acquired by BlackBerry in 2019 for 1.4 billion USD.

Today, the term artificial intelligence is no longer a mere buzzword in the antivirus industry. But does it mean it is the holy grail for malware detection? The perfect weapon against malware developers?

Not quite.

Good - not great

AI-powered antivirus products are not without weaknesses and can be defeated. Cylance experienced this first hand in 2019 when an Australia-based cybersecurity company, SkyLight Cyber, demonstrated how its AI-powered engine could not only be bypassed, but also tricked into identifying a malicious program as trusted software, using a technique that would only work against AI-based solutions and not against traditional antivirus engines.

This reminds us that cybersecurity is a never-ending cat-and-mouse game. No matter what we use and no matter how cutting edge it is, it will never be perfect, and given enough time, someone will find a way to defeat it. In fact, more often than not, adversaries use the same technology we rely on for defence to beat us.

Threat actors these days are, for example, taking advantage of advancements in machine learning to help create and design better malware that can evade the antivirus products we use.

To learn more and stay up-to-date with the latest in malware detection and other security topics, consider subscribing to Reflare's Research Newsletter.