Detecting and Labelling Unknown Malicious Files with Machine Learning

By CIOReview | Thursday, June 7, 2018

A recent study by Trend Micro researchers found that more than 83 percent of all downloaded files remain unknown or unclassified, even two years after they were first observed in the wild. Because most malware threats originate from software download events, the researchers subsequently built a human-readable machine learning system that classifies unknown files as either harmless or malicious.

The researchers studied a dataset of 3 million anonymized web-based software download events collected over a span of seven months. They then built a machine learning system that automatically derives detection rules from observed file information and features. For each download, the system analyzed the signer, certification authority, and packer of both the downloaded file and the downloading process, along with the class of the downloading process and the popularity of the download domain. By generating 1,500 detection rules per month, the system reduced the number of unknown downloads by 28 percent.
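To make the idea of human-readable detection rules concrete, here is a minimal sketch in Python. It is not Trend Micro's method, which the study summary does not spell out. It assumes a simple scheme in which a rule has the form "feature == value implies label" and is kept only when enough labelled downloads support it and they all agree. The field names, the thresholds `min_support` and `min_purity`, and the sample events are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical feature names, loosely based on the attributes the article lists.
FEATURES = ("signer", "ca", "packer", "process_class", "domain_popularity")


def learn_rules(labelled_events, min_support=2, min_purity=1.0):
    """Derive single-condition rules of the form (feature, value) -> label."""
    counts = defaultdict(Counter)
    for event, label in labelled_events:
        for feature in FEATURES:
            value = event.get(feature)
            if value is not None:
                counts[(feature, value)][label] += 1

    rules = {}
    for key, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        total = sum(label_counts.values())
        # Keep a rule only if it is well supported and (by default) unambiguous.
        if total >= min_support and hits / total >= min_purity:
            rules[key] = label
    return rules


def classify(event, rules):
    """Label an event only when every matching rule agrees; else 'unknown'."""
    votes = {rules[(f, event[f])] for f in FEATURES if (f, event.get(f)) in rules}
    return votes.pop() if len(votes) == 1 else "unknown"


# Made-up labelled download events, for illustration only.
labelled = [
    ({"signer": "ExampleSoft", "packer": "none", "process_class": "browser"}, "benign"),
    ({"signer": "ExampleSoft", "packer": "none", "process_class": "browser"}, "benign"),
    ({"signer": "ShadyCo", "packer": "upx", "process_class": "browser"}, "malicious"),
    ({"signer": "ShadyCo", "packer": "upx", "process_class": "updater"}, "malicious"),
]

rules = learn_rules(labelled)
for (feature, value), label in sorted(rules.items()):
    print(f"IF {feature} == {value!r} THEN {label}")

print(classify({"signer": "ShadyCo", "packer": "upx"}, rules))
print(classify({"signer": "NewVendor"}, rules))
```

Because each rule is a plain condition on one attribute, an analyst can read and audit it directly, which is the property the article describes as human-readable. Downloads that match no rule, or that match conflicting rules, stay unknown rather than being guessed at.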

With the machine learning system, Trend Micro researchers were able to label 28.3 percent of 436,829 previously unknown files, a 233 percent increase over the available ground truth.

As newer threats continue to emerge, there is a pressing need for advances in machine learning aimed specifically at cybersecurity. Machine learning is not a cybersecurity silver bullet, although it is highly efficient at identifying and analyzing unknown files while also identifying new ransomware types and malware variants. The technology is stronger as part of a multilayered approach to security, such as Trend Micro™ XGen™ security, which helps secure systems with functionalities like web/URL filtering, behavioral analysis, and custom sandboxing. The XGen suite of security solutions helps businesses protect themselves against threats that can now bypass traditional controls and exploit vulnerabilities.