Demystifying: Machine Learning in Endpoint Security
Deciding on a new endpoint security vendor is tough. From your very first search, you’ll get a lot of overused terms thrown at you – machine learning, artificial intelligence, next-generation antivirus, fileless malware protection, threat hunting – the list goes on. The headlines can sound tempting and cutting edge, or make you feel crazy enough to stop your search altogether and stick with your current product. It quickly becomes difficult to distinguish one vendor from the other, much less choose the right endpoint security solution to protect your network against another term you hear a lot about – advanced threats.
After hearing a lot of well-intentioned but misguided questions at countless trade shows and customer meetings, we decided it was important to demystify some of the terms you hear about the most. Beyond their surface-level appearance of “marketing fluff,” the concepts these terms represent are actually very important. Many of them are features and capabilities you should demand in the solution you ultimately invest in. But if you don’t understand what they really mean, or what the net benefit is, you could be buying into an incomplete story, a tool that doesn’t provide exactly what you need, or, even worse, something you can’t use.
Which brings us to what is quite possibly the most overused term in endpoint security today: machine learning.
Like every other endpoint security vendor, we receive a lot of questions about machine learning. The question typically sounds something like “does Cisco use machine learning to catch malware?” To which we respond:
Machine learning alone doesn’t catch malware.
Put very simply, machine learning in endpoint security refers to the use of an algorithm to train your endpoint security solution to “learn” to identify malicious files and activity based on attributes of previous malicious files it has seen.
And yes, Cisco’s AMP for Endpoints does that. But we’ll get to that later.
What is machine learning?
While machine learning can, and should, be used to increase your detection rate and save you time, it’s important to note that machine learning alone doesn’t catch malware. To understand why this is the case, we have to understand how machine learning works in the first place. Ruba Borno, Cisco’s Vice President of Growth Initiatives and Chief of Staff for the office of the CEO, explained it well in her recent blog:
“With machine learning we can feed massive amounts of data into the algorithm, then the machine determines the best course of action in the real world (instead of having experts code rules for a machine to follow when they let it operate in the world).
Machines learn by seeing a large number of versions of something. For example, to teach a machine to know the difference between a cat and a dog, you need to show it a lot of pictures, with views of cats and dogs from the front, back, side, and above. With machine learning, the machine with the most “data” on cats and dogs will develop the best way to tell the difference on its own.”
Machine learning in endpoint security
Andrew Ng, one of the leading experts on machine learning and artificial intelligence, identified the “Achilles’ heel” of today’s supervised machine learning capabilities:
“It requires a huge amount of data. You need to show the system a lot of examples of both A and B. For instance, building a photo tagger requires anywhere from tens to hundreds of thousands of pictures (A) as well as labels or tags telling you if there are people in them (B). Building a speech recognition system requires tens of thousands of hours of audio (A) together with the transcripts (B).”
So, to build a machine learning algorithm that can accurately distinguish malicious from non-malicious files, you must train it on a very broad, labeled set of files: known malware, plus known-clean files to compare it against. In other words, the more malware your machine learning algorithm sees, the smarter it becomes.
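To make the training idea concrete, here is a minimal sketch of supervised learning on labeled examples. It is not how any particular product works: the two features (byte entropy and a count of suspicious API imports), the sample values, and the nearest-centroid method are all assumptions chosen to keep the example small.

```python
from statistics import mean

def train_centroids(samples):
    """Average the feature vectors of each label into one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical training set: (byte entropy, suspicious API imports) -> label.
TRAINING = [
    ((7.6, 12), "malicious"),
    ((7.2, 9), "malicious"),
    ((4.1, 1), "benign"),
    ((5.0, 0), "benign"),
]

model = train_centroids(TRAINING)
print(classify(model, (7.4, 10)))  # malicious
print(classify(model, (4.5, 2)))   # benign
```

With only four samples the centroids are crude. Adding more labeled examples of both classes moves them toward where real malicious and benign files cluster, which is the sense in which more data makes the model smarter.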
Let’s talk numbers
Cisco has spent the last 30 years building the backbone of the internet. Today, we block 20 billion threats per day, or roughly 7 trillion per year. These threats are fed into our machine learning engines, where they are dissected and analyzed to train the algorithms. AMP for Endpoints has a dedicated Research & Efficacy team that leverages all of this data to improve protection and continually drive down the time to detection. These researchers use the threats they analyze to train our machine learning algorithms to better protect against new and emerging threat types. While the point of machine learning is that your tool can get smarter on its own over time, when you have the industry’s brightest minds constantly training it, your tool gets that much smarter.
This gets easier to understand when we think about the very basics of endpoint security. An endpoint security tool can categorize files and other observables into three categories:
- Things it has seen and can identify as safe
- Things it has seen and can identify as unsafe or malicious
- Things it has never seen, and therefore cannot identify as safe or malicious
When your solution uses machine learning to help categorize files, it should reach its decision faster and with greater accuracy. If the algorithm has been trained on enough good data, it should identify new or unknown threats with relative ease. The power lies in the amount of data being fed into your models, so the more malware your machine learning algorithm sees, the more capable your endpoint security solution becomes at identifying malware attempting to enter your network. This should all be done automatically. If your machine learning tool generates alerts and leaves you to decide the disposition of each file, you’re not gaining efficacy or efficiency.
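The three categories, and the idea that unknown files should get an automatic verdict rather than an alert, can be sketched as follows. Everything here is illustrative: the reputation sets, the `disposition` function, the 0.5 threshold, and especially `toy_score`, which is a stand-in for a trained model and not a real detection heuristic.

```python
import hashlib
from typing import Callable

# Hypothetical reputation lists, keyed by SHA-256 digest.
KNOWN_SAFE = {hashlib.sha256(b"trusted installer").hexdigest()}
KNOWN_MALICIOUS = {hashlib.sha256(b"known dropper").hexdigest()}

def disposition(data: bytes, score: Callable[[bytes], float],
                threshold: float = 0.5) -> str:
    """Return 'safe' or 'malicious' without asking a human to decide."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in KNOWN_SAFE:          # seen before, known safe
        return "safe"
    if digest in KNOWN_MALICIOUS:     # seen before, known malicious
        return "malicious"
    # Never seen: let the model decide instead of raising an alert.
    return "malicious" if score(data) >= threshold else "safe"

def toy_score(data: bytes) -> float:
    """Stand-in for a trained model: fraction of non-printable bytes."""
    if not data:
        return 0.0
    return sum(b < 32 or b > 126 for b in data) / len(data)

print(disposition(b"trusted installer", toy_score))     # safe
print(disposition(b"known dropper", toy_score))         # malicious
print(disposition(bytes(range(200, 256)), toy_score))   # malicious
```

The model is consulted only for the third category, so the quality of its training data determines how well never-before-seen files are handled.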
Part of a solution, not the solution
We like to mention early on in any machine learning conversation that while this capability is vastly improving the time-to-detection, efficiency, and efficacy of our solutions, it isn’t an end-all-be-all tool. Machine learning should be thought of as part of a solution, not the solution. There will always be new types of malware with never-before-seen characteristics. When that malware tries to enter your environment, it has a good chance of making it past a tool that relies only on machine learning.
At this point you should ask yourself – how do you stop the threats your machine learning misses? Can you even see them? Once a threat is inside your environment, machine learning capabilities are of no help and you’ll wish you had a more complete tool that could identify malicious behaviors your machine learning algorithm hasn’t seen yet.
And what about fileless malware? There’s a reason we’ve seen an increase in the use of this approach to infiltrate networks. While machine learning algorithms can be trained to distinguish malicious from non-malicious files, they aren’t much help when there’s no file to analyze, or even to run through run-time analysis. To protect against non-traditional malware types, you need a layered approach to protection, detection, and response. There’s just no silver bullet. Being able to block as many threats up front as possible is critical, but we all know that it’s the last 1% that will land you in the headlines.
Cisco uses a layered approach to security, especially at the endpoint. Cisco AMP for Endpoints, our next-generation endpoint security solution, uses machine learning as one of over a dozen detection and protection techniques to prevent you from being breached. A few of AMP’s other detection and protection techniques include:
- Malicious activity protection to continuously monitor the behavior of executing files, evaluate whether that behavior is legitimate, and kill processes that a file should not be running
- Our exploit prevention engine that monitors everything running in memory to ensure legitimate applications and processes are not being leveraged to deliver malware
- AMP’s built-in signature-based detection engine to quickly answer whether what we are seeing is a known malicious file
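As a rough illustration of why layers matter, the sketch below runs an event through several independent checks and reports the first one that convicts. It is not a description of how AMP for Endpoints is built: the `Event` fields, layer names, behavior labels, and thresholds are all hypothetical, and `model_score` simply stands in for the output of a trained file model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    process: str
    file_sha256: Optional[str] = None    # None models fileless activity
    model_score: Optional[float] = None  # hypothetical model output; None if no file
    behaviors: List[str] = field(default_factory=list)

# Hypothetical data for the signature and behavior layers.
KNOWN_BAD_HASHES = {"deadbeef"}
SUSPICIOUS_BEHAVIORS = {"encoded_script_launch", "process_injection"}

def signature_layer(event: Event) -> bool:
    return event.file_sha256 in KNOWN_BAD_HASHES

def model_layer(event: Event) -> bool:
    # A file model has nothing to score when there is no file.
    return event.model_score is not None and event.model_score >= 0.5

def behavior_layer(event: Event) -> bool:
    return any(b in SUSPICIOUS_BEHAVIORS for b in event.behaviors)

LAYERS = [
    ("signature", signature_layer),
    ("ml_model", model_layer),
    ("behavior", behavior_layer),
]

def first_conviction(event: Event) -> Optional[str]:
    """Return the name of the first layer that flags the event, else None."""
    for name, layer in LAYERS:
        if layer(event):
            return name
    return None

fileless = Event(process="powershell.exe", behaviors=["encoded_script_launch"])
print(first_conviction(fileless))  # behavior
```

In this toy setup the fileless event passes the signature and model layers untouched, because neither has a file to inspect, and is caught only by the behavior check.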
Continuing the layered approach, AMP for Endpoints is just one of our products that uses machine learning. You’ll also find machine learning capabilities built into Cisco Stealthwatch and Cognitive Threat Analytics. To learn more about exactly how Cisco’s security products use machine learning, stay tuned for part 2 of this blog coming soon.
To test the features (including, but definitely not limited to, machine learning) in AMP for Endpoints for yourself, check out our free trial.