Gartner ITxpo Session Preview: Best Practices with Machine Learning

Will you be at the upcoming Gartner Symposium/ITxpo conference in Orlando, Florida, in October? If so, please stop by my session and say hello; I will be presenting “Best Practices with Machine Learning in Security Analytics”. For those not attending, here is a brief overview and a little about why I felt compelled to present on this topic.

The abstract for the talk gets to the heart of the matter:

The industry is buzzing around machine learning. Despite the hype, machine learning can be useful for information security. If every security vendor is claiming the use of machine learning, how do you separate the marketing from the technical value these products deliver? You should not have to become a data scientist to perform this evaluation. Join us as we debunk the hype, define machine learning and outline how it can deliver more effective security and not just hype!

I’ll touch on a few things here without spoiling it for those who will be at the event.

Security vendors should be transparent about how they arrive at an analytical outcome. The data science is already difficult to understand, and its conclusions are not trivial ones like movie recommendations; it might recommend that an executive’s computer be quarantined or that part of a branch office be taken offline. These non-trivial assertions need to “show their work”, and you as a customer need to understand the technology well enough to develop a level of trust in its accuracy. Vendors can do this by publishing papers on their techniques and open-sourcing code as reference implementations; black-box secrecy does no one any good. Beware of any vendor who cannot explain how their security analytics work end-to-end (from the telemetry being observed to the outcome).

Ask where they have applied machine learning in the product and why machine learning is superior for that task. This will likely get you a blank stare, or they will start to explain the method: ‘Supervised learning is applied in this area because we have a solid basis for the domain we can train on’. Wonderful, but how often does this training take place? I’m trying to get you beyond the question ‘Does your product use machine learning?’ to a place where you can really qualify the technology without being a data scientist.

The last thing I will leave you with is the subject of efficacy. Security analytics need to deliver high-fidelity outcomes in order to be useful; low fidelity will end in frustration. So many vendors assume that the users of their system deeply understand the science, and ask them about false positives and true positives, when all they need to ask is whether the outcome was helpful. Analytical outcomes are much more about semantics than they are about syntax, so measuring utility is critical. ‘Was this alert helpful?’ Yes or no? You have bypassed all expert knowledge and asked your customer something every customer can answer! You may not like the answer, but it is nonetheless the right question to ask.
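To make the ‘Was this alert helpful?’ idea a little more concrete, here is a minimal sketch in Python of how those yes/no answers might be tallied per detection technique. The function name, the detector names, and the feedback data are all hypothetical and invented for illustration; no particular product works this way as far as I can claim here.

```python
from collections import defaultdict

def helpfulness_by_detector(feedback):
    """Return the share of 'yes' answers per detector.

    feedback: iterable of (detector_name, was_helpful) pairs, where
    was_helpful is the customer's yes/no answer (hypothetical format).
    """
    counts = defaultdict(lambda: [0, 0])  # detector -> [helpful votes, total votes]
    for detector, was_helpful in feedback:
        counts[detector][1] += 1
        if was_helpful:
            counts[detector][0] += 1
    return {name: helpful / total for name, (helpful, total) in counts.items()}

# Made-up answers to 'Was this alert helpful?' for two illustrative detectors.
feedback = [
    ("beaconing", True), ("beaconing", True), ("beaconing", False),
    ("rare_login", False), ("rare_login", False),
    ("rare_login", True), ("rare_login", False),
]

rates = helpfulness_by_detector(feedback)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.0%} of alerts rated helpful")
```

A detector whose alerts are rarely rated helpful is a candidate for tuning or retirement, and nobody had to explain a confusion matrix to get there.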

Twenty years ago it was all about lists: lists that described bad things (blacklists) and lists that described good things (whitelists). Today we have advanced analytics to the point where we can categorize behaviors. Please note that on this journey we are not forgetting the older methods that worked; we are adding to our toolbox and finding new ways to have them all work together. Any attacker on any given day will be able to evade a single detection method, but not a diverse set of detection techniques working in concert with one another. Some days we win, some days we lose, but every day let us try to make it harder for our attackers to do what they do.
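As a rough illustration of lists and behavioral analytics working in concert, the sketch below checks an event against a whitelist, a blacklist, and a behavior score, and records the reason for every alert so the outcome can “show its work”. Everything here is an assumption made for the example: the event fields, the threshold, and especially the toy scoring function, which stands in for whatever model a real product would use.

```python
def evaluate(event, blacklist, whitelist, behavior_score, threshold=0.8):
    """Combine list-based and behavior-based checks for one event.

    All names and the event layout are hypothetical. Returns a dict with
    an 'alert' flag and the human-readable reasons behind it.
    """
    destination = event["destination"]
    if destination in whitelist:
        # Known-good destinations are not alerted on in this simple sketch.
        return {"alert": False, "reasons": ["destination is whitelisted"]}

    reasons = []
    if destination in blacklist:
        reasons.append("destination is blacklisted")

    score = behavior_score(event)
    if score >= threshold:
        reasons.append(f"behavior score {score:.2f} >= {threshold}")

    return {"alert": bool(reasons), "reasons": reasons}


def toy_behavior_score(event):
    """Stand-in for a real model: scales outbound bytes into [0, 1]."""
    return min(event["bytes_out"] / 1_000_000_000, 1.0)


blacklist = {"198.51.100.23"}
whitelist = {"192.0.2.10"}

# Not on any list, but the behavior alone is enough to raise an alert.
unlisted = {"destination": "203.0.113.7", "bytes_out": 900_000_000}
result = evaluate(unlisted, blacklist, whitelist, toy_behavior_score)
print(result["alert"], result["reasons"])
```

An attacker who avoids every blacklisted address still has to keep their behavior unremarkable, which is the point of layering the techniques rather than relying on any one of them.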

Hope to see you in Orlando at this Gartner event. Bring your smiles, your questions, and your sunscreen.