Following part three of our Big Data in Security series on graph analytics, I’m joined by expert data scientists Dazhuo Li and Jisheng Wang to talk about their work in developing an intelligent anti-spam solution using modern machine learning approaches on Hadoop.
What is ARS and what problem is it trying to solve?
Dazhuo: From a high-level view, Auto Rule Scoring (ARS) is the machine learning system for our anti-spam system. The system receives a lot of email and classifies each message as spam or not spam. From a more detailed view, the system has hundreds of millions of sample email messages, each tagged with a label. ARS extracts features or rules from these messages, builds a classification model, and predicts whether new messages are spam or not spam. The greater the variety of spam and ham (non-spam) we receive, the better our system works.
Jisheng: ARS is also a more general large-scale supervised learning use case. Assume you have tens (or hundreds) of thousands of features and hundreds of millions (or even billions) of labeled samples, and you need to use them to train a classification model that can classify new data in real time.
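To make the pipeline described above concrete, here is a minimal sketch of the general idea: rules fire on labeled messages, a model learns a score for each rule, and the scores are used to classify new mail. This is not the ARS implementation. The rules, the tiny training set, and every function name are hypothetical, and plain logistic regression trained with stochastic gradient descent stands in for whatever model ARS actually uses. A real system at the scale mentioned above would distribute this work (for example on Hadoop) rather than loop in a single process.

```python
import math
import re

# Hypothetical rules: each is a (name, predicate) pair that fires on a message.
RULES = [
    ("has_free", lambda m: "free" in m.lower()),
    ("has_url", lambda m: re.search(r"https?://", m) is not None),
    ("many_exclamations", lambda m: m.count("!") >= 3),
    ("mentions_meeting", lambda m: "meeting" in m.lower()),
]


def extract_features(message):
    """Return a 0/1 vector with one entry per rule."""
    return [1.0 if predicate(message) else 0.0 for _, predicate in RULES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, lr=0.5):
    """Fit one weight (score) per rule with plain stochastic gradient descent."""
    weights = [0.0] * len(RULES)
    bias = 0.0
    for _ in range(epochs):
        for message, label in samples:
            x = extract_features(message)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = label - p  # gradient of the log-likelihood w.r.t. the logit
            bias += lr * err
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def spam_probability(message, weights, bias):
    """Score a new message with the learned rule weights."""
    x = extract_features(message)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up labeled samples: 1 = spam, 0 = ham.
TRAINING = [
    ("FREE prizes!!! Click http://example.com now", 1),
    ("Get your free gift at http://example.com", 1),
    ("Win big!!! Totally free!!!", 1),
    ("Agenda for the Monday meeting attached", 0),
    ("Can we move the meeting to 3pm?", 0),
    ("Notes from yesterday's meeting: http://example.com/notes", 0),
]

weights, bias = train(TRAINING)

for name, weight in zip((name for name, _ in RULES), weights):
    print(f"{name}: {weight:+.2f}")
print(spam_probability("Free offer!!! http://example.com", weights, bias))
print(spam_probability("Reminder: team meeting tomorrow", weights, bias))
```

Rules that fire only on spam end up with positive scores and rules that fire only on ham end up with negative ones, which is the intuition behind scoring rules automatically from labeled data instead of tuning them by hand.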
Tags: analytics, ARS, auto rule scoring, Big Data, Cisco, database, email, Hadoop, ham, innovation, Intelligence, offline learning, online learning, operations, security, spam, TRAC
Following part two of our Big Data in Security series on University of California, Berkeley’s AMPLab stack, I caught up with talented data scientists Michael Howe and Preetham Raghunanda to discuss their exciting graph analytics work.
Where did graph databases originate and what problems are they trying to solve?
Michael: Disparate data types have a lot of connections between them, and not just the types of connections that have been well represented in relational databases. The actual graph database technology is fairly nascent, only becoming prominent in the last decade. It’s been driven by the falling cost of storage and computational capacity, and especially by the rise of Big Data.
There have been a number of players driving development in this market, specifically research communities and businesses like Google, Facebook, and Twitter. These organizations are looking at large volumes of data with lots of inter-related attributes from multiple sources. They need to be able to view their data in a much cleaner fashion so that the people analyzing it don’t need to have in-depth knowledge of the storage technology or every particular aspect of the data. There are a number of open source and proprietary graph database solutions to address these growing needs and the field continues to grow.
Tags: analytics, Big Data, Cisco, database, Gremlin, InfiniteGraph, innovation, Intelligence, NoSQL, operations, security, Titan, TRAC, TRAC Big Data Analysis
Recently I had an opportunity to sit down with the talented data scientists from Cisco’s Threat Research, Analysis, and Communications (TRAC) team to discuss Big Data security challenges, tools and methodologies. The following is part one of five in this series where Jisheng Wang, John Conley, and Preetham Raghunanda share how TRAC is tackling Big Data.
Given the hype surrounding “Big Data,” what does that term actually mean?
John: First of all, because of overuse, the “Big Data” term has become almost meaningless. For us and for SIO (Security Intelligence and Operations) it means a combination of infrastructure, tools, and data sources all coming together to make it possible to have unified repositories of data that can address problems that we never thought we could solve before. It really means taking advantage of new technologies, tools, and new ways of thinking about problems.
Tags: analytics, API, Big Data, Cisco, database, Hadoop, HDFS, innovation, Intelligence, java, mapreduce, NoSQL, operations, security, Shark, Spark, SQL, telemetry, TRAC, TRAC Big Data Analysis
Last week, I shared basic enablement, intelligence, engagement and measurement practices. This week’s presentation focuses on some advanced practices in the areas of intelligence, engagement, advocacy and measurement. By no means is this list complete, so please feel free to add your two cents in the Comment box below. The more we share, the more we can influence how companies and even industries view and adopt social media. Collectively, we can shape its evolution. So please, share away!
And without further ado, here’s another chapter from my unwritten book in slide deck format.
Tags: advanced, advocacy, best practices, community, crm, data, engagement, how to, Intelligence, Listening, measurement, Plan, ROI, social media, strategy
In June, I attended the Gartner Security Summit in Washington, D.C., where quite a few security executives asked me, “My network folks just bought ISE, but what is ISE and what type of security does it provide?” Fast forward to July, and I wish I had had this SANS review of ISE to offer a month earlier. (SANS, as many security professionals know, is a highly regarded organization in IT security and cyber security.)
Tags: blackhat, context awareness, cyber security, ESG, Gartner, Intelligence, ISE, Network World, SANS, secure access, security, threats