Cisco Blogs



Big Data in Security – Part V: Anti-Phishing in the Cloud

In the last chapter of our five-part Big Data in Security series, expert data scientists Brennan Evans and Mahdi Namazifar join me to discuss their work on a cloud anti-phishing solution.

Phishing is a well-known historical threat. Essentially, it’s social engineering via email and it continues to be effective and potent. What is TRAC currently doing in this space to protect Cisco customers?

Brennan: One of the ways that we have traditionally confronted this threat is through third-party intelligence in the form of data feeds. The problem is that these social engineering attacks are highly time-dependent. If we rely solely on feeds, we risk delivering stale data to our customers, so that solution isn’t terribly attractive. This compounds another issue with many of the data sources out there: they attempt to solve the problem by enumerating compromised hosts, yet in practice each vendor seems to see just a small slice of the problem space and, as I just said, oftentimes it’s too late.

We have invested a lot of time in looking at how to avoid essentially being an intelligence redistributor and instead examine the problem firsthand using our own rich data sources, both external and internal, and develop a system that is more flexible, timely, and robust in the types of attacks it can address.

Mahdi: In principle, we have designed and built prototypes around Cisco’s next-generation phishing detection solution. To meet the requirements for a phishing detection solution that is both effective and efficient, our design is based on Big Data and machine learning. Big Data technology allows us to dig into the tremendous amount of data that we have and extract predictive signals for the phishing problem. Machine learning algorithms, in turn, provide the means for using those predictive signals, captured from historical data, to build mathematical models that predict the probability of a URL or other content being phishing.
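To make the idea of predicting "the probability of a URL being phishing" concrete, here is a minimal sketch of one way such a model could look. It is not TRAC's system: the features, the word list, the toy training URLs, and the choice of logistic regression are all assumptions made for illustration.

```python
import math
from urllib.parse import urlparse

# Hypothetical word list; a real system would derive its signals from data.
SUSPICIOUS_WORDS = ("login", "verify", "account", "secure", "update")


def url_features(url):
    """Turn a URL into a small numeric feature vector (illustrative signals only)."""
    host = urlparse(url).hostname or ""
    return [
        1.0,                                               # bias term
        min(len(url), 200) / 200.0,                        # normalized URL length
        min(host.count("."), 5) / 5.0,                     # dots in the host name
        1.0 if host.replace(".", "").isdigit() else 0.0,   # host is a raw IPv4 address
        1.0 if "@" in url else 0.0,                        # user-info trick in the URL
        1.0 if any(w in url.lower() for w in SUSPICIOUS_WORDS) else 0.0,
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, url):
    """Estimated probability that the URL is phishing."""
    return sigmoid(sum(w * x for w, x in zip(weights, url_features(url))))


def train(samples, epochs=500, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(url_features("http://example.com"))
    for _ in range(epochs):
        for url, label in samples:
            x = url_features(url)
            error = predict_proba(weights, url) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Made-up "historical data": 1 = phishing, 0 = legitimate.
TRAINING = [
    ("http://192.0.2.10/secure/login.php", 1),
    ("http://paypal.example.com.account-verify.test/login", 1),
    ("http://user@203.0.113.5/update", 1),
    ("https://www.example.com/", 0),
    ("https://docs.example.org/guide/intro", 0),
    ("https://news.example.net/2013/12/story", 0),
]

if __name__ == "__main__":
    model = train(TRAINING)
    for candidate in ("http://198.51.100.7/verify/account", "https://www.example.org/about"):
        print(candidate, round(predict_proba(model, candidate), 3))
```

The output is a probability rather than a yes/no verdict, so a downstream system can choose its own threshold for blocking or warning.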


Congratulations to 2013 IEEE-SA International Award Recipient Andrew Myles

Earlier this week, the IEEE Standards Association (IEEE-SA) announced the winners of the 2013 IEEE-SA Awards, which honor contributions to standards development. We are pleased to announce that Andrew Myles, Engineering Technical Lead at Cisco, has been awarded the IEEE-SA International Award for his extraordinary contribution to establishing IEEE-SA as a world-class leader in standardization. Andrew has long been involved in IEEE-SA and led a long-term initiative (2005-2013) in IEEE 802 to defend and promote IEEE 802 standards globally.

We want to congratulate Andrew on this tremendous recognition. The work of Andrew and other contributors helps develop and promote high-quality, efficient, and effective IEEE standards. This enables the Internet and its supporting network components to be the premier platforms for innovation and borderless commerce that they are today. These standards in turn are reflected in our products and solutions for our customers. As we develop technological innovations for our customers, we continue in parallel to drive global standards deployment. The result is innovative solutions that solve problems in, and improve, our customers’ network environments.


Big Data in Security – Part IV: Email Auto Rule Scoring on Hadoop

Following part three of our Big Data in Security series on graph analytics, I’m joined by expert data scientists Dazhuo Li and Jisheng Wang to talk about their work in developing an intelligent anti-spam solution using modern machine learning approaches on Hadoop.

What is ARS and what problem is it trying to solve?

Dazhuo: From a high-level view, Auto Rule Scoring (ARS) is the machine learning system for our anti-spam system. The system receives a lot of email and classifies each message as spam or not spam. From a more detailed view, the system has hundreds of millions of sample email messages, each tagged with a label. ARS extracts features or rules from these messages, builds a classification model, and predicts whether new messages are spam or not spam. The greater the variety of spam and ham (non-spam) that we receive, the better our system works.

Jisheng: ARS is also a more general large-scale supervised learning use case. Assume you have tens (or hundreds) of thousands of features and hundreds of millions (or even billions) of labeled samples, and you need to use them to train a classification model that can classify new data in real time.
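The pipeline described above (extract rules from labeled messages, build a classification model, score new messages) can be sketched at toy scale as follows. This is only an illustration, not the actual ARS implementation: the four rules, the sample messages, and the use of Bernoulli naive Bayes with add-one smoothing are assumptions, and a production system would run on Hadoop over vastly more data.

```python
import math
import re
from collections import Counter

# Hypothetical rules: name -> predicate over the raw message text.
RULES = {
    "HAS_URL": lambda m: "http://" in m or "https://" in m,
    "MENTIONS_MONEY": lambda m: re.search(r"\$\d|free money|prize", m, re.I) is not None,
    "ALL_CAPS_WORD": lambda m: re.search(r"\b[A-Z]{5,}\b", m) is not None,
    "MENTIONS_MEETING": lambda m: re.search(r"\b(meeting|agenda)\b", m, re.I) is not None,
}


def fired_rules(message):
    """Names of the rules that match this message."""
    return {name for name, rule in RULES.items() if rule(message)}


def train(samples):
    """Learn per-rule scores (log-likelihood ratios) from labeled messages.

    Assumes at least one "spam" and one "ham" sample are present.
    """
    totals = Counter(label for _, label in samples)
    hits = {"spam": Counter(), "ham": Counter()}
    for message, label in samples:
        hits[label].update(fired_rules(message))
    model = {"prior": math.log(totals["spam"] / totals["ham"]), "rules": {}}
    for name in RULES:
        p_spam = (hits["spam"][name] + 1) / (totals["spam"] + 2)  # add-one smoothing
        p_ham = (hits["ham"][name] + 1) / (totals["ham"] + 2)
        model["rules"][name] = (
            math.log(p_spam / p_ham),              # score when the rule fires
            math.log((1 - p_spam) / (1 - p_ham)),  # score when it does not
        )
    return model


def spam_log_odds(model, message):
    """Sum the prior and the per-rule scores; positive means 'more likely spam'."""
    fired = fired_rules(message)
    return model["prior"] + sum(
        fire if name in fired else miss
        for name, (fire, miss) in model["rules"].items()
    )


def is_spam(model, message):
    return spam_log_odds(model, message) > 0.0


# Made-up labeled samples standing in for the real corpus.
TRAINING = [
    ("WINNER! Claim your prize at http://spam.example/now", "spam"),
    ("Get free money today, visit http://offers.example", "spam"),
    ("URGENT: you won $5000, reply now", "spam"),
    ("Agenda for Monday's meeting attached", "ham"),
    ("Can we move the meeting to 3pm?", "ham"),
    ("Notes from yesterday: https://wiki.example/notes", "ham"),
]

if __name__ == "__main__":
    model = train(TRAINING)
    print(is_spam(model, "You won a FREE prize, click http://x.example"))
    print(is_spam(model, "Reminder: team meeting at noon"))
```

Training is a single counting pass over the samples, which is one reason count-based models of this kind are easy to distribute; classifying a new message only requires evaluating the rules and summing their scores.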


Big Data in Security – Part III: Graph Analytics

Following part two of our Big Data in Security series on the University of California, Berkeley’s AMPLab stack, I caught up with talented data scientists Michael Howe and Preetham Raghunanda to discuss their exciting graph analytics work.

Where did graph databases originate and what problems are they trying to solve?

Michael: Disparate data types have a lot of connections between them, and not just the types of connections that have been well represented in relational databases. The actual graph database technology is fairly nascent, really becoming prominent only in the last decade. It’s been driven by the falling costs of storage and computational capacity and especially by the rise of Big Data.

There have been a number of players driving development in this market, specifically research communities and businesses like Google, Facebook, and Twitter. These organizations are looking at large volumes of data with lots of inter-related attributes from multiple sources. They need to be able to view their data in a much cleaner fashion so that the people analyzing it don’t need to have in-depth knowledge of the storage technology or every particular aspect of the data. There are a number of open source and proprietary graph database solutions to address these growing needs and the field continues to grow.


A Room with a View (of Crucial Big Data Insights)

What’s the problem with Big Data? You guessed right — it’s BIG.

Big Data empowers organizations to discern patterns that were once invisible, leading to breakthrough ideas and transformed business performance. But there is simply so much of it, and from such myriad sources — customers, competitors, mobile, social, web, transactional, operational, internal, external, structured, and unstructured — that, for many organizations, Big Data is overwhelming. The torrents of data will only increase as the Internet of Everything spreads its ever-expanding wave of connectivity, from 10 billion connected things today to 50 billion in 2020.

So, how can organizations learn to use all of that data?

The key lies not in simply having access to enormous data streams. Information must be filtered for crucial, actionable insights, and presented to the right people in a visualized, comprehensible form. Only then will Big Data transform business strategies and decisions. In effect, Big Data must be made small.

However, as McKinsey & Co. reported, many organizations don’t have enough data scientists, much less ones who understand the business well enough to draw conclusions. The trick is to get the scientists together with the experts who understand the business levers driving the organization. Put them in a room with the right tools, and watch the synergy fly.

But what sort of a room?
