Cisco Blogs

Cisco Blog > Security

Big Data: Observing a Phishing Attack Over Years



Phishing attacks use social engineering in an attempt to lure victims to fake websites. The websites could allow the attacker to retrieve sensitive or private information such as usernames, passwords, and credit card details. Attacks of this kind have been around since 1995, evolving in sophistication in order to increase their success rate. Up until now, phishing attacks were generally viewed as isolated events that were dealt with on a case-by-case basis. The dawn of big data analysis in computer security allows us to store data indefinitely and watch the changes and growth of attacks over long periods of time. In 2012, we began tracking a sophisticated phishing campaign that is still going strong.


The Target

Google, one of the largest players in the cloud business, offers dozens of free cloud services: Google Email, Google Drive, Google Docs, Google Analytics, YouTube, etc. To enable easy access across all of these properties, Google built what they call, “One account. All of Google.”   Read More »

Tags: , , , , , , , , ,

Big Data in Security – Part V: Anti-Phishing in the Cloud

TRACIn the last chapter of our five part Big Data in Security series, expert Data Scientists Brennan Evans and Mahdi Namazifar join me to discuss their work on a cloud anti-phishing solution.

Phishing is a well-known historical threat. Essentially, it’s social engineering via email and it continues to be effective and potent. What is TRAC currently doing in this space to protect Cisco customers?

Brennan: One of the ways that we have traditionally confronted this threat is through third-party intelligence in the form of data feeds. The problem is that these social engineering attacks have a high time dependency. If we solely rely on feeds, we risk delivering data to our customers that may be stale so that solution isn’t terribly attractive.  This complicates another issue with common approaches with a lot of the data sources out there:  many attempt to enumerate the solution by listing compromised hosts and  in practice each vendor seems to see just a small slice of the problem space, and as I just said, oftentimes it’s too late.

We have invested a lot of time in looking at how to avoid the problem of essentially being an intelligence redistributor and instead look at the problem firsthand using our own rich data sources -- both external and internal - and really develop a system that is more flexible, timely, and robust in the types of attacks it can address.

Mahdi: In principle, we have designed and built prototypes around Cisco’s next generation phishing detection solution.  To address the requirements for both an effective and efficient phishing detection solution, our design is based on Big Data and machine learning.  The Big Data technology allows us to dig into a tremendous amount of data that we have for this problem and extract predictive signals for the phishing problem. Machine learning algorithms, on the other hand, provide the means for using the predictive signals, captured from historical data, to build mathematical models for predicting the probability of a URL or other content being phishing.


Read More »

Tags: , , , , , , , , , , , ,

Big Data in Security – Part III: Graph Analytics

TRACFollowing part two of our Big Data in Security series on University of California, Berkeley’s AMPLab stack, I caught up with talented data scientists Michael Howe and Preetham Raghunanda to discuss their exciting graph analytics work.

Where did graph databases originate and what problems are they trying to solve?

Michael: Disparate data types have a lot of connections between them and not just the types of connections that have been well represented in relational databases. The actual graph database technology is fairly nascent, really becoming prominent in the last decade. It’s been driven by the cheaper costs of storage and computational capacity and especially the rise of Big Data.

There have been a number of players driving development in this market, specifically research communities and businesses like Google, Facebook, and Twitter. These organizations are looking at large volumes of data with lots of inter-related attributes from multiple sources. They need to be able to view their data in a much cleaner fashion so that the people analyzing it don’t need to have in-depth knowledge of the storage technology or every particular aspect of the data. There are a number of open source and proprietary graph database solutions to address these growing needs and the field continues to grow.

Graph Read More »

Tags: , , , , , , , , , , , , ,

Big Data in Security – Part II: The AMPLab Stack


Following part one of our Big Data in Security series on TRAC tools, I caught up with talented data scientist Mahdi Namazifar to discuss TRAC’s work with the Berkeley AMPLab Big Data stack.

Researchers at University of California, Berkeley AMPLab built this open source Berkeley Data Analytics Stack (BDAS), starting at the bottom what is Mesos?

AMPLab is looking at the big data problem from a slightly different perspective, a novel perspective that includes a number of different components. When you look at the stack at the lowest level, you see Mesos, which is a resource management tool for cluster computing. Suppose you have a cluster that you are using for running Hadoop Map Reduce jobs, MPI jobs, and multi-threaded jobs. Mesos manages the available computing resources and assigns them to different kinds of jobs running on the cluster in an efficient way. In a traditional Hadoop cluster, only one Map-Reduce job is running at any given time and that job blocks all the cluster resources.  Mesos on the other hand, sits on top of a cluster and manages the resources for all the different types of computation that might be running on the cluster. Mesos is similar to Apache YARN, which is another cluster resource management tool. TRAC doesn’t currently use Mesos.


AMPLab Stack

The AMPLab Statck

Read More »

Tags: , , , , , , , , , , , , , , , , , , ,

Big Data in Security – Part I: TRAC Tools

TRACRecently I had an opportunity to sit down with the talented data scientists from Cisco’s Threat Research, Analysis, and Communications (TRAC) team to discuss Big Data security challenges, tools and methodologies. The following is part one of five in this series where Jisheng Wang, John Conley, and Preetham Raghunanda share how TRAC is tackling Big Data.

Given the hype surrounding “Big Data,” what does that term actually mean?

John:  First of all, because of overuse, the “Big Data” term has become almost meaningless. For us and for SIO (Security Intelligence and Operations) it means a combination of infrastructure, tools, and data sources all coming together to make it possible to have unified repositories of data that can address problems that we never thought we could solve before. It really means taking advantage of new technologies, tools, and new ways of thinking about problems.

Big Data

Read More »

Tags: , , , , , , , , , , , , , , , , , , ,