Following part three of our Big Data in Security series on graph analytics, I’m joined by expert data scientists Dazhuo Li and Jisheng Wang to talk about their work in developing an intelligent anti-spam solution using modern machine learning approaches on Hadoop.
What is ARS and what problem is it trying to solve?
Dazhuo: From a high-level view, Auto Rule Scoring (ARS) is the machine learning system for our anti-spam system. The system receives a lot of email and classifies each message as spam or not spam. From a more detailed view, the system has hundreds of millions of sample email messages and each one is tagged with a label. ARS extracts features or rules from these messages, builds a classification model, and predicts whether new messages are spam or not spam. The greater the variety of spam and ham (non-spam) we receive, the better our system works.
Jisheng: ARS is also a more general large-scale supervised learning use case. Assume you have tens (or hundreds) of thousands of features and hundreds of millions (or even billions) of labeled samples, and you need them to train a classification model which can be used to classify new data in real time.
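The extract-features, train, and predict flow described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the ARS implementation: the three rules, the six training messages, and the choice of logistic regression trained with plain stochastic gradient descent are all hypothetical stand-ins, and a real system at this scale would train on a cluster rather than in a single process.

```python
import math

# Hypothetical rules: each maps a message to True/False. These names and
# patterns are illustrative only, not rules used by ARS.
RULES = {
    "mentions_free": lambda msg: "free" in msg.lower(),
    "has_link": lambda msg: "http://" in msg or "https://" in msg,
    "many_exclamations": lambda msg: msg.count("!") >= 3,
}
RULE_NAMES = sorted(RULES)


def extract_features(msg):
    """Turn a message into a 0/1 vector, one entry per rule."""
    return [1.0 if RULES[name](msg) else 0.0 for name in RULE_NAMES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, lr=0.5):
    """Fit logistic regression weights with stochastic gradient descent.

    samples: list of (message, label) pairs, label 1 = spam, 0 = ham.
    Returns (weights, bias).
    """
    weights = [0.0] * len(RULE_NAMES)
    bias = 0.0
    for _ in range(epochs):
        for msg, label in samples:
            x = extract_features(msg)
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - label  # gradient of the log loss w.r.t. the score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def spam_probability(msg, weights, bias):
    """Score a new message with the trained model."""
    x = extract_features(msg)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Made-up labeled samples (1 = spam, 0 = ham).
TRAINING = [
    ("FREE prize!!! Claim at http://example.com", 1),
    ("Get free pills now!!!", 1),
    ("Win big!!! http://example.org", 1),
    ("Meeting moved to 3pm", 0),
    ("Here is the report you asked for", 0),
    ("Lunch tomorrow? See https://example.com/menu", 0),
]

weights, bias = train(TRAINING)
```

Calling `spam_probability("Free offer!!! http://example.net", weights, bias)` then scores an unseen message. The point of the sketch is the shape of the pipeline: rules become feature columns, labeled mail becomes training rows, and the learned weights act as the rule scores.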
Tags: analytics, ARS, auto rule scoring, Big Data, Cisco, database, email, Hadoop, ham, innovation, Intelligence, offline learning, online learning, operations, security, spam, TRAC
Following part one of our Big Data in Security series on TRAC tools, I caught up with talented data scientist Mahdi Namazifar to discuss TRAC’s work with the Berkeley AMPLab Big Data stack.
Researchers at the University of California, Berkeley AMPLab built the open source Berkeley Data Analytics Stack (BDAS). Starting at the bottom, what is Mesos?
AMPLab is looking at the big data problem from a slightly different perspective, a novel perspective that includes a number of different components. When you look at the stack at the lowest level, you see Mesos, which is a resource management tool for cluster computing. Suppose you have a cluster that you are using for running Hadoop MapReduce jobs, MPI jobs, and multi-threaded jobs. Mesos manages the available computing resources and assigns them efficiently to the different kinds of jobs running on the cluster. In a traditional Hadoop cluster, only one MapReduce job is running at any given time and that job blocks all the cluster resources. Mesos, on the other hand, sits on top of a cluster and manages the resources for all the different types of computation that might be running on the cluster. Mesos is similar to Apache YARN, which is another cluster resource management tool. TRAC doesn’t currently use Mesos.
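To make the contrast with "one job blocks the whole cluster" concrete, here is a toy Python sketch of sharing a fixed pool of cores among several kinds of jobs. It is not Mesos's API or its actual scheduling policy; the function `share_cores`, the job names, and the round-robin rule are all assumptions chosen for illustration.

```python
from collections import deque


def share_cores(total_cores, demands):
    """Hand out cores one at a time, round-robin, until cores or demand run out.

    demands: dict mapping a job name to the number of cores it wants.
    Returns a dict mapping each job name to the cores it received.
    """
    granted = {job: 0 for job in demands}
    queue = deque(job for job, want in demands.items() if want > 0)
    free = total_cores
    while free > 0 and queue:
        job = queue.popleft()
        granted[job] += 1
        free -= 1
        if granted[job] < demands[job]:
            queue.append(job)  # still wants more, go to the back of the line
    return granted


# Hypothetical cluster: 16 cores shared by three kinds of jobs.
allocation = share_cores(16, {"mapreduce": 20, "mpi": 4, "threads": 2})
print(allocation)  # {'mapreduce': 10, 'mpi': 4, 'threads': 2}
```

The small MPI and multi-threaded jobs get everything they asked for, and the large MapReduce job takes the remaining cores instead of holding all 16 while the others wait.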
The AMPLab Stack
Tags: AMPLab, analytics, BDAS, Big Data, BlinkDB, Cisco, custom, database, Hadoop, innovation, mapreduce, Mesos, NoSQL, Scala, security, Shark, Spark, Stack, TRAC, TRAC Big Data Analysis
Recently I had an opportunity to sit down with the talented data scientists from Cisco’s Threat Research, Analysis, and Communications (TRAC) team to discuss Big Data security challenges, tools and methodologies. The following is part one of five in this series where Jisheng Wang, John Conley, and Preetham Raghunanda share how TRAC is tackling Big Data.
Given the hype surrounding “Big Data,” what does that term actually mean?
John: First of all, because of overuse, the “Big Data” term has become almost meaningless. For us and for SIO (Security Intelligence and Operations) it means a combination of infrastructure, tools, and data sources all coming together to make it possible to have unified repositories of data that can address problems that we never thought we could solve before. It really means taking advantage of new technologies, tools, and new ways of thinking about problems.
Tags: analytics, API, Big Data, Cisco, database, Hadoop, HDFS, innovation, Intelligence, java, mapreduce, NoSQL, operations, security, Shark, Spark, SQL, telemetry, TRAC, TRAC Big Data Analysis
With enough hype to rival even the most popular of Super Bowls, Big Data experts will converge on New York City in just a couple of weeks! But big data has earned the hype: businesses continue to find new ways to leverage the insights derived from vast data pools that keep growing at an exponential rate. A big reason for this is the ability to use Hadoop, with the Hadoop Distributed File System and MapReduce, to analyze the data very quickly and to run queries that were not even possible previously in minutes or less. We’ve only just begun to scratch the surface in terms of the financial returns made around Hadoop and the infrastructure to support Hadoop deployments, but one thing we do know: it’s going to be big and it will continue to get bigger!
So how does Cisco fit into this picture?
Cisco is partnering with leading software providers to offer a comprehensive infrastructure and management solution to support customer big data initiatives including Hadoop, NoSQL, and Massively Parallel Processing (MPP) analytics. Leveraging the advantages of fabric computing, the Cisco UCS Common Platform Architecture (CPA) delivers exceptional performance, capacity, management simplicity, and scale to help customers derive value more quickly and with less management overhead for the most challenging big data deployments.
Cisco UCS Common Platform Architecture for big data enables rapid deployment, predictable performance, and massive scale without the need for complex layers of switching infrastructure. In addition, the architecture offers unique data and management integration with enterprise applications hosted on Cisco UCS. This allows big data and enterprise applications to co-exist within a single management domain that simplifies data movement between applications and eliminates the need for unique technology silos in the data center. You can also check out my previous blog, Top Three Reasons Why Cisco UCS is a Better Platform for Big Data, to get an idea of what we’ll be sharing at the show.
Have you considered Cisco UCS for your Big Data projects? I’d like to invite you to come and hear more in a couple weeks at Strata Hadoop World in New York City. We’ll have a number of demos and experts on hand to answer all of your questions.
In addition, Cisco and Cloudera are teaming up to offer you a chance to win some exciting prizes by joining our demo crawl program. Stop by either the Cisco booth (#3) or the Cloudera booth (#403) to learn more.
Stop by and say hello, and let me know if you have any comments or questions, either in person or via Twitter at @CicconeScott.
Tags: Big Data, blade server, Blade Servers, Cisco UCS, Cisco Unified Computing System, Cisco Unified Data Center, Cisco Unified Fabric, Cisco Unified Management, Cloudera, Hadoop, Hortonworks, Intel, MapR, rack server, UCS Manager, UCS service profiles
Four years ago, Cisco introduced a revolutionary server platform called Unified Computing System, or UCS. Since that time it has become the number 2 blade server in the world. At the same time UCS was introduced, Cisco also elevated its relationship with SAP. Cisco UCS was one of the first hardware platforms certified for SAP HANA. Cisco has also announced and is selling a variety of SAP solutions on the UCS platform: SAP on Vblock, SAP on FlexPod, SAP ERP on UCS, Precision Marketing for retail establishments on UCS, Suite on HANA on UCS, and Sybase ASE on UCS. Cisco is also working very closely with SAP on their cloud solutions and how these solutions will benefit joint customers.
So what is Cisco doing at SAP TechEd in Las Vegas from Oct 21 through Oct 25?
Palazzo Congress Center, Las Vegas
Come visit Cisco at our booth #1000 and see firsthand how Cisco delivers a complete compute platform for SAP applications, including SAP HANA. In-booth demonstrations will include:
SAP on FlexPod
SAP and Hadoop
SAP on Vblock
SAP IT Process Automation by Cisco
Attend Cisco Speaking Sessions
Join us at SAP TechEd for Cisco speaking and expert networking sessions, with topics including What Is the Real Value of HANA, SAP and Hadoop, UCS for SAP HANA, and more. For more information on speaking sessions, click here. We look forward to seeing you at our sessions.
Meet with Cisco Experts at Booth 1000
Cisco executives and subject matter experts will be available to meet with you at SAP TechEd 2013. To schedule a meeting, please contact your Cisco representative.
Connect at our Customer Events
Make sure to join us for some fun as well. We will be hosting a Customer Reception with some of our partners. For additional information, please contact your local Cisco Account Executive.
We will also be hosting an Exclusive Customer Dinner at TAO. For details, please contact your local Cisco Account Executive.
For more information, be sure to visit Cisco at SAP TechEd Las Vegas.
Join the Conversation on Cisco at SAP TechEd Las Vegas
Cisco UCS with Intel® Xeon® processors
© 2013 Cisco and/or its affiliates. All rights reserved.
Intel, the Intel logo, Xeon, and Xeon Inside are trademarks or registered trademarks of Intel Corporation in the U.S. and/or other countries.
Tags: Cisco, FlexPod, Hadoop, HANA, IT Process Automation, ITPA, SAP, SAP TechED, UCS, Vblock, VCE