Security threats evolve every day as cyber attackers exploit gaps in basic security controls. In fact, the federal government alone has experienced a 680% increase in cybersecurity breaches in the past six years, and cybersecurity attacks against the US average 117 per day. Globally, the estimated annual cost of cybercrime is over $100 billion. Even when security breaches are identified, it can be extremely difficult to figure out how they happened or who is responsible.
One company working hard to prevent these threats is Solutionary, a managed security services provider (MSSP) that actively monitors its customers’ technology systems to identify and thwart security events before any negative impact occurs.
To provide real-time analytics of client traffic and user activity, Solutionary, a wholly owned subsidiary of NTT Group, developed the patented Solutionary ActiveGuard® Security and Compliance Platform, which correlates data across global threats and trends to quickly identify security events and provide clients with actionable alerts.
The patented, cloud-based ActiveGuard® Security and Compliance Platform is the technology behind Solutionary Managed Security Services
To keep up with growing data volumes, the need for fast security analytics, and an expanding client base, Solutionary needed to scale its infrastructure quickly, because its traditional server infrastructure could not scale easily or support in-depth analysis. The challenge was to figure out how to:
1) Increase its data analytics capabilities and improve its clients’ security
2) Scale cost-effectively as its client base and data volume grow
In the past, when a security threat occurred, the legacy systems could analyze only log data; they couldn’t see the big picture. As a result, it would sometimes take weeks of forensics work to figure out what had happened. To meet these challenges, Solutionary turned to the MapR Distribution for Hadoop running on the Cisco Unified Computing System™. With Hadoop, Solutionary can analyze both structured and unstructured data on a single data infrastructure, instead of relying on a costly traditional database solution that couldn’t bring both kinds of data into one platform for analysis.
Cisco UCS Common Platform Architecture for Big Data
Specifically, the Cisco/MapR environment consists of two MapR clusters of 16 Cisco UCS C240 M3 Rack Servers. Solutionary uses the Cisco UCS Manager to provision and control their servers and network resources, while the Cisco UCS 6200 Series Fabric Interconnects provide high-bandwidth connections to servers, and act as centralized management points for the Cisco infrastructure, eliminating the need to manage each element in the environment separately. Because of the environment’s high scalability, it’s easy for the fabric interconnects to support the large number of nodes needed for MapR clusters. Scalability is improved even further by using the Cisco UCS 2200 Series Fabric Extenders to extend the network into each rack.
Cisco UCS Components
With MapR and the Cisco UCS CPA for Big Data environment, Solutionary can now analyze far more data and contextual information, giving it a more informed picture of behavior patterns, anomalous activities, and attack indicators. By quickly identifying global patterns, Solutionary can spot new security threats and put them into context for its clients.
Let me know if you have any comments or questions, here or via Twitter at @CicconeScott.
Historical data is now an essential tool for businesses as they struggle to meet increasingly stringent regulatory requirements, manage risk and perform predictive analytics that help improve business outcomes. While recent data is readily accessible in operational systems, and some summarized historical data is available in the data warehouse, the traditional practice of archiving older, detail-level data on tape makes analysis of that data challenging, if not impossible.
Active Archiving Uses Hadoop Instead of Tape
What if the historical data on tape were loaded into a low-cost yet accessible storage option such as Hadoop, and data virtualization were then applied to access and combine it with the operational and data warehouse data? In essence, data access would be intelligently partitioned across hot, warm and cold storage. Would it work?
Yes, it would! In fact, it does every day at one of our largest global banking customers. Here’s how:
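To make the hot/warm/cold idea concrete, here is a minimal Python sketch of how a virtualization layer might split a single query’s date range across the three tiers. The tier boundaries (about a month hot, about a year warm, everything older cold) and the `route` function are hypothetical, chosen only for illustration; this is not how Cisco Data Virtualization is actually configured.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical tier boundaries, for illustration only
HOT_DAYS = 30    # operational systems: roughly the last month
WARM_DAYS = 365  # data warehouse: up to about a year

def route(start: date, end: date, today: date) -> List[Tuple[str, date, date]]:
    """Split the inclusive date range [start, end] into per-tier sub-ranges."""
    hot_start = today - timedelta(days=HOT_DAYS)
    warm_start = today - timedelta(days=WARM_DAYS)
    tiers = [
        ("cold", date.min, warm_start - timedelta(days=1)),  # Hadoop archive
        ("warm", warm_start, hot_start - timedelta(days=1)),  # data warehouse
        ("hot", hot_start, today),                            # operational store
    ]
    plan = []
    for name, lo, hi in tiers:
        # Intersect the requested range with this tier's range
        s, e = max(start, lo), min(end, hi)
        if s <= e:
            plan.append((name, s, e))
    return plan

# Example: five years of history as of June 1, 2014 (dates are illustrative)
for step in route(date(2009, 6, 1), date(2014, 6, 1), today=date(2014, 6, 1)):
    print(step)
```

A five-year request comes back as three contiguous sub-ranges, one per tier, each of which could be pushed down to the store that holds it; a request covering only the last few weeks touches the hot tier alone.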
Adding Historical Data Reduces Risk
The bank uses complex analytics to measure risk exposure in its fixed income trading business by industry, region, credit rating and other parameters. To reduce risk while making more profitable credit and bond derivative trading decisions, the bank wanted to identify risk trends using five years of fixed income market data rather than the one month (400 million records) it currently stored online. This longer time frame would allow it to better evaluate trends and use that information to build a solid foundation for smarter, lower-risk trading decisions.
As a first step, the bank installed Hadoop and loaded five years of historical data that had previously been archived on tape. Next, it installed Cisco Data Virtualization to integrate the data sets, providing a common SQL access approach that made it easy for analysts to combine the data. Third, the analysts extended their risk management analytics to cover five years. Up and running in just a few months, the bank was able to use this long-term data to better manage fixed income trading risk.
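The common SQL access step can be pictured with a small Python sketch using the standard-library `sqlite3` module: two hypothetical tables stand in for the online store and the archive reloaded from tape, and a single view unions them so analysts query one logical source. The table names, columns, and sample rows are invented for illustration, and SQLite is only a stand-in; this does not reflect Cisco Data Virtualization’s actual interface.

```python
import sqlite3

def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with a recent table, an archive table,
    and a view that presents both as one data set (all names hypothetical)."""
    conn = sqlite3.connect(":memory:")
    # "recent" stands in for the one month kept online
    conn.execute("CREATE TABLE recent (trade_date TEXT, industry TEXT, exposure REAL)")
    # "archive" stands in for the historical data reloaded from tape into Hadoop
    conn.execute("CREATE TABLE archive (trade_date TEXT, industry TEXT, exposure REAL)")
    conn.executemany(
        "INSERT INTO recent VALUES (?, ?, ?)",
        [("2014-05-20", "Energy", 20.0), ("2014-05-21", "Banking", 30.0)],
    )
    conn.executemany(
        "INSERT INTO archive VALUES (?, ?, ?)",
        [("2010-03-15", "Energy", 100.0),
         ("2010-07-01", "Energy", 50.0),
         ("2012-11-30", "Banking", 75.0)],
    )
    # The "virtual" layer: one view over both physical sources
    conn.execute(
        """CREATE VIEW all_trades AS
           SELECT trade_date, industry, exposure FROM recent
           UNION ALL
           SELECT trade_date, industry, exposure FROM archive"""
    )
    return conn

def exposure_by_year(conn: sqlite3.Connection):
    """Total exposure per year and industry across the full history."""
    return conn.execute(
        """SELECT substr(trade_date, 1, 4) AS yr, industry, SUM(exposure)
           FROM all_trades
           GROUP BY yr, industry
           ORDER BY yr, industry"""
    ).fetchall()

conn = build_demo()
for row in exposure_by_year(conn):
    print(row)
```

The analyst’s query never names the underlying tables; extending the analysis from one month to five years is a matter of what sits behind the view, not of rewriting the query.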
Big Data remains one of the hottest topics in the industry because of the real dollar value businesses are deriving from making sense of huge volumes of structured and unstructured data. Virtually every field is adopting a data-driven strategy as people, process, data and things are increasingly connected (the Internet of Everything). New tools and techniques are being developed that can mine vast stores of data to inform decision making in ways previously unimagined. Joining related information and recognizing correlations lets us derive more knowledge, which can inform and enrich numerous aspects of everyday life. There’s a good reason why Big Data is so hot!
This year at Hadoop Summit, Cisco invites you to learn how to unlock the value of Big Data. Unprecedented data creation opens the door to responsive applications and emerging analytics techniques, and businesses need a better way to analyze data. Cisco will be showcasing infrastructure innovations from both the Cisco Unified Computing System (UCS) and Cisco Application Centric Infrastructure (ACI). Cisco’s solution for deploying big data applications can help customers make informed decisions, act quickly, and achieve better business outcomes.
Cisco is partnering with leading software providers to offer a comprehensive infrastructure and management solution, based on Cisco UCS, to support our customers’ big data initiatives. By taking advantage of the fabric-based infrastructure of Cisco UCS, Cisco can deliver significant advantages for big data workloads.
Big Data is undoubtedly becoming an integral part of the enterprise IT ecosystem across major industry verticals, and Apache Hadoop is becoming almost synonymous with it as the foundation of the next-generation data management platform. Sometimes referred to as a data lake, this platform serves as the primary landing zone for data from a wide variety of sources. Traditional and new application software vendors alike have been building the plumbing (in software terms, data connectors and data movers) to extract data from it for further processing. New to Apache Hadoop is YARN, which is essentially an operating system for Big Data that lets multiple workloads (batch, interactive, streaming, and real-time) coexist on one cluster.
The Hortonworks Data Platform combines the most useful and stable versions of Apache Hadoop and its related projects into a single tested and certified package. Cisco has been partnering with Hortonworks to provide an industry-leading platform for enterprise Hadoop deployments. The Cisco UCS solution for Hortonworks Data Platform is based on the Cisco UCS Common Platform Architecture Version 2 for Big Data, a popular platform for data lakes widely adopted across major industry verticals. It features single connect, unified management, advanced monitoring capabilities, seamless management integration, and data integration (plumbing) capabilities with other enterprise application systems based on Oracle, Microsoft, SAS, SAP and others.
We are excited to see several joint wins with Hortonworks in the service provider, insurance, retail, healthcare and other sectors. The joint solution is available in three reference architectures (Performance-Capacity Balanced, Capacity Optimized, and Capacity Optimized with Flash), all of which support up to 10 racks of 16 servers each without additional switches. Scaling beyond 10 racks (160 servers) can be implemented by interconnecting domains using Cisco Nexus 6000/7000/9000 Series switches, scaling to thousands of servers and hundreds of petabytes of storage, all managed from a single pane using Cisco UCS Central.
New to this partnership is Hortonworks Data Platform 2.1, which includes Apache Hive 13, significantly faster than the previous-generation Hive 12. We jointly conducted extensive performance benchmarking using 20 queries derived from the TPC-DS Benchmark, an industry-standard benchmark for decision support systems from the Transaction Processing Performance Council (TPC). The tests were conducted on a 16-node Cisco UCS CPA v2 Performance-Capacity Balanced cluster using a 30TB dataset. We observed about 300% performance acceleration for some queries with Hive 13 compared to Hive 12. See Figure 1.
Additional performance improvements are expected with the GA release. What does this mean? (i) First, Hive brings SQL-like abilities (SQL being the most common and expressive language for analytics) to petabyte-scale datasets in an economical manner; (ii) Hadoop becomes friendlier for SQL developers and SQL-based business analytics platforms; and (iii) such performance improvements (from Hive 12 to 13) make migrations from proprietary systems to Hadoop even more compelling. More coming. Stay tuned!
Figure 1: Hive 13 vs. Hive 12
Disclaimer: The queries listed here are derived from the TPC-DS Benchmark. These results cannot be compared with TPC-DS Benchmark results. For more information, visit www.tpc.org.