HPDA, Conjoining Big Data with High Performance Computing (HPC)

By CIOReview | Friday, September 9, 2016

What do artificial limbs, baby food, scratch-proof glasses, and handheld vacuum cleaners have in common? They are all products that spun out of NASA space research. Following a similar path, High Performance Computing (HPC), initially confined to government and research-intensive domains such as cryptography, weather forecasting, and space exploration, has found its way into the enterprise realm. In the 1980s, the financial services industry became the first buyer of HPC technology for advanced data analytics, as opposed to modeling or simulation.

The advent of big data drives the need for data analysis that demands the best of supercomputer-class resources, in other words, High Performance Data Analysis (HPDA). HPDA promises enterprises the scalability to process the torrents of data flowing in from web applications and APIs, gadgets, and the relatively new entrants in the IoT ecosystem.

Looking back, analytics meant gaining insights from statistics. CIOs allotted a set of desktops loaded with spreadsheet applications and dedicated part of the workforce to gathering insights from the firm’s historical data. The practice then moved to the server and software level as data sources became increasingly decentralized and more diverse in nature. CIOs realized the possibilities of leveraging business intelligence to stay ahead in an increasingly competitive market. Depending on the size of the business and the potential gains, there came a point where insights from big data were constrained by time or by query complexity, and faster computing was needed. HPDA was the answer.

Consider PayPal as a use case. By employing HPC, the company was able to detect fraud before it hit the credit card, whereas detection would have taken up to two weeks with the technology previously in use. According to International Data Corporation (IDC), the move has saved PayPal more than $700 million and has also enabled the company to perform predictive fraud analysis. PayPal has since extended HPC use to affinity marketing and management of the company’s general IT infrastructure. IDC forecasts that revenue for HPDA-focused servers will grow robustly (13.3 percent CAGR) to reach $2.7 billion in 2017, while HPDA storage revenue will approach $800 million.
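For readers who want to see what a 13.3 percent CAGR implies, the compounding arithmetic can be sketched in a few lines of Python. The forecast quoted above does not state its base year, so the five-year span used here is purely an assumption for illustration, and the function names are hypothetical.

```python
def project(start_value, rate, years):
    """Compound start_value forward at a constant annual growth rate."""
    return start_value * (1.0 + rate) ** years


def implied_start(end_value, rate, years):
    """Work backward from an end value to the starting value it implies."""
    return end_value / (1.0 + rate) ** years


# Figures from the article: 13.3 percent CAGR, $2.7 billion in 2017.
END_VALUE_BILLIONS = 2.7
CAGR = 0.133

# ASSUMPTION: the article gives no base year; 5 years is illustrative only.
ASSUMED_YEARS = 5

start = implied_start(END_VALUE_BILLIONS, CAGR, ASSUMED_YEARS)
print(f"Implied starting revenue: ${start:.2f} billion")
print(f"Check, projected forward: ${project(start, CAGR, ASSUMED_YEARS):.2f} billion")
```

Changing ASSUMED_YEARS shows how sensitive the implied starting figure is to the length of the forecast window.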

Tools for implementing big data analytics, such as Hadoop, Spark, and R, have brought impressive maturity to the field. However, scalability continues to be a challenge for enterprises dealing with data volumes in the range of petabytes to zettabytes. For instance, while MapReduce is generally considered effective for processing unstructured data, the framework is designed for batch processing, which makes it rather cumbersome for machine learning workloads or ad hoc data exploration. Apache Spark, which many experts consider an alternative, reportedly offers better options for analytics that rest largely on transactional analysis, where the data is mostly structured. HPDA goes well beyond these niches, for its ultimate aim is to harness every bit of insight regardless of the type of data. HPDA therefore demands very large amounts of computing power and storage, among other resources.
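To make the batch-oriented nature of MapReduce concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in Python. It is a toy illustration of the programming model only, not Hadoop's actual API; the function names and the sample log lines are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records):
    """Run one complete batch job over the entire input."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample input standing in for a large unstructured data set.
logs = ["payment accepted", "payment declined", "payment accepted"]
counts = run_job(logs)
print(counts)
```

Each new question means another full pass over the input through all three phases, which is why iterative machine learning or interactive exploration fits this model poorly.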

Although public cloud services promise scalability, the time required to move data, coupled with the need for backups, often pushes enterprises toward an on-premise solution, not to mention the security constraints and costs involved. Resorting to the public cloud for HPDA is a widely adopted practice among SMBs and startups; it is a move that could backfire, although offerings in this area are maturing. Several data storage companies have emerged that use flash, disk, and cloud storage to streamline the mobility and management of data.
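A back-of-the-envelope calculation shows why data movement weighs so heavily in this decision. The sketch below estimates how long it takes to push a data set over a network link; the one-petabyte size, the 10 Gbps link, and the efficiency factor are assumptions chosen for illustration, not figures from any vendor.

```python
def transfer_days(size_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the days needed to move size_bytes over a network link.

    efficiency is the fraction of nominal bandwidth actually achieved
    (1.0 means the link is fully and exclusively used).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400  # seconds per day


# ASSUMPTIONS for illustration: 1 PB (decimal) over a dedicated 10 Gbps link.
PETABYTE = 10**15
TEN_GBPS = 10 * 10**9

print(f"Ideal link: {transfer_days(PETABYTE, TEN_GBPS):.1f} days")
print(f"At 50% efficiency: {transfer_days(PETABYTE, TEN_GBPS, 0.5):.1f} days")
```

Even under ideal conditions a single petabyte takes more than nine days on such a link, and the estimate grows in direct proportion to the data volume.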

As some analysts suggest, managing the HPDA sphere of big data processing requires a ‘big data workflow’ in which all data center resources, such as public and private cloud, big data, virtual machines, and HPC environments, are optimized into an organized workflow. This would also act as a platform to unite people with a technical background in HPC with the analysts and statisticians who work on HPDA.

From a scalability perspective, firms like EMC, NetApp, and IBM claim to provide storage solutions, while commercial cloud vendors such as Amazon have added HPC elements to their infrastructure offerings. The cloud is best suited to the development and testing of big data solutions, as it offsets the cost of purchasing hardware. Yet highly parallel HPDA problems strain the cloud and could drive enterprises toward a dedicated on-premise deployment. In such a scenario, a coprocessor such as Intel’s Xeon Phi is popular for delivering good throughput and efficiency on-premise.

Investing in HPDA is an expensive affair and, like any other disruptive technology, it still has a long way to go before it pays for itself. In the time ahead, the cost of implementation should continue to fall as adoption increases and vendor offerings improve. Nonetheless, HPDA is the answer to the task of gaining sharp insights from vast and varied data, a task equivalent to spotting a needle in a haystack within moments. A Herculean task it may be, but never Sisyphean, and that’s the bet enterprises are willing to make.

Adopting an HPDA strategy is thus a ‘big move’ as far as firms are concerned, for it cannot be handed over entirely to vendors without a proper action plan. It is advisable to dedicate a panel of experts and advisors to conduct thorough research and weigh objectives, resources, and feasibility. After all, HPDA is just one of many formidable approaches to tapping big data. As of now, hopes are high and so are the stakes.

Quantum computing, although still in its infancy and only beginning to peek into the enterprise realm, promises a whole new paradigm for HPC. D-Wave, a company at the forefront of quantum computing, boasts the most advanced system of its kind in the world and is currently backed by Google and NASA. The company grabbed the spotlight when Google announced that the D-Wave quantum computer can handle some problems in mere seconds that would otherwise have taken a classical single-core computer 10,000 years. However, it is estimated that it will be years before the technology is ready for enterprise use.