Pivotal Software: Harnessing Hadoop’s True Potential

Rob Mee, CEO
Following the rapid expansion of the World Wide Web in the early 2000s, Nutch was born as an innovative “web crawler” that distributed data and computation across multiple nodes for faster processing. In 2006, the web crawler project retained the original name while its data distribution component was spun out and became a mainstream technology. Now better known as Apache Hadoop, it draws many organizations with its ability to multitask, scale, and cut costs. Nevertheless, many companies that use Hadoop tend to employ only its data warehouse capabilities, storing data that may not be critical today but may be useful for retrospective analysis in the future. Pivotal Software recognized this lost value and released Pivotal Hadoop Database (HDB) in 2013, after more than a decade’s worth of research on its predecessor, Pivotal Greenplum.

Pivotal HDB is a Hadoop-native SQL analytic engine for the enterprise that goes beyond the role of a simple data warehouse. Native SQL unlocks the hidden power of Hadoop by providing useful business insights and predictive analytics to fast-moving, innovative companies. The solution’s high-performance architecture ensures that queries run in near real time and scale to petabyte-sized datasets. The massively parallel processing (MPP) technology used in Pivotal HDB enables the storage of all data with business value in one place. Its architecture supports the Python, Java, and R programming languages, and data formats such as Apache Parquet and HDB binary data files, accommodating varied styles of data storage and data crunching.

For software-based organizations that may still be struggling with limited computing and data storage capacity, Pivotal HDB offers scalability that can turn the tables. “Excellence in software is becoming a core competence, and that’s really daunting for most companies,” says Rob Mee, CEO, Pivotal Software. Dynamic pipelining, concurrent query support, polymorphic storage, and automatic data compression are some of the capabilities in Pivotal’s suite that specifically target these challenges.

Pivotal HDB can be deployed in two ways: as an IaaS offering in the cloud, or on-premises as a commodity application. Either way, it aims at “filing out the burrs” of data migration and delivering results faster.

Such agility in Pivotal’s services was apparent in its support of a client, WGSN Group, during the rapid expansion of WGSN’s market intelligence solutions for retailers. The new venture, named WGSN InStock, required gathering and processing an ocean of data about products, sales, and pricing, which called for a large and versatile database. To satisfy WGSN’s present and future big data needs, Pivotal offered its flagship data warehouse solution, Pivotal Greenplum, along with Pivotal Hadoop Database, which brought the power of SQL-based query handling. Getting InStock up and running quickly was key to WGSN’s competitive strategy, and the tight integration of Pivotal’s solutions was instrumental in ensuring a short four-week turnaround.

Recently, Pivotal HDB 2.0 was released as a part of Pivotal’s Big Data suite. It employs exploratory analytics to raise the success rate of queries, saving much of the time otherwise spent on failed ones. The company regards this milestone as a shift of focus from massively parallel processing (MPP) of monolithic databases to an elastic, cloud-scale analytical database that is deeply integrated with the Apache Hadoop ecosystem. Pivotal Software stands at the forefront of bringing this enriched experience of data management. As Mee aptly summarizes, “At the end of the day it’s all about delivering results faster while continuing to focus on improving the customer experience.”

Pivotal Software

Palo Alto, CA

Rob Mee, CEO

Provides Pivotal HDB, a Hadoop-native SQL database for data science and machine learning workloads