CIO Review >> Magazine >> May - 2013 issue

MapR Technologies: Extending the Promise of Hadoop

By John Schroeder

Tuesday, May 7, 2013

With data growth exploding and new unstructured sources of data expanding, a new approach is required to handle the volume, variety and velocity of this growing data. Apache Hadoop, which was inspired by Google's MapReduce and Google File System (GFS) papers, exploits commodity servers and increasingly less expensive compute, network and storage. Hadoop is a software framework that enables applications to work with thousands of nodes and petabytes of data. An important measure of technology growth is job creation. Hadoop job growth dwarfs that of other Big Data technologies and ranks in the top 10 overall, according to Indeed.com.

Founded in 2009, MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses. Our innovations transform Hadoop into a reliable compute platform and dependable data store, with world-record performance. Now, through the combined efforts of the community and MapR, mission-critical use cases can be supported on MapR with full data availability and protection.

One of the reasons Hadoop is rapidly gaining in popularity is that we have transformed it from a batch-only system into one that also supports real-time analysis. MapR has also expanded the supported programming and data access interfaces. MapR added a POSIX-compliant storage layer, along with the ability to access Hadoop using file-based interfaces. We also added JDBC and ODBC drivers, so you can access data in Hadoop from standard database tools. Deployments of Hadoop continue to grow across financial services, retail, media, healthcare, manufacturing, telecommunications, Web 2.0 companies and government organizations.
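The file-based access described above means an existing program can read cluster data with ordinary file calls. As a minimal sketch, assuming the cluster were mounted at some local directory, the Python script below needs nothing Hadoop-specific. The temporary directory, the file name `events.txt`, and the `count_words` helper are hypothetical stand-ins for illustration, not MapR paths or APIs.

```python
import os
import tempfile
from collections import Counter


def count_words(path):
    """Count whitespace-separated words in a text file using plain file I/O."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            counts.update(line.lower().split())
    return counts


# Assumption: a temporary directory stands in for the directory where the
# cluster's file system would be mounted. The real mount path depends on
# the deployment and is not given in this article.
mount_root = tempfile.mkdtemp()
sample_path = os.path.join(mount_root, "events.txt")

# Write a small sample file so the sketch runs without a cluster.
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write("login purchase login\nlogout login\n")

word_counts = count_words(sample_path)
print(word_counts.most_common(1))  # prints [('login', 3)]
```

Pointing `mount_root` at an actual mount would leave the rest of the script unchanged, which is the practical appeal of a file-based interface.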

MapR Technologies offers three editions of its complete distribution for Hadoop: the M3 Edition, which is free and offers performance and NFS access advantages; the M5 Edition, which adds high availability, business continuity, and 24x7 support on a subscription basis; and the M7 Edition, which makes HBase easy, dependable and fast. M7 not only delivers enterprise-grade features such as instant recovery, snapshots and mirroring, but also provides consistent performance while eliminating architectural complexity.

Delivering on the Promise of Hadoop

Hadoop has wide applicability across industries and applications. For example, financial services companies are using Hadoop for scalable fraud detection and analysis, for loss prevention, and to mitigate the risk of financial positions. In media and entertainment, applications include targeted marketing and ad platforms that can store and analyze large data sets consisting of billions of objects.

Manufacturing firms are using Hadoop to perform equipment failure analysis and supply chain optimization. Healthcare companies are able to search and analyze disparate data sources such as patient populations, treatment protocols, and clinical outcomes to accelerate discovery and insight.

Ancestry.com, the world's largest online resource for family history, with more than 2.5 million paying subscribers across all of its family history sites, is one example of a company that has used MapR to benefit from Big Data. Ancestry.com manages more than ten billion records in a four-petabyte data store. With more than 40,000 record collections in that store, the company mines patterns in search behavior to serve its millions of registered users in a more relevant way. Ancestry.com chose MapR over other distributions because of the superior high availability and data protection that MapR provides.

As the use of Hadoop continues to grow, we believe ease of use, availability, dependability and performance will continue to be key requirements for Hadoop adoption and success.