Cloudera Launches Project Ibis to Enable Data Scientists to Leverage the Power of Hadoop

By CIOReview | Wednesday, July 22, 2015

FREMONT, CA: Cloudera, a provider of enterprise data management, has announced Ibis, an open source project that uses new APIs to let Python developers run applications on Hadoop and to let data scientists apply Big Data and Hadoop to data analysis. Cloudera has also announced Wrangle, a single-day, single-track industry event that will dive into the principles, practice, and application of data science from the startup to the enterprise.

Cloudera recognized the importance of the Python programming language in modern data engineering and data science. Python development has been confined to local data processing and smaller data sets, requiring data scientists to make many compromises when attempting to work with big data. Using Ibis, a new open source data analysis framework, Python users will finally be able to process data at scale without compromising user experience or performance.

The initial version of Ibis provides an end-to-end Python experience with comprehensive support for the built-in analytic capabilities in Impala for simplified data warehousing, data wrangling, and analytics. Upcoming versions will allow users to leverage the full range of Python packages as well as express efficient custom logic using Python. By integrating with Impala, the leading MPP database engine for Hadoop, Ibis can achieve the interactive performance and scalability necessary for big data. Impala can eliminate the need to move data between Hadoop and other platforms. Ibis takes the concept a step further by exposing Impala through an API that developers can call directly, thereby eliminating the need to invest millions of dollars.
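
To illustrate the general idea of exposing a SQL engine through a Python API, here is a minimal sketch. It is not the Ibis API: the `TableExpr` class, its methods, and the `web_logs` table are hypothetical names invented for this example. The sketch assumes only that Python calls are recorded as a deferred expression and later compiled into a single SQL statement that an engine such as Impala could run, so the data never has to be pulled into the local Python process.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TableExpr:
    """Hypothetical deferred table expression (illustrative, not the real Ibis API)."""
    name: str
    predicates: Tuple[str, ...] = ()
    group_key: Optional[str] = None

    def filter(self, predicate: str) -> "TableExpr":
        # Record the predicate and return a new expression; nothing runs yet.
        return TableExpr(self.name, self.predicates + (predicate,), self.group_key)

    def count_by(self, column: str) -> "TableExpr":
        # Record a grouped row count over the given column.
        return TableExpr(self.name, self.predicates, column)

    def to_sql(self) -> str:
        # Compile the recorded operations into one SQL statement.
        where = ""
        if self.predicates:
            where = " WHERE " + " AND ".join(self.predicates)
        if self.group_key is None:
            return f"SELECT * FROM {self.name}{where}"
        return (
            f"SELECT {self.group_key}, COUNT(*) AS n "
            f"FROM {self.name}{where} GROUP BY {self.group_key}"
        )


# Build the query in Python; only the compiled SQL would be sent to the engine.
expr = TableExpr("web_logs").filter("status = 500").count_by("host")
sql = expr.to_sql()
print(sql)
```

In a design like this, the work of scanning and aggregating would stay on the cluster, and only the small result set would need to return to the Python session.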

“Hadoop has evolved dramatically over the last decade, from a batch processing tool to an entire ecosystem that powers most of today’s information architecture as well as traditional BI workloads. We want to build on this momentum and make Hadoop’s infrastructure more accessible to the data science community. We’re doing that by bringing Python more fully into the ecosystem and focusing on the real-world, practical applications of data science,” said Wes McKinney, a software engineer at Cloudera and the creator of Python pandas.

Ibis is an Apache-licensed project and open to contributions from the open source community.