Data Science Unchained in the Cloud
Ron Menich, VP, Advanced Analytics & Data Science at Catalina Marketing, leads a team of data scientists who implement cloud-based, machine learning software solutions which ingest an ocean of digital and in-store shopper data.
The advent of cloud computing is actively reshaping how advanced decision support applications are developed and deployed.
These systems use core mathematical capabilities -- machine learning, operations research, data science, and management science -- to predict, recommend, optimize, and otherwise support intelligent decision-making. When crafted well, such systems can, for example, automatically scan trillions of transactions to personalize content appropriately, delighting the shopper and producing efficient marketing spend for brands, retailers and their agencies.
Advanced decision support applications often involve heavy offline batch processing combined with real-time interactivity. Requirements are often incompletely known and evolve over time to a much greater extent than those of HR, accounting, or other software systems associated with standardized or well-known use cases.
As you evaluate the potential of the cloud, consider how elastic computation and open-source data science software may enable and efficiently deliver your next-generation decision support applications.
‘Rent’ versus own
In the past, an advanced decision support system typically involved purchasing a large UNIX machine. Companies had to “size the church for Easter Sunday,” buying a box big enough to support the largest batch-processing run or other burdensome job load expected over the course of a year. However, the average load on the box would be much, much lower, leaving the computer sitting idle or underutilized most of the time. On the other hand, if built too small, the system would be starved for computational resources, sometimes requiring costly software rewrites to operate within the small, already-purchased computer capacity.
With elastic computing, the burdensome mathematical processing components of an advanced decision support system gain access to scalable and essentially unlimited cloud-based computational resources. An organization can temporarily rent cloud computational resources instead of permanently purchasing them. In the utility computing model, we turn computational resources on and off much like flipping a light switch.
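To make the rent-versus-own trade-off concrete, here is a minimal Python sketch. Every number in it is a hypothetical assumption chosen for illustration: a 100-hour window, a baseline load of 2 compute units, a 10-hour batch spike of 40 units, and a rented unit-hour priced at a 50% premium over an owned one. The function names are likewise invented for this example.

```python
def owned_cost(peak_units, cost_per_unit_hour, hours):
    """Cost of capacity sized for the peak and paid for every hour."""
    return peak_units * cost_per_unit_hour * hours


def rented_cost(hourly_load, cost_per_unit_hour):
    """Cost of renting exactly the capacity used in each hour."""
    return sum(units * cost_per_unit_hour for units in hourly_load)


def utilization(hourly_load, peak_units):
    """Fraction of peak-sized capacity that is actually used."""
    return sum(hourly_load) / (peak_units * len(hourly_load))


# Hypothetical load profile: 90 quiet hours, then a 10-hour batch spike.
hourly_load = [2] * 90 + [40] * 10
peak = max(hourly_load)

# Assumed prices: 1.0 per owned unit-hour, 1.5 per rented unit-hour.
owned = owned_cost(peak, 1.0, len(hourly_load))
rented = rented_cost(hourly_load, 1.5)
util = utilization(hourly_load, peak)

print(f"owned:  {owned:.0f}")
print(f"rented: {rented:.0f}")
print(f"utilization of owned box: {util:.1%}")
```

Under these assumed figures the peak-sized box is busy less than 15% of the time, so renting costs far less even at a higher unit price. With a flatter load profile the comparison can tip the other way, so the result depends on how spiky the workload really is.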
Pay by the ‘sip’
Another dynamic phenomenon is the rise of open-source computing software in recent years. This shift has democratized the execution of algorithms and dramatically reduced the power of closed, proprietary software vendors. These days, any organization with a sophisticated decision support need can gain access to the latest machine learning algorithms on Azure, Amazon, Google or other cloud computation platforms and run those algorithms without having to purchase a permanent software license or pay 20%-per-year maintenance fees. Of course, managed service providers charge for these offerings, but in a similar manner to the platform itself: you pay by the sip, rather than paying a huge up-front fee.
With great power comes great responsibility
It is true that the cloud provides essentially unlimited access to computation, but cloud costs typically scale with the amount of computational resources consumed, so it is critical to monitor those expenditures closely. For example, data scientists still need to verify their models on small amounts of data before running huge jobs on Big Data. The organization also needs to properly attribute cloud expenditures to the specific products, solutions, or projects that drive those computational needs.
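One way to picture cost attribution is a small roll-up of usage records by project. The Python sketch below assumes each record carries a "project" tag and a "cost" field; the record layout, project names, amounts, and budgets are all hypothetical and do not reflect any particular cloud provider's billing format.

```python
from collections import defaultdict


def attribute_costs(records):
    """Sum cost per project; records without a tag land in 'untagged'."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.get("project", "untagged")] += rec["cost"]
    return dict(totals)


def over_budget(totals, budgets):
    """Return the projects whose spend exceeds their budget.

    A project with no budget entry is treated as having a budget of zero,
    so any untagged or unplanned spend is flagged.
    """
    return sorted(p for p, cost in totals.items() if cost > budgets.get(p, 0.0))


# Hypothetical usage records and budgets, for illustration only.
records = [
    {"project": "personalization", "cost": 120.0},
    {"project": "forecasting", "cost": 30.0},
    {"project": "personalization", "cost": 80.0},
    {"cost": 15.0},  # missing tag: nobody owns this spend yet
]
budgets = {"personalization": 150.0, "forecasting": 50.0}

totals = attribute_costs(records)
flagged = over_budget(totals, budgets)

print(totals)
print(flagged)
```

Treating missing tags as their own bucket, rather than silently dropping them, keeps unowned spend visible until someone claims it.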
Appropriately managed, intelligent usage of the cloud unchains data scientists to efficiently deliver next-generation decision support capabilities.