An Overview of Data Integration and Analytics
Data integration, as the name suggests, is the process of integrating data that originates from different sources within or outside the organization. The business context for why the integration is needed is just as important, if not more so. Hence I find the definition from IBM [1] more apt: “Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information.”
Data integration is utilized for many reasons:
a. Creating a view of data which provides value to the company and its stakeholders.
b. Moving or storing data for legal reasons (data retention) or for migration, such as moving on-premise data to the cloud through a data integration process.
c. Building data warehouses, where data from a multitude of upstream applications is stored in dimensions that allow downstream access, reporting and analytics.
Data integration should be tackled with a strategic eye as opposed to a short-term view, because once implemented it usually gets surrounded by other business and technical processes, which makes changing or updating it later costly.
Let’s take a simple example of a bank with three customer-facing departments: call center, online and on-premise (branches). When customers need help with their banking, they either call the call center, go online to get help or visit one of the branches. You are asked to implement a customer feedback program for these three departments. Each day you need to get data on all the customer interactions that happened with the company that day and send feedback surveys to those customers. When you start, you only have data from customers’ call center interactions and in-person interactions in the branches.
When designing a data integration solution at this point, you should also take into consideration any other touch-points that might come into the flow in the future, such as online customer touch-points. Standardize the intake, processing (ETL), cleaning, and enrichment layers so that when you are ready to integrate the online customer interaction data, the integration is quick and clean.
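To make the idea of a standardized intake layer concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the shared `Interaction` schema, the raw field names such as `cust_id` and `visit_day`, and the helper names `register`, `clean` and `build_survey_list` are hypothetical and do not come from any particular product. Each source gets a small adapter that maps its raw records to the shared schema, and the cleaning step is written once against that schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Interaction:
    """Shared schema that every source is mapped into (hypothetical)."""
    customer_id: str
    channel: str
    interaction_date: date
    email: str


# An adapter turns one raw source record into the shared schema.
Adapter = Callable[[dict], Interaction]
ADAPTERS: Dict[str, Adapter] = {}


def register(channel: str):
    """Decorator that adds a source adapter to the intake registry."""
    def wrap(fn: Adapter) -> Adapter:
        ADAPTERS[channel] = fn
        return fn
    return wrap


@register("call_center")
def from_call_center(raw: dict) -> Interaction:
    # Field names are assumed; a real call center export will differ.
    return Interaction(raw["cust_id"], "call_center",
                       date.fromisoformat(raw["call_date"]), raw["email"])


@register("branch")
def from_branch(raw: dict) -> Interaction:
    # Field names are assumed; a real branch system will differ.
    return Interaction(raw["customer"], "branch",
                       date.fromisoformat(raw["visit_day"]), raw["contact_email"])


def clean(rows: List[Interaction]) -> List[Interaction]:
    """Normalize emails, drop invalid ones, keep one row per customer."""
    seen = set()
    cleaned = []
    for row in rows:
        email = row.email.strip().lower()
        if "@" not in email or row.customer_id in seen:
            continue
        seen.add(row.customer_id)
        cleaned.append(Interaction(row.customer_id, row.channel,
                                   row.interaction_date, email))
    return cleaned


def build_survey_list(raw_by_channel: Dict[str, List[dict]],
                      day: date) -> List[Interaction]:
    """Intake -> filter to one day -> clean, identical for every source."""
    unified = [ADAPTERS[channel](raw)
               for channel, raws in raw_by_channel.items()
               for raw in raws]
    todays = [row for row in unified if row.interaction_date == day]
    return clean(todays)
```

With this shape, bringing in the online channel later means registering one more adapter; the filtering and cleaning code stays untouched, and an enrichment step could be added after `clean` in the same way.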
This is a simple example; it gets more complex when you have multiple data sources coming from multiple places within and outside the organization. Such efforts might begin as a simple ETL project with a couple of data sources, but before you know it there is a spaghetti of dozens of data sources supporting critical business reports or functions. Hence even smaller data integration initiatives should not only accommodate the immediate business requirements but also be looked at through the lens of long-term data and business strategy.
There are quite a few applications out there, ranging from products by big companies like Oracle and Microsoft, which are primarily used on-premise, to open source tools like Pentaho and cloud-based tools like Talend. The choice depends on the size of the organization, the integration use case, its complexity, the business requirements, security or legal requirements, and the specific nuances of the data sources.
When considering a data integration product, make sure that the vendor or implementation team understands your company’s current environment, which might be ridden with legacy systems, primitive processes, a missing data roadmap, security needs and organizational constraints. Once they have a good understanding of the opportunities and constraints, the conversation becomes less about how good their product is and more about how it can be integrated effectively given those opportunities and constraints.
Well-executed data integration not only provides the intended ongoing business value to stakeholders but is also operationally efficient, raises proactive alerts when things go wrong, is flexible enough to accommodate enhancements, and has the technical and process connectors to serve the organization in the long term.