Handling Unstructured Data, Effectively

By CIOReview | Friday, August 19, 2016

The volume, complexity, and diversity of data are growing year after year. According to an IBM report, more than 30 percent of IT organizations are expected to pursue advanced analytics and data management by 2017 in order to stay competitive. Today, the majority of organizations that take on data analytics projects start from the wrong place: some end too early, and some take too long yet still fall short. Most of the data that matters still sits in disconnected silos, in assorted formats. Apart from a handful of leaders (such as LinkedIn, Netflix, Nordstrom, Target and Verizon), many companies still struggle to close the gap between data collection, insight, and action. Put simply, genuinely effective big data analytics remains hard to find.

By and large, organizations deal with two kinds of data: structured and unstructured. Some big data analytics tools, primarily those based on Hadoop, are designed from the ground up to analyze and manage unstructured information. Even so, the preoccupation with analytics applications pushes organizations to focus more on structured than unstructured data, which in turn serves to justify major BI investments.

Exploiting Unstructured Data

Unstructured data comes in two broad forms: textual and non-textual, the latter including images, sounds, colors, and shapes. Classifying unstructured textual data is a hard row to hoe because it is so pervasive. Given its unpredictable nature, generalizing an approach to unstructured textual data has been a herculean task.

Organizations often recruit people with strong quantitative backgrounds, such as accountants, financial analysts, engineers, economists, actuaries, and statisticians, to analyze large, complex data sets. Although this approach lets experts slice and dice any type of data, it is often a strenuous process. Fortunately, technology now offers tools built for unstructured data, adding depth to data analysis and leaving businesses free to explore this data to its full potential.

“Unstructured data is really coming to the fore of people's minds,” explains Nick Millman, a Senior Director at Accenture Information Management Services. “We are at a tipping point: There is as much value in unstructured data in terms of what customers are thinking on the web and what businesses can derive from other organizations’ data,” he adds.

Gambling with Unstructured Data Management

Data quality, data categorization, and the combining of structured and unstructured data often pose a threat to businesses. The data typically held by financial services companies, including internal and external communications, mobile data, and financial news, is particularly exposed because it represents a large share of their total data. These companies cannot afford to lose any financial data for want of a governance process for handling unstructured data.

Tailoring in-house development to Hadoop, MapReduce, or other open-source tools can avoid such obstacles in the long run. An alternative strategic step is to implement a data-defined storage architecture that addresses data-management challenges and also works with new databases and BI or analytics tools. Above all, though, an understanding of business-centric information is required.

Slow and Steady Wins the Race

Decisions that organizations base on structured data alone draw on only a very limited portion of corporate information. For instance, managers who decide solely on the current month's revenue are unlikely to be effective, because many informative factors go unnoticed: monthly expenses, the size of the customer base, revenue projections for the coming year, new product announcements, and more.

The growth of unstructured data is expected to far outstrip the already unprecedented growth of structured data, as social media applications like Twitter, blogs, Pin, and many more keep evolving every second. Twitter's Firehose service of raw data, for example, is already used by companies for everything from market research and customer service to aligning their supply chain and logistics schemes. Much of that can be done in close to real time.

“The right way to do it is not to start with the technology, but understand what the business is about,” says Stephen Black, Data Management Expert, PA Consulting Group.

IDC predicts that, by 2017, more than 80 percent of data storage capacity will be delivered as scale-out solutions. The information stored there is expected to be extracted and harnessed by modern big data techniques and technologies. By then, it will be critical for big players in the market to consider and leverage data-defined storage solutions for managing unstructured data.

“There are all kinds of ways we can take different kinds of data, pull them together, and learn things about what’s effective, much faster,” said Stuart Madnick, Professor of Information Technologies at MIT’s Sloan School of Management.