Server Throughput: The Key to Faster Big Data Processing

By CIOReview | Tuesday, July 19, 2016

From time immemorial, the one thing humans have explored most extensively is ‘time’ itself. All the failed experiments on time travel carried out over the centuries tell us one thing and one thing only: time is invaluable. It is no coincidence that some anonymous person long ago said ‘time and tide wait for none’, even when devices to measure time did not exist. If time was of such great importance back then, it would be naive to think that anyone needs its importance explained now.

Technology has developed to the point where every second lost translates into a significant loss of capital, and today that is unacceptable. Big data has taken the world by storm, and processing it is not the same as crunching data the old way. General-purpose servers are poorly suited to the job: they can do it, but they are slow, and as discussed, that is undesirable. So the dilemma arises: what do we do?

To answer that question, we look beyond time alone to something called ‘throughput’. In the context of communication networks, which is our focus here, throughput is the rate at which messages are successfully delivered over a communication channel, i.e., the number of data packets delivered per second from one location to another.
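The definition above can be turned into a small calculation. The following is a minimal sketch, assuming a measurement window in which we know how many packets were sent and how many arrived; the function name and the sample figures are hypothetical and are not taken from any real measurement.

```python
def throughput(delivered_units: int, elapsed_seconds: float) -> float:
    """Return successfully delivered units (packets or bits) per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return delivered_units / elapsed_seconds


# Hypothetical measurement window (illustrative numbers only).
sent_packets = 12_000
delivered_packets = 11_400
window_seconds = 4.0

# Only delivered packets count toward throughput; lost packets do not.
packets_per_second = throughput(delivered_packets, window_seconds)
delivery_ratio = delivered_packets / sent_packets

print(f"{packets_per_second:.0f} packets/s, {delivery_ratio:.0%} delivered")
```

Note that the calculation uses delivered packets rather than sent packets, which is what separates throughput from the raw sending rate.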

The limitations of the underlying analog physical medium, the limited processing power of system components, and end-user behavior are a few of the factors that can reduce the throughput of a system. When protocol overheads are taken into account, the useful rate of transferred data can be significantly lower than the maximum achievable throughput.
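To make the effect of protocol overhead concrete, here is a rough sketch that estimates the useful data rate on a link. The article does not name a protocol stack, so the figures below are an assumption chosen for illustration: TCP over IPv4 over Ethernet with no header options, a 1460-byte payload per packet, and a 1 Gbit/s link. The function names are hypothetical.

```python
def payload_efficiency(payload_bytes: int, overhead_bytes: int) -> float:
    """Fraction of the bytes on the wire that carry application payload."""
    if payload_bytes < 0 or overhead_bytes < 0:
        raise ValueError("byte counts must be non-negative")
    total = payload_bytes + overhead_bytes
    if total == 0:
        raise ValueError("packet must contain at least one byte")
    return payload_bytes / total


def useful_rate(link_rate_bps: float, payload_bytes: int, overhead_bytes: int) -> float:
    """Useful data rate in bits per second after per-packet overhead."""
    return link_rate_bps * payload_efficiency(payload_bytes, overhead_bytes)


# Assumed per-packet overhead in bytes (illustrative, no header options):
# TCP header 20 + IPv4 header 20 + Ethernet header 14 + frame check 4
# + preamble 8 + inter-frame gap 12.
OVERHEAD_BYTES = 20 + 20 + 14 + 4 + 8 + 12
PAYLOAD_BYTES = 1460
LINK_RATE_BPS = 1_000_000_000

efficiency = payload_efficiency(PAYLOAD_BYTES, OVERHEAD_BYTES)
rate = useful_rate(LINK_RATE_BPS, PAYLOAD_BYTES, OVERHEAD_BYTES)

print(f"efficiency {efficiency:.1%}, useful rate {rate / 1e6:.0f} Mbit/s")
```

Under these assumptions roughly 5% of the link is spent on headers and framing. Smaller payloads make the loss proportionally larger, since the overhead per packet stays the same.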

To overcome this and ramp up throughput, hardware vendors have started designing systems that feed data to the CPU through special channels, in contrast to the traditional design in which the server takes data from an I/O subsystem and places it in memory for processing. Because data moves directly to the waiting CPUs, applications do not have to index, lock or otherwise manage it, which saves processing time and delivers results sooner.

One example of a platform built around such a channel is the IBM Data Engine for NoSQL - Power Systems Edition. It is an integrated platform for large and fast-growing NoSQL data stores. It delivers high-speed access to both RAM and flash storage, resulting in lower cost and higher workload density than a standard RAM-based system. IBM built the Coherent Accelerator Processor Interface (CAPI) pathway into its POWER8 microprocessor architecture. CAPI's high-speed channel interface is accessible to I/O devices and other CPU types. CAPI eliminates the need to move large volumes of data through a memory subsystem. It also spares the memory and I/O subsystems from swapping data in and out, because data moves directly from storage to the CPU. Speeding up processing with CAPI significantly increases workload throughput per server.

Apart from such platforms, one can also consider using multiple CPUs and/or Field Programmable Gate Array (FPGA) accelerators. FPGAs can be reprogrammed at run time, which enables reconfigurable computing: hardware that adapts itself to suit the task at hand. Additionally, there are software-configurable microprocessors that take a hybrid approach by providing an array of processor cores and FPGA-like programmable cores on the same chip. Big data, Hadoop, machine learning and bioinformatics applications can benefit from pairing an FPGA accelerator with a POWER8 processor, which can substantially increase throughput.

Using multiple CPUs makes it possible to dedicate hardware to specific types of work when there are multiple servers: each server processes a particular type of application. Many large enterprises move data from centralized servers to distributed servers or data warehouses for faster processing. They Extract, Transform and Load (ETL) the data onto target data warehouse systems. By using multiple specialized servers combined with FPGAs, the ETL process can be accelerated, since throughput stays high.
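The idea of routing each type of work to its own specialized server can be sketched in software. The example below is a toy model, not a real ETL pipeline: the record types, the `extract` and `transform` functions, and the in-memory "warehouse" dictionary are all hypothetical stand-ins, and a thread pool plays the role of the specialized servers. In CPython, threads illustrate the structure but will not speed up CPU-bound transforms.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def extract():
    """Stand-in for a source system: returns (record_type, value) pairs."""
    return [("orders", 10), ("clicks", 3), ("orders", 5), ("clicks", 7), ("logs", 1)]


def transform(record_type, values):
    """Stand-in for one specialized server handling a single record type."""
    return record_type, sum(values)


def run_etl(records, max_workers=3):
    # Partition the extracted records so each type goes to its own worker.
    partitions = defaultdict(list)
    for record_type, value in records:
        partitions[record_type].append(value)

    warehouse = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one transform job per record type; jobs can run concurrently.
        futures = [pool.submit(transform, rtype, vals) for rtype, vals in partitions.items()]
        for future in futures:
            record_type, total = future.result()
            warehouse[record_type] = total  # "Load" step into the target store.
    return warehouse


warehouse = run_etl(extract())
print(warehouse)
```

The design choice worth noting is the partitioning step: because each record type is handled independently, adding workers raises throughput only as long as there are enough independent partitions to keep them busy.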

The applications that exploit these new accelerator architectures need to process data rapidly, and sometimes in parallel. Serial applications such as email and messaging will not benefit from an accelerator-based server architecture. Hence, the hardware for an application should be chosen very carefully. The one thing to keep in mind is that server utilization should be pushed toward maximum capacity at the highest throughput. So, regardless of which of the methods discussed is used, the key to high throughput lies in keeping all the servers and devices involved consistently busy at or near their maximum capacity, which in turn yields greater results.
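The point about serial applications can be illustrated with Amdahl's law, a standard formula that the article itself does not name. It estimates overall speedup when only a fraction of a workload can be accelerated. The fractions and the 10x accelerator factor below are hypothetical values chosen for illustration, not measurements of any real workload.

```python
def amdahl_speedup(accelerated_fraction: float, accelerator_factor: float) -> float:
    """Overall speedup when a fraction of the work runs accelerator_factor times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_factor <= 0:
        raise ValueError("accelerator_factor must be positive")
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / accelerator_factor)


# Hypothetical workloads, both given an accelerator that is 10x faster
# on the portion of work it can handle.
mostly_serial = amdahl_speedup(accelerated_fraction=0.05, accelerator_factor=10)
mostly_parallel = amdahl_speedup(accelerated_fraction=0.95, accelerator_factor=10)

print(f"mostly serial workload:   {mostly_serial:.2f}x")
print(f"mostly parallel workload: {mostly_parallel:.2f}x")
```

With these assumed numbers, the mostly serial workload barely improves while the mostly parallel one speeds up severalfold, which is why matching the hardware to the application matters.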