When Throughput Matters
Supercomputers are the Thoroughbred racehorses of the computing world, empowering the field of computational science known as high performance computing (HPC). HPC is used to tackle problems across a wide range of computationally intensive scientific disciplines, including molecular modeling, weather forecasting and climate research, oil and gas exploration, and physical simulation. However, not all compute-intensive tasks are alike or require the same finely tuned, cutting-edge infrastructure. High throughput computing (HTC), for example, is a computing “paradigm that focuses on the efficient execution of a large number of loosely coupled tasks.” Here we will briefly examine how HTC contrasts with its better-known cousin and discuss its applicability to disciplines such as computational finance.
Twice a year, researchers at the National Energy Research Scientific Computing Center compile the TOP500 list, which ranks the most powerful publicly known computer systems in the world. Performance in HPC is measured in terms of floating-point operations per second (FLOPS). The current holder of the No. 1 slot in the TOP500, the Oak Ridge National Laboratory’s Summit supercomputer, scores a LINPACK benchmark rating of 122.3 quadrillion FLOPS, or 122.3 petaFLOPS. In contrast to HPC workloads, which require large amounts of computing power for relatively short periods of time, HTC workloads may run for many months or even longer. In HTC, a typical metric for measuring performance is not FLOPS but the number of jobs completed in a month or a year.
In HPC, computational problems are mapped to supercomputers by decomposing them across the many processors in the cluster. (The Summit supercomputer, for example, is equipped with more than 202,000 CPU cores and 27,000 GPUs.) Often the mapping is achieved by dividing up the space or time coordinates of the system being modeled and calculating each sub-region or time interval simultaneously. Most scientific problems require some degree of interaction among sub-regions or time intervals, with the result that nodes within the cluster must be able to communicate with one another. This technique is known as message passing; today the majority of HPC workloads employ a standard that was first codified in 1993 and is known as Message Passing Interface (MPI). Unlike HPC simulations, which must execute within a single supercomputing site, the component tasks in HTC are embarrassingly parallel and can be scheduled across administrative boundaries.
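The defining property of an embarrassingly parallel workload can be sketched in a few lines. In this illustrative Python fragment (the job function and its arithmetic are hypothetical, not drawn from any real HTC system), each task depends only on its own inputs, so a scheduler is free to run the tasks in any order, on any node, and still obtain identical results:

```python
import random

def run_job(job_id):
    """A self-contained task: it reads no shared state and sends no
    messages, so it can run anywhere, at any time."""
    return job_id, sum((job_id * k) % 97 for k in range(1000))

def schedule(jobs):
    """Run jobs in an arbitrary order, as an HTC scheduler might.
    Because the tasks are independent, the ordering cannot affect
    the results."""
    shuffled = list(jobs)
    random.shuffle(shuffled)
    return dict(run_job(j) for j in shuffled)
```

Running `schedule(range(8))` twice yields the same dictionary of results even though the execution order differs each time; an MPI workload, by contrast, would interleave computation with communication between ranks and could not be scattered this freely.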
The European Organization for Nuclear Research, better known as CERN, operates the world’s largest computing grid. The Geneva-based organization employs more than 800,000 computer cores, distributed across 170 data centers located in 42 countries to continuously process the output of the Large Hadron Collider, the world’s biggest and most powerful particle accelerator. CERN’s batch computing service currently comprises about 190,000 CPU cores. HTC is also used in the development of medical drugs, bioinformatics, image and text analysis, and computer animation. A single movie produced by DreamWorks Animation may occupy a render farm of 20,000 CPU cores for six months straight.
HTC also happens to be an important tool within quantitative finance. Backtesting of financial models is the lifeblood of quantitative asset management firms and similar businesses. A model represents a view on how the financial markets operate at some level and has dimensions of both an instrument (stock ticker, futures contract, FX spot price, etc.) and a time period. Backtesting, or simulation, as it is also known, involves stepping through a time series of historical instrument prices to identify whether the model is valid and has potential to make money if it is traded in the future. Because the process of evaluating financial models can be carried out in an independent fashion, the models map very well to HTC systems.
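The core loop of a backtest can be illustrated with a deliberately simplified sketch. The trading rule below (go long when the price exceeds its trailing mean, stay flat otherwise) is a hypothetical toy model chosen only to show the mechanics of stepping through a historical price series; real models are far richer:

```python
def backtest(prices, window=3):
    """Step through a historical price series, accruing P&L from the
    position held over each interval, then updating the position
    from a toy signal: long if price is above its trailing mean."""
    pnl = 0.0
    position = 0  # 0 = flat, 1 = long one unit
    for t in range(window, len(prices)):
        # P&L from the position carried into this interval
        pnl += position * (prices[t] - prices[t - 1])
        trailing_mean = sum(prices[t - window:t]) / window
        position = 1 if prices[t] > trailing_mean else 0
    return pnl
```

Each call to `backtest` touches only its own price series and parameters, which is precisely why thousands of such evaluations map so naturally onto independent HTC jobs.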
For every model that ends up being incorporated into a live trading strategy, thousands of financial simulations likely will have been previously evaluated as part of the research process. This is especially true when financial research is driven by machine learning and search methodologies, which seek to iteratively improve the performance of a precursor model. By way of example, at WorldQuant we execute billions of simulations per month. The models’ dependence on common, core datasets means that their placement within an HTC environment can be fruitfully optimized for data locality—placing the job where the data is rather than the converse. In addition, the abundance of hot data (for example, equity price and volume) means that multilevel caching approaches can be enormously effective.
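The value of caching hot data can be sketched with Python's standard `functools.lru_cache` standing in for a node-local hot tier in front of shared storage. The data store, the fetch counter, and the ticker symbols here are all hypothetical, used only to show the effect:

```python
from functools import lru_cache

FETCHES = {"count": 0}  # counts simulated trips to shared storage

@lru_cache(maxsize=128)
def load_prices(ticker):
    """Simulate fetching a price series from shared storage.
    lru_cache acts as the local hot tier: repeated requests for the
    same ticker never leave the node."""
    FETCHES["count"] += 1
    return [hash((ticker, d)) % 100 for d in range(5)]
```

If thousands of simulations all request the same heavily used series, only the first request per node pays the storage cost; every subsequent access is served from the cache, which is why multilevel caching is so effective for hot datasets such as equity price and volume.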
The compute nodes that make up HTC systems are more akin to carthorses than to the Thoroughbreds of the HPC world. Because the jobs are independent of one another, esoteric low-latency network interconnects are not required and HTC systems can be built using commodity off-the-shelf components. If a compute node fails, the jobs it was executing can simply be rescheduled elsewhere. Assuming the existence of adequately performing storage systems, the use of cloud computing is possible, even desirable—its elastic nature and steeply discounted spot or preemptible resources are a good match for HTC, if not for high performance computing.
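Because HTC jobs are independent, fault tolerance reduces to resubmission. The sketch below (the retry policy and function names are illustrative, not taken from any particular scheduler) shows the essential logic: if a node fails mid-job, run the job again elsewhere:

```python
def run_with_reschedule(jobs, run, max_attempts=3):
    """Execute independent jobs, resubmitting any job whose node
    fails (signaled here by RuntimeError) up to max_attempts times."""
    results = {}
    for job in jobs:
        for _ in range(max_attempts):
            try:
                results[job] = run(job)
                break  # job succeeded; move on
            except RuntimeError:
                continue  # node failed; reschedule elsewhere
        else:
            raise RuntimeError(f"job {job!r} failed {max_attempts} times")
    return results
```

No checkpointing or coordination with other jobs is needed, which is exactly why commodity hardware and preemptible cloud instances, whose failures or evictions simply trigger a retry, suit HTC so well.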