Google Divulges Information on Borg

By CIOReview | Tuesday, June 16, 2015

FREMONT, CA: Google has published a paper, “Large-scale cluster management at Google with Borg,” which reveals details of a cluster scheduling technology it had long kept under wraps.

The Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters, each with up to tens of thousands of machines. It achieves high utilization by combining admission control, efficient task packing, over-commitment, and machine sharing with process-level performance isolation.
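To make over-commitment concrete, here is a minimal Python sketch under a simplified, assumed model: each task declares a resource limit, but a machine charges it only by a smaller estimated reservation, so a machine can admit tasks whose combined limits exceed its physical capacity. The names `Task`, `admit` and `overcommit_ratio`, and the numbers in the example, are hypothetical and do not describe Borg's actual admission policy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    limit_cores: float        # resources the task asked for (its upper bound)
    reservation_cores: float  # hypothetical estimate of what it will really use


def admit(running: list[Task], new: Task, capacity_cores: float) -> bool:
    """Admit `new` only if the machine's summed reservations still fit.

    Charging by reservation rather than by limit is what allows the summed
    limits on a machine to exceed its physical capacity (over-commitment).
    """
    reserved = sum(t.reservation_cores for t in running)
    return reserved + new.reservation_cores <= capacity_cores


def overcommit_ratio(running: list[Task], capacity_cores: float) -> float:
    """Summed limits divided by capacity; a value above 1.0 means over-committed."""
    return sum(t.limit_cores for t in running) / capacity_cores


# Example: an 8-core machine already running two tasks.
running = [Task("a", 4.0, 2.0), Task("b", 4.0, 2.5)]
new_task = Task("c", 4.0, 3.0)
if admit(running, new_task, capacity_cores=8.0):
    running.append(new_task)
print(overcommit_ratio(running, capacity_cores=8.0))  # 1.5: 12 cores promised on 8
```

The trade-off in this model is that the machine is safe only as long as tasks stay near their reservations; a real system also needs a way to react when usage exceeds the estimate, which this sketch does not attempt.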

Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior. The components of Borg’s architecture include the cell, cluster, job, task, alloc, Borglet, Borgmaster and scheduler.
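A declarative specification lets users state what they want (which binary, how many copies, what resources) and leaves the system to work out how to run it. The Python sketch below illustrates only that idea; the `JobSpec` and `TaskSpec` fields, the `expand` function and the DNS-style task names are hypothetical and do not reproduce Borg's actual specification language or name service. It shows one job description being expanded into identical tasks, each given a stable name derived from its index, user and cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical declarative job description: *what* to run, not *how*."""
    name: str
    user: str
    cell: str
    binary: str
    replicas: int
    cpu_cores: float
    ram_gib: float


@dataclass(frozen=True)
class TaskSpec:
    job: str
    index: int
    binary: str
    cpu_cores: float
    ram_gib: float
    dns_name: str  # hypothetical stable name that a name service could publish


def expand(spec: JobSpec) -> list[TaskSpec]:
    """Turn one declarative job into `replicas` identical tasks."""
    if spec.replicas < 1:
        raise ValueError("a job needs at least one task")
    return [
        TaskSpec(
            job=spec.name,
            index=i,
            binary=spec.binary,
            cpu_cores=spec.cpu_cores,
            ram_gib=spec.ram_gib,
            dns_name=f"{i}.{spec.name}.{spec.user}.{spec.cell}",
        )
        for i in range(spec.replicas)
    ]


web = JobSpec(name="web", user="alice", cell="cc", binary="/bin/web_server",
              replicas=3, cpu_cores=0.5, ram_gib=1.0)
for task in expand(web):
    print(task.dns_name)  # 0.web.alice.cc, then 1.web.alice.cc, then 2.web.alice.cc
```

Because the name depends only on the job, user, cell and task index, a task keeps the same name even if it is restarted on a different machine, which is what makes such names useful to clients.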

A cell is a set of machines managed as a unit. A cluster usually hosts one large cell and sometimes a few smaller special-purpose cells, some of which are used for testing. A job is an activity that runs within the boundaries of a single cell and consists of one or more tasks, all running the same binary. An alloc is a reserved set of resources on a machine in which one or more tasks can run. The Borglet is an agent that runs on every machine. The Borgmaster is a controller process that runs at the cell level and holds state data for all the Borglets. The scheduler monitors the queue of pending work and schedules jobs according to the resources available on individual machines, reports Abel Avram for InfoQ.
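The scheduler's job of matching queued work to machine resources can be sketched in a few lines of Python. This is a hypothetical illustration, not Borg's algorithm: it filters machines with enough free CPU and RAM (a feasibility check), then picks among them with a simple best-fit score, the machine left with the least spare capacity. The class and function names and the crude score, which adds cores and GiB directly, are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_cores: float
    free_ram_gib: float
    tasks: list[str] = field(default_factory=list)


@dataclass
class PendingTask:
    name: str
    cores: float
    ram_gib: float


def feasible(m: Machine, t: PendingTask) -> bool:
    """Feasibility check: does the machine have enough free resources?"""
    return m.free_cores >= t.cores and m.free_ram_gib >= t.ram_gib


def score(m: Machine, t: PendingTask) -> float:
    """Hypothetical best-fit score: leftover capacity after placement (lower wins).

    Adding cores and GiB directly is crude and only for illustration.
    """
    return (m.free_cores - t.cores) + (m.free_ram_gib - t.ram_gib)


def schedule(queue: deque[PendingTask], cell: list[Machine]) -> list[PendingTask]:
    """Drain the pending queue once; place what fits and return what could not."""
    unplaced = []
    while queue:
        task = queue.popleft()
        candidates = [m for m in cell if feasible(m, task)]
        if not candidates:
            unplaced.append(task)  # would stay pending for a later pass
            continue
        best = min(candidates, key=lambda m: score(m, task))
        best.free_cores -= task.cores
        best.free_ram_gib -= task.ram_gib
        best.tasks.append(task.name)
    return unplaced


cell = [Machine("m1", 4.0, 8.0), Machine("m2", 2.0, 4.0)]
queue = deque([PendingTask("a", 1.5, 3.0), PendingTask("b", 2.0, 4.0),
               PendingTask("c", 4.0, 8.0)])
left = schedule(queue, cell)
print([m.tasks for m in cell], [t.name for t in left])  # [['b'], ['a']] ['c']
```

Best fit packs tasks tightly and keeps large machines free for large tasks, at the cost of leaving little headroom on the machines it fills; spreading work out instead would make the opposite trade.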

Google has so far managed to eliminate every scalability limit it has encountered in Borg’s centralized architecture. A single Borgmaster can manage many thousands of machines in a cell, and several cells have arrival rates above 10,000 tasks per minute. A busy Borgmaster uses 10–14 CPU cores and up to 50 GiB of RAM.

One of Borg’s primary goals is to make efficient use of Google’s fleet of machines, which represents a significant financial investment: increasing utilization by just a few percentage points can save millions of dollars. The paper discusses and evaluates several of the policies and techniques Borg uses to achieve this.

Google uses a Linux chroot jail as the primary security isolation mechanism between multiple tasks on the same machine. To allow remote debugging, Google used to distribute (and rescind) SSH keys automatically, giving a user access to a machine only while it was running that user’s tasks.
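The SSH-key policy, granting a user access to a machine only while it runs that user’s tasks, amounts to reference counting per user. The Python sketch below is a hypothetical model of that bookkeeping only: `MachineAccess` and its methods are invented names, and nothing here distributes or revokes real keys.

```python
from collections import Counter


class MachineAccess:
    """Tracks which users may log in to one machine (hypothetical model)."""

    def __init__(self) -> None:
        self._running = Counter()  # user -> number of that user's tasks on this machine

    def task_started(self, user: str) -> None:
        # A user's first task is the point where their key would be distributed.
        self._running[user] += 1

    def task_stopped(self, user: str) -> None:
        if self._running[user] == 0:  # Counter returns 0 for unknown users
            raise ValueError(f"no running task for {user}")
        self._running[user] -= 1
        if self._running[user] == 0:
            # The user's last task has gone: this is where the key would be rescinded.
            del self._running[user]

    def authorized_users(self) -> set[str]:
        return set(self._running)


access = MachineAccess()
access.task_started("alice")
access.task_started("alice")
access.task_started("bob")
access.task_stopped("alice")
print(sorted(access.authorized_users()))  # ['alice', 'bob']: alice still has one task
```

Counting tasks rather than storing a simple flag matters because a user can have several tasks on the same machine; access should end only when the last of them stops.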

Virtually all of Google’s cluster workloads have switched to Borg over the past decade. Google continues to evolve the system and has applied the lessons learned from it to Kubernetes, the paper concludes.