Google Divulges Information on Borg
FREMONT, CA: Google has published a paper, “Large-scale cluster management at Google with Borg,” that unveils details of a cluster scheduling technology the company had previously kept under wraps.
The Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters, each with up to tens of thousands of machines. It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation.
Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior. The components of Borg’s architecture include the cell, cluster, job, task, alloc, Borglet, Borgmaster, and scheduler.
A cell is a collection of machines treated as a unit. A cluster generally contains one large cell and sometimes a few small special-purpose cells, some of which are used for testing. A job is an activity executed within the boundaries of a cell, and it consists of one or more tasks. An alloc is a set of machine resources reserved for one or more tasks. The Borglet is an agent running on each machine. The Borgmaster is a controller process that runs at the cell level and holds state data for all Borglets. The scheduler monitors the queue and schedules jobs according to the resources available on individual machines, reports Abel Avram for InfoQ.
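How a scheduler might match queued tasks to machine resources can be illustrated with a small sketch. This is not Borg's actual algorithm, which the article does not spell out; it is a plain first-fit placement, and every name in it (Task, Machine, schedule, the CPU and RAM figures) is a hypothetical chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu: float      # cores requested (illustrative unit)
    ram_gib: float  # memory requested, in GiB


@dataclass
class Machine:
    name: str
    cpu_free: float
    ram_free_gib: float
    tasks: list = field(default_factory=list)

    def fits(self, task):
        # A task fits only if both resources are available.
        return task.cpu <= self.cpu_free and task.ram_gib <= self.ram_free_gib

    def place(self, task):
        # Reserve the task's resources on this machine.
        self.cpu_free -= task.cpu
        self.ram_free_gib -= task.ram_gib
        self.tasks.append(task.name)


def schedule(pending, machines):
    """Place queued tasks first-fit; return (placements, still_pending)."""
    placements = {}
    still_pending = deque()
    while pending:
        task = pending.popleft()
        # Assumption: take the first machine with room (first-fit).
        target = next((m for m in machines if m.fits(task)), None)
        if target is None:
            still_pending.append(task)  # no machine has room right now
        else:
            target.place(task)
            placements[task.name] = target.name
    return placements, still_pending


# A toy "cell" of two machines and a queue of four tasks from two jobs.
cell = [
    Machine("m1", cpu_free=4, ram_free_gib=8),
    Machine("m2", cpu_free=8, ram_free_gib=32),
]
queue = deque([
    Task("web/0", 2, 4),
    Task("web/1", 2, 4),
    Task("batch/0", 6, 16),
    Task("batch/1", 6, 16),
])
placed, waiting = schedule(queue, cell)
print(placed)
print([t.name for t in waiting])
```

In this toy run the two web tasks fill the first machine, one batch task lands on the second, and the last batch task stays in the queue because no machine has six free cores left. A production scheduler would weigh many more factors than this sketch does.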
Google says it has so far managed to eliminate every scalability limit it has approached in Borg’s architecture. A single Borgmaster can manage many thousands of machines in a cell, and several cells have arrival rates above 10,000 tasks per minute. A busy Borgmaster uses 10–14 CPU cores and up to 50 GiB of RAM.
One of Borg’s primary goals is to make efficient use of Google’s fleet of machines, which represents a significant financial investment: increasing utilization by a few percentage points can save millions of dollars. The paper discusses and evaluates some of the policies and techniques that Borg uses to do so.
Google uses a Linux chroot jail as the primary security isolation mechanism between multiple tasks on the same machine. To allow remote debugging, the company used to distribute (and rescind) ssh keys automatically, giving a user access to a machine only while it was running tasks for that user.
Virtually all of Google’s cluster workloads have switched to Borg over the past decade. Google continues to evolve it and has applied the lessons learned from it to Kubernetes, the paper concludes.