Building the Grand Central Data Station
WP Engine has grown monumentally over the past few years, acquiring new customers, hiring more employees, and strengthening our products. But with growth come growing pains. To give you some perspective, approximately 3.5 billion people come to a website hosted by WP Engine every day.
WP Engine’s challenge after flying past the 100,000 customer mark was figuring out how to scale everything (servers, processes, corporate data, and so on) to ensure the quality experience and support our customers expect. Instead of waiting for an issue to arise as a consequence of the 9 billion new records WP Engine pulls in per day, we began architecting for the future to optimize workflow and innovation.
As an engineer, it's easy to think, “I'll just grab all the data from all our data sources, stick it somewhere, and we can pull reports from there.” Unfortunately, it's never that simple. Our customers' requirements for how we access their data, regulations like GDPR, and the experience of non-technical employees all come into play. You have to ask yourself: How do we streamline all the data from half a million websites and many systems of record into one accessible place? How do we give our employees access to the data that is relevant to them? How do we simplify the process of making reports? How do we integrate it with all the systems WP Engine currently runs?
The answers to those questions lie in how we transform the way we ingest, process, clean, analyze, and report corporate data. Welcome to “Grand Central Data Station.”
We built our “Grand Central Data Station” to transform business logic by cleaning the data and documenting the results. First, we needed to define what our corporate data is, such as invoices and user portal data. Second, we needed to understand where our data sources are and define our systems of record. Systems of record exist in specific applications that correlate to specific functional areas: Salesforce has your sales data, Zendesk has your support tickets, and NetSuite has your financial information. Third, we needed to decide where to keep all of this data and how to get it there. We chose BigQuery as the data warehouse to store everything. Moving all of the information to BigQuery allowed WP Engine to create a huge lake of data. However, having that amount of data in one place presents security issues, because not everyone should have access to all of the information. BigQuery allows for the creation of datasets to organize views and tables. Access can therefore be given to specific teams on a need-to-know basis, and it can be granted or removed in real time.
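To make the dataset-level access idea concrete, here is a minimal Python sketch of the concept. It is a toy model only and does not call the BigQuery API; the `AccessRegistry` class and the dataset and team names are hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessRegistry:
    """Toy model: maps a dataset name to the set of teams allowed to read it."""

    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, dataset: str, team: str) -> None:
        # Give a team read access to one dataset only.
        self.grants.setdefault(dataset, set()).add(team)

    def revoke(self, dataset: str, team: str) -> None:
        # Remove access; a no-op if the team never had it.
        self.grants.get(dataset, set()).discard(team)

    def can_read(self, dataset: str, team: str) -> bool:
        return team in self.grants.get(dataset, set())


# Hypothetical datasets and teams, for illustration only.
registry = AccessRegistry()
registry.grant("finance", "accounting")
registry.grant("support", "customer_success")
```

The point of grouping tables and views into datasets is that a grant applies to the whole group, so access decisions are made per team and per functional area rather than per table.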
Funneling all of the data to BigQuery seems straightforward in theory, but it came with many challenges we had to overcome. Each system of record you pull from stores its data differently: it might live in MySQL or Postgres, or it may even be housed behind an API. As a consequence, it is difficult to seamlessly integrate all of the data into one warehouse. So how did we do it? We coded it ourselves! We created a flow of data that talks efficiently with all parts. See below for a chart.
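As a rough illustration of what this kind of glue code can look like (not WP Engine's actual pipeline), the Python sketch below normalizes rows from a SQL database and from a paginated API into one common shape before loading them. Everything here is an assumption made for the example: an in-memory SQLite database stands in for MySQL or Postgres, `fake_tickets` stands in for a real API client, and a plain dictionary stands in for the warehouse.

```python
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def rows_from_sql(conn: sqlite3.Connection, query: str) -> Iterator[Row]:
    """Yield each result row as a dict keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


def rows_from_api(fetch_page: Callable[[int], List[Row]]) -> Iterator[Row]:
    """Yield rows page by page until the API returns an empty page."""
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            return
        yield from batch
        page += 1


def load(warehouse: Dict[str, List[Row]], table: str, rows: Iterable[Row]) -> int:
    """Append rows to a table in the stand-in warehouse; return the count loaded."""
    dest = warehouse.setdefault(table, [])
    before = len(dest)
    dest.extend(rows)
    return len(dest) - before


# Stand-in SQL source (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)", [(1, 10.0), (2, 25.5)])


def fake_tickets(page: int) -> List[Row]:
    # Stand-in for a paginated support-ticket API.
    pages = [[{"id": 101, "status": "open"}], [{"id": 102, "status": "closed"}]]
    return pages[page] if page < len(pages) else []


warehouse: Dict[str, List[Row]] = {}
load(warehouse, "invoices", rows_from_sql(conn, "SELECT id, amount FROM invoices ORDER BY id"))
load(warehouse, "tickets", rows_from_api(fake_tickets))
```

Because every source is reduced to the same iterator of dictionaries, the loading step does not need to know whether a row came from a database or an API.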
We selected Looker as our BI tool to query, visualize, and report. The end result is a Grand Central Data Station that refreshes automatically every hour, with immutable source records and a clear audit trail for how all values are calculated. Employees can find hyperlinked, searchable, and accurate documentation on whatever specific topic they need. On top of this, they have access to a data dictionary of company-wide metadata that is also auditable and documented, e.g. PII, data quality, last modified, and last refreshed. Access controls and security are built in from the start, so we don’t need to worry about customer requirements or regulations.
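One way to picture a data dictionary entry carrying that metadata is the Python sketch below. The `DictionaryEntry` fields mirror the examples named above (PII, data quality, last modified, last refreshed), but the class, the field names, the `is_stale` helper, and the sample values are all hypothetical and not taken from WP Engine's system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DictionaryEntry:
    """Metadata describing one field in the warehouse (hypothetical shape)."""

    name: str
    description: str
    contains_pii: bool
    data_quality: str
    last_modified: datetime
    last_refreshed: datetime


def is_stale(entry: DictionaryEntry, now: datetime,
             max_age: timedelta = timedelta(hours=1)) -> bool:
    """True if the entry has gone longer than max_age without a refresh."""
    return now - entry.last_refreshed > max_age


# Sample entry with made-up values.
entry = DictionaryEntry(
    name="customer_email",
    description="Primary contact email for the account",
    contains_pii=True,
    data_quality="verified",
    last_modified=datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
    last_refreshed=datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc),
)

fresh = is_stale(entry, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
late = is_stale(entry, datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc))
```

The default `max_age` of one hour matches the hourly refresh described above, so a check like this could flag any field that missed a refresh cycle.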
Grand Central Data Station normalized our operational metrics outlook. Different definitions by department don’t create the headaches they once did. Sales now defines a customer the same way as customer success and product. We have unified and centralized data across the board that is helping teams across the company develop GTM plans, new R&D projects, sales opportunities, and marketing.