A SQL Server DBA's Next-Generation Server & OS Monitoring Wish List
In the 25 years that I have been working with SQL Server, monitoring systems have remained, for the most part, unchanged.
Yes, database monitoring tools have increased in complexity to keep up with the many features added to the underlying database platforms. However, given today’s technology, they fall short of expectations. They do not provide administrators like me with the capabilities required to holistically manage the increasingly complex environments of today’s enterprises.
A next-generation monitoring tool should not just provide the same old standard dashboard dressed up with fancy new graphs; it should empower me to act. It needs to help me improve the environment and show me the impact of those actions on the system and, thus, on the business.
What the industry needs (at the risk of rendering the title of this article bunk) is not another monitoring tool. I have tools that help me monitor, and they do a fine job of exactly that. I need a tool that will take me into the future; I need a tool that makes me better and faster at what I do. This industry needs a smarter tool.
The Shortcomings of “Monitoring”
Today, I can depend on SQL Server monitoring tools to help correct database issues about as much as I can depend on my car’s speedometer to get me out of a speeding ticket.
Each tool presents me with metrics, charts, and an overwhelming amount of data, but none provides enough benchmark information (i.e., the speedometer shows my speed, not the speed limit) to take actionable next steps. Unless you already know the speed limit and lift your foot off the throttle before the officer clocks you at 10 over, you are going to get a ticket.
A Hole of Data, Not a ‘Whole’ of Data
Current monitoring tools require a DBA to dig into a “hole” of graphs, charts, and data rather than view the “whole” environment. DBAs are left digging through irrelevant and non-actionable information, with no overview of system performance and no representation of the workload across the data enterprise. I am left spending hours sifting through scattered pieces of information to answer a simple question: “Are my data and environment healthy?”
Answering this simple question requires much more information than a server up/down notification. If a server is online and answering requests but is pegged at 95% CPU, with its memory flushing every 30 seconds, that machine is online but not healthy. Judging the health of a data platform requires an intelligent analysis of the workload.
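To make the distinction between “online” and “healthy” concrete, here is a minimal Python sketch of a health check that looks past the up/down flag. It assumes a hypothetical `ServerSnapshot` record carrying average CPU and the buffer pool’s page life expectancy. Page life expectancy is the SQL Server Buffer Manager counter that drops when memory is being flushed. The threshold values are placeholders, not recommendations; real ones should come from the workload’s own baseline.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; derive real ones from a baseline.
CPU_LIMIT_PERCENT = 85.0
MIN_PAGE_LIFE_S = 300.0


@dataclass
class ServerSnapshot:
    name: str
    is_online: bool
    cpu_percent: float             # average CPU over the sample window
    page_life_expectancy_s: float  # seconds a page survives in the buffer pool


def health_status(s: ServerSnapshot) -> tuple[str, list[str]]:
    """Classify a server as down, unhealthy, or healthy, with reasons."""
    if not s.is_online:
        return "down", ["server is not responding"]
    reasons = []
    if s.cpu_percent > CPU_LIMIT_PERCENT:
        reasons.append(f"CPU at {s.cpu_percent:.0f}% exceeds {CPU_LIMIT_PERCENT:.0f}%")
    if s.page_life_expectancy_s < MIN_PAGE_LIFE_S:
        reasons.append(f"buffer pool turns over every {s.page_life_expectancy_s:.0f}s")
    return ("unhealthy" if reasons else "healthy"), reasons
```

Take the server described above: `health_status(ServerSnapshot("SQL01", True, 95.0, 30.0))` reports `"unhealthy"` with both reasons, even though an up/down check would call it fine.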
Monitoring Alone Does Not Take Action
Traditional data monitoring applications don’t manage your data; they display the information they deem critical. Traditional monitoring does not provide actionable results without an interpreter. What is required is a process that provides intelligent decision-making and starts generating a list of actionable events.
Alerts Are Not Intelligent
It could be argued that a monitoring system does act when it generates an alert. That is a fair argument; however, alerting should show you signs that the target system may experience issues in the future, not just tell you when you are already having one. Again, it could be argued that tools already do this. We get storage alerts when space is running low, but we do not get alerts when code degrades in performance, and we are not alerted when we fall out of compliance with our RPO and RTO requirements. The tools just don’t tell me when behavior in my system is anomalous. The signs are there and the technology is there; the monitoring process just needs to be observant enough and intelligent enough to take corrective action.
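As a rough illustration of the two missing alerts, the sketch below covers both cases. The first function flags a query whose latest average duration sits well outside its own history, using a simple z-score test, which is only one of many possible anomaly detectors. The second treats RPO exposure as the time since the last successful log backup. The function names and the three-standard-deviation limit are illustrative, and the RPO check assumes a database that takes log backups under the full recovery model.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_duration_anomaly(baseline_ms: list[float], latest_ms: float, z_limit: float = 3.0) -> bool:
    """True if latest_ms is more than z_limit standard deviations above the baseline mean."""
    if len(baseline_ms) < 2:
        return False  # too little history to judge
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    if sigma == 0:
        return latest_ms > mu  # perfectly stable history: any slowdown stands out
    return (latest_ms - mu) / sigma > z_limit


def rpo_breached(last_log_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Potential data loss is roughly the time since the last successful log backup."""
    return now - last_log_backup > rpo
```

A tool could run the duration check per query after each collection and the RPO check per database on a schedule. That would turn both gaps into early warnings rather than after-the-fact discoveries.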
No Focus On Prevention
Most tools focus on letting you know that a system metric has surpassed what is considered an acceptable value. Using this method to drive the performance of your data platform is like having your GPS tell you that you missed your turn and need to make a legal U-turn. The goal should be not to miss the turn in the first place. The focus should move from reactive to proactive SQL Server monitoring.
Wish List: SQL Server & Windows OS ‘Smart’ Tool Features
What I need is a tool that shows me information about my environment, enables me to make informed, real-time adjustments, and suggests ways to improve it. Those adjustments should deliver real business impact, like reducing licensing and storage costs, preparing systems for a peak capacity event, reducing risk, and preventing downtime.
Here are my ideal “Smart” Server & OS Tool wish list features:
Workload Performance Deltas
Thresholds, baselines, and the unique workloads of a system must first be understood to determine what action to take. The same understanding is needed to identify the capacity and health of the system. Monitoring needs to recognize when resource utilization has changed and identify the downstream impacts. Today we perform baseline data collections using Performance Monitor before and after changes. The process is long and tedious, it only works when you do it for every change, and it still requires an interpreter. When a monitoring solution collects and evaluates the data, the human error factor is removed. In addition, automating these tasks allows your expensive resources to focus on other critical business requirements.
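The before-and-after comparison done by hand with Performance Monitor can be sketched as a function over two sets of samples. This is a minimal Python sketch that assumes each counter’s samples are already collected as lists keyed by counter name. The function name and the 10% reporting cutoff are hypothetical placeholders. A production tool would also need to account for noise and for differences in workload between the two collection windows.

```python
from statistics import mean


def workload_deltas(
    before: dict[str, list[float]],
    after: dict[str, list[float]],
    min_change_pct: float = 10.0,
) -> dict[str, float]:
    """Percent change in each shared counter's mean, keeping only changes of at least min_change_pct."""
    deltas = {}
    for counter in before.keys() & after.keys():
        base = mean(before[counter])
        if base == 0:
            continue  # percent change is undefined against a zero baseline
        change = (mean(after[counter]) - base) / base * 100.0
        if abs(change) >= min_change_pct:
            deltas[counter] = round(change, 1)
    return deltas
```

Consider `before = {"% Processor Time": [40, 42, 38], "Batch Requests/sec": [1200, 1250, 1150]}` and `after = {"% Processor Time": [55, 57, 53], "Batch Requests/sec": [1210, 1240, 1160]}`. Here `workload_deltas(before, after)` returns `{"% Processor Time": 37.5}`. CPU rose 37.5% while batch throughput barely moved, which is the kind of delta that should prompt a look at what changed.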
A monitoring tool should be able to identify when your resource capacity is going to run out. A good monitoring tool should be able to look at past utilization and its growth, then predict when the current capacity will no longer be sufficient or when a resource has been over-allocated. Likewise, a good monitoring tool will identify resources that are underutilized.
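Predicting when capacity runs out can start as simply as fitting a trend line to past usage. The Python sketch below assumes one usage sample per day (for example, data file size in GB) and fits an ordinary least-squares line. The function name and inputs are hypothetical. Real growth is often seasonal or stepwise, so a linear projection is a first approximation, not a forecast to bank on.

```python
from typing import Optional


def days_until_full(daily_used_gb: list[float], capacity_gb: float) -> Optional[float]:
    """Project days from the last sample until a least-squares trend line reaches capacity."""
    n = len(daily_used_gb)
    if n < 2:
        return None  # need at least two points to fit a line
    x_mean = (n - 1) / 2  # sample days are 0, 1, ..., n-1
    y_mean = sum(daily_used_gb) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_used_gb))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or shrinking usage never reaches capacity on this trend
    intercept = y_mean - slope * x_mean
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (n - 1))
```

Suppose usage is 100, 102, 104, 106, and 108 GB on consecutive days and capacity is 120 GB. Then `days_until_full` returns `6.0`, because growth of 2 GB per day fills the remaining 12 GB in six days.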
Often, I find servers on the border between needing additional resources and being right-sized. In this situation, a small adjustment in resource utilization could drive big decisions, such as choosing not to upgrade compute power. A tool that identifies resource-heavy processes can save on expensive resources down the road (do not forget that additional compute can require additional licensing), resulting in cost savings on many levels.
Comparing recommended capacity adjustments to the performance deltas would validate the change and its impact on the data platforms. When resources are removed or reallocated, the cost of maintaining those cores could be recovered simply by understanding the workload and making the proper adjustments.
If I were to ask you when your servers are pushing the heaviest workload, you might know right off the top of your head, or at least think you know. But how do you know when you are pushing your servers the hardest? An even better question is: how do you take some of the peaks out of your workload and move that work into the valleys, making better use of your resources? Some tasks on your data can wait; some require on-demand processing. A monitoring tool should identify the time frames when on-demand processing is at its lowest and take advantage of them.
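Once hourly utilization history exists, finding the valleys is a small search problem. Here is a minimal Python sketch that assumes a list of 24 hourly average CPU values. The data source and the function name are hypothetical. The function returns the start hour of the quietest contiguous window, wrapping past midnight, where deferrable work such as index maintenance or reporting loads could be scheduled.

```python
def quietest_window(hourly_cpu: list[float], hours: int) -> tuple[int, float]:
    """Return (start_hour, average) of the lowest-utilization contiguous window, wrapping past midnight."""
    n = len(hourly_cpu)
    if not 0 < hours <= n:
        raise ValueError("hours must be between 1 and len(hourly_cpu)")
    best_start, best_sum = 0, float("inf")
    for start in range(n):
        # Sum the window beginning at this hour; the modulo wraps 23:00 into 00:00.
        window_sum = sum(hourly_cpu[(start + i) % n] for i in range(hours))
        if window_sum < best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / hours
```

A brute-force scan is plenty for 24 values. With finer-grained history, a running sum over the window would avoid re-adding every element.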
Generate a Prioritized Worklist
The monitoring tool of the future should not only watch what your system has done; it should identify what you should do. Should I go as far as to say it should make these adjustments for you? If a tool can see that a service pack needs to be installed, or that the memory on a server is unhealthy, why shouldn’t it just provide a list of recommended changes? You shouldn’t have to scour the internet to find the most recent cumulative update or try to determine how much memory your platform should use. The configuration of your hosts and instances should drive these decisions, and the monitoring of the future should let you know when something needs to be done, before it needs to be done.
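A rules-based worklist could be as plain as the Python sketch below. Each rule inspects a configuration value and emits a finding, and findings are sorted by risk and then by effort. The `Finding` record, the 1-to-5 risk and effort scales, and the two rules are hypothetical illustrations. The caller supplies the target build, and the build tuples are placeholders rather than real version numbers. The one concrete fact used is that SQL Server ships with `max server memory (MB)` set to 2147483647, which leaves the instance effectively uncapped until someone sets it.

```python
from dataclasses import dataclass

# SQL Server's out-of-the-box 'max server memory (MB)' value (effectively no cap).
DEFAULT_MAX_SERVER_MEMORY_MB = 2_147_483_647


@dataclass(frozen=True)
class Finding:
    server: str
    action: str
    risk: int    # hypothetical scale: 1 (low) to 5 (critical)
    effort: int  # hypothetical scale: 1 (minutes) to 5 (needs a change window)


def evaluate(server: str, max_server_memory_mb: int,
             build: tuple[int, ...], target_build: tuple[int, ...]) -> list[Finding]:
    """Apply two illustrative rules to one instance's configuration."""
    findings = []
    if max_server_memory_mb == DEFAULT_MAX_SERVER_MEMORY_MB:
        findings.append(Finding(server, "set max server memory", risk=4, effort=1))
    if build < target_build:  # tuples compare element by element
        findings.append(Finding(server, "apply cumulative update", risk=3, effort=4))
    return findings


def prioritized_worklist(findings: list[Finding]) -> list[Finding]:
    """Highest risk first; among equal risk, the cheapest fix first."""
    return sorted(findings, key=lambda f: (-f.risk, f.effort))
```

Sorting by risk first keeps an uncapped memory setting ahead of a patch that needs a change window. A real tool would also weigh each item against its business impact.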
Future monitoring needs to move out of the “does not” category and into the “already did it” category. With advancements in machine learning, artificial intelligence, and rules-based processing, I am confident it’s just a matter of time until a next-generation tool empowers DBAs like me. It will allow me to focus on the strategic future while automation handles the identification and execution of day-to-day, low-value (but necessary) maintenance tasks.
Chris Shaw has been writing and speaking about SQL Server for over 25 years at events such as SQL Connections, PASS, SQL Saturdays, and SSWUG Ultimate conferences. Today, Chris serves as a Senior DBA for Fortified Data, a next-generation database managed services provider.