“As we know, there are known
knowns.
There are things we know we
know.
We also know there are known
unknowns.
That is to say, we know there
are some things we do not know.
But there are also unknown
unknowns, the ones we don't know we don't know”
Donald Rumsfeld Feb.
12, 2002, Department of Defense news briefing
Thanks Donald. Yes, the unknown
unknowns are scary beasts indeed. Little can be done about them until they
happen. What we can
do, however, is concentrate on those things we do know about, or things which have
a reasonable chance of happening.
As a DBA, I know that bad stuff
happens. I also know that it usually happens at 3am on the morning of my wife’s
birthday. In almost all cases, unexpected problems are due to uncontrolled
change and/or insufficient testing.
In
the last post, we covered the importance of a Service Level Agreement (SLA).
A sure fire way of blowing an SLA is to operate in an environment with insufficient
testing environments and uncontrolled change. In this post, let’s take a look
at 3 worst practices for change control.
Insufficient Testing Environments;
Brent Ozar’s blog from earlier
this year addressed this topic nicely. Whilst the exact shape and number of testing
environments will vary, one should have at least a development and test
environment in which changes are developed and tested before reaching production.
Here is my ideal setup.
The important points to note from
this diagram are that each environment is used for a specific purpose, and that
changes leaving one environment and entering another are signed off by a
particular person or group, thereby imparting ownership of the change. Followed
correctly, such a process significantly reduces the chances of unexpected problems
being introduced into production.
Other than being lazy or indifferent,
one of the common reasons offered by those with insufficient test environments
is the lack of money and/or available servers. Enter virtualization. With the
exception of the Volume Test environment, performance and throughput may not be an issue, therefore, the intermediary test environments (Integration and
User Acceptance in the above diagram) are ideal candidates for virtualization.
The changes that flow from one environment to another are ideally those that emerge from a source control tool as part of a development methodology. Schema comparison tools such as Redgate’s SQLCompare are commonly used as part of this process. From a DBA perspective, another common set of changes are server and database configuration changes, and it’s these changes that can cause a significant headache without an appropriate change log.
No Change Log;
One of the great things about SQL
Server is that the default settings are almost always the best, so much so,
that non-default settings are frequently enabled for the wrong reasons. A
classic mistake made by inexperienced DBA’s is enabling options like “Boost SQL
Server priority”. Another one is to carpet bomb tables with indexes. In most cases,
changes such as these are made in the hope of improving performance, usually
without any record of the change and/or follow up observations of the
performance improvement/degradation. The end result are servers and databases
with massive deviations from best practices, leading to more problems in the
future.
A
crucial component of any successfully managed database environment is the Change
Log. At a very minimum, such a log includes the following details;
- Date and time of the change
- What was changed, ideally with a reference to a script used to make the
change
- Person that executed the change
- Observed performance difference
One of the very first steps in
most performance tuning exercises is looking for non-default settings, or
settings that deviate from best practice. In environments with uncontrolled
changes, such settings are common, and usually accompanied by a lack of
detail surrounding the setting change. Without such detail, changing the setting back
to the default/recommended value is risky, particularly when test environments
are not available to measure the impact of the change. In such environments, testing in production,
is usually the only option, bringing with it the inevitable consequences.
In closing this post, let’s
address one final topic, making multiple changes at once.
Multiple Changes at Once;
An essential component of any performance testing methodology is a
baseline of known performance and a repeatable testing process in order to measure
the impact of a proposed change. A common error is to make multiple changes at
once, the implication being that one or more of these changes may be, at
best, unnecessary, and possibly lead to future problems.
In opening this post, we examined the concept of the known unknowns and
the unknown unknowns. As DBA’s, we know that there are many things that
can go wrong, leading to unknown consequences, for example, we know that a
spike in users during peak times will lead to an unknown performance degradation.
Whilst the consequences of the unknown unknowns are beyond reach, the known
unknowns are very much in play, and test environments, change logs and a
controlled change strategy are crucial components of controlling their impacts
and getting a good night’s sleep.
In the next post, we’ll
address worst practices surrounding storage configuration.