In my experience, indexing problems can typically be traced back to one of two sources.
The first is developers who don't create indexes either because they don't know what they are, assume the DBA will take care of it in production, or assume they're not needed thinking that production performance will be comparable to a small development database.
The common outcome from this scenario is gradually declining production performance, often leading to false assertions about the adequacy of the hardware and/or SQL Server itself.
The second source is DBAs without index analysis knowledge, who are unsure of how to go about the process of determining which indexes are missing, which are not used, and which require maintenance.
A common outcome from this second group are databases that look like they've been carpet bombed with indexes by someone hoping that performance will eventually improve with the addition of more and more indexes.
So for the first of our three indexing worst practices, let’s look at situations in which there are either too many or not enough indexes.
Too many/not enough
Not having the required indexes will cause SQL Server to read more data than necessary from disk, leading to more disk I/O and therefore less available memory - a classic double whammy.
Almost as bad, having too many indexes leads to more overhead for data modification statements – SQL Server has to maintain the indexes even though they’re never actually used.
The first thing to say about these two situations is that with an adequate load testing environment, this should never really occur in the first place. If the database is stressed with the expected level of production activity, most of these problems would be identified before they become apparent in production.
Having said that, load testing is always a bit of a guessing game as to what the expected production load will be, and depending on the load testing tools and techniques, may not produce an accurate load profile anyway.
So we need to ensure that we’re analysing index usage and fragmentation on a regular basis, and in this regard, both SQL Server 2005 and 2008 have made this process much easier.
Rather than regurgitate Books Online, I’ll point you in the direction of this blog post from Bart Duncan on discovering missing indexes and this blog post from Jason Massie on analysing index usage.
Updating Statistics after Index Rebuild
So now for our second worst practice, and I have to say one of my favourites, and that’s updating statistics after an index rebuild.
SQL Server keeps statistics on both indexes and columns which help it decide how it will process each query. Because certain index lookups require two actions for each matching row, that is, look up the index, and then look up the table entry, for queries that will return a lot of rows, it may be quicker to simply scan the table rather than use the index.
Table scans use sequential I/O, index lookups use random I/O. Because moving the disk heads around during random I/O is much slower than scanning a sequential chunk of data, it’s important that SQL Server only use random I/O index lookups for queries that return a low percentage of rows compared to the total table size.
In order for SQL Server to determine the estimated number of rows to be returned from a query, it keeps statistics on both indexes and columns. For example, it will store the fact that a given table contains a million people with the surname “Smith” and only two with the surname “Zatorsky”, and that gives it the best chance of determining the best query plan to use for a surname based lookup, i.e.; table scan or index lookup.
So getting backup to the worst practice, I refer your attention to the following screen shot of a SQL Server 2008 Maintenance Plan.

You’ll notice on the left of screen, we have 2 steps, the first to rebuild indexes, and the second to update statistics. The properties window on the right side of screen contains the default values for the update statistics task.
One of the things that happens when rebuilding an index is that the statistics for that index are updated “in full”, that is, the statistics will be perfect, given that SQL Server has to read every single row when rebuilding the index.
So after rebuilding it, there’s no need to update the index statistics. If you do, you’re effectively building the index statistics twice, and for large indexes, that’s an unnecessary and costly overhead.
You’ll also notice in the scan type section, the choice between “Full scan” and “Sample by”. The "Sample by" option is designed for speed, whereby it will sample a certain percentage of rows rather than the full set – the idea being a trade off between build time and statistics accuracy.
In earlier versions of SQL Server, this was a particular problem, as the "Full scan" option at the bottom there, wasn’t available in the maintenance plans, so people would often rebuild an index, and then proceed to replace a full set of statistics with a sample set. In 2008, the default option, as you can see, is a full scan, which means that rather than replace perfect stats with sample stats, they’re rebuilt twice – still bad, but probably better than replacing good stats with less accurate ones.
So, the recommendation here is that after rebuilding indexes, rather than update "All existing statistics", we update "Column statistics only", shown above by the red bar around the option. Column statistics are not touched by the index rebuild process, so using this option limits the work done by SQL Server, and in the case of very large databases, this option can save a massive amount of time, therefore shortening mainatenance windows.
Inappropriate Index Maintenance
As equally important as having the right amount of indexes, is making sure that indexes are appropriately maintained. Like any other disk object, indexes become fragmented over time through the normal course of inserts, deletes and updates. In addressing this, indexes in SQL Server can be either rebuilt or defragmented - now referred to as “reorganized”.
Compared to defragging or reorganizing an index, rebuilding will achieve the best result from a fragmentation perspective, however, there are a couple of downsides.
Firstly, unless you have the Enterprise edition of SQL Server, rebuilding an index is an offline operation, meaning that the index is unavailable during the rebuild.
Secondly, rebuilding an index is a fully logged operation, meaning that the transaction log will grow much more when compared to reorganizing, and for installations which use Log Shipping or Database Mirroring, this can have a significant impact, particularly when using high safety mode mirroring or when disk space is a problem.
So when I say inappropriate index maintenance, what I’m talking about is really one of two things; either not maintaining them at all, or choosing rebuild over reorganize or vice versa.
The classic rule of thumb is that for indexes with a 5 – 30 % fragmentation level, we defragment / reorganize, and indexes with more than 30% we rebuild, assuming we can either cope with the outage, or we have the Enterprise edition of SQL Server. This technique usually delivers the best bang for the buck – a minimum amount of disk usage and overhead for the maximum amount of performance gain.
From an fragmentation analysis perspective, sys.dm_db_index_physical_stats is a crucial component of any index maintenance plan. One of the great things we can do with this DMV is use it as part of an automated script, and Books Online actually provides the code to do exactly that.
Included below is a screen shot from the “sys.dm_db_index_physical_stats” article in Books Online. It shows some of the code from a cursor that loops through each index in the DMV looking at the fragmentation level, and depending on the percentage, either rebuilds it or reorganizes it.
All of this code is freely available in Books Online, and provides a much better alternative than a simple rebuild of all indexes every night, or worse, no maintenance at all. For more boutique and full featured solutions, be sure to check out Ola Hallengren's solution and Michelle Ufford's solution.
In the next post in this series, we'll examine DBCC worst practices.