Redundant Indexes Part 1: Identifying Redundant Indexes That Are Not Used To Service Queries

Overview

Indexes are a powerful concept in any RDBMS. They can dramatically increase the performance of queries, but they can also become overly burdensome. Enter redundant, unused indexes. Every index is maintained by SQL Server after every INSERT, UPDATE, or DELETE statement. So, if there are indexes on a table that are redundant, and aren’t used to service any SELECT queries, then it is a waste of resources for SQL Server to store and maintain those indexes. This maintenance unnecessarily slows down INSERT, UPDATE, and DELETE statements. So when I found literally dozens of these indexes in each production database in this specific client’s environment, suffice it to say I was shocked.

Naturally, I took it upon myself to delete these indexes. Doing this manually would be extremely time consuming, as there were well over a thousand of these across all production databases. I cobbled together code from my own mind and a few disparate sources across the internet and created a stored procedure to report back to the caller any redundant and unused indexes in the database in which the procedure was run. I created the procedure in the [master] database and marked it as a system object. This way, I only had to deploy the procedure to each instance of SQL Server, and not each individual database. Without further ado, here is the code for the procedure. Below the full procedure code is a discussion of the parts.

Procedure Code

USE [master]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[sp_RedundantIndexes]
AS 
BEGIN

	SET NOCOUNT ON;

	DECLARE @ID				INT
	DECLARE @DatabaseName	NVARCHAR(128)
	DECLARE @TableName		NVARCHAR(128)
	DECLARE @IndexName		NVARCHAR(128)
	DECLARE @IndexID		INT
	DECLARE @ObjectID		BIGINT
	DECLARE @LastSvcRestart	DATETIME

	CREATE TABLE #indexdetails (
		  ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
		, ServerName NVARCHAR(128)
		, DatabaseName NVARCHAR(128)
		, TableName NVARCHAR(128)
		, IndexName NVARCHAR(128)
		, IndexID INT
		, IndexType NVARCHAR(128)
		, [Object_ID] BIGINT
		, MostRecentDate DATETIME
		, IsPK BIT NULL
		, StatsSum INT NULL
		, IsSubset BIT DEFAULT(0)
		, IsRedundant BIT DEFAULT(0)
		, IsMaster BIT DEFAULT(0)
		, Processed BIT DEFAULT(0)
	)
	
	--Get time of last SQL Server service restart.  Whenever SQL Server is restarted, tempdb is recreated
	SELECT @LastSvcRestart = create_date FROM sys.databases WHERE name = 'tempdb';
	
	IF ( DATEDIFF(HH, @LastSvcRestart, GETDATE()) <= 96 )
	BEGIN
		SELECT TOP 0 ServerName, DatabaseName, TableName, IndexName 
		FROM #indexdetails;
		
		RETURN;
	END

	INSERT #indexdetails (ServerName, DatabaseName, TableName, IndexName, IndexType, IndexID, [Object_ID], MostRecentDate, IsPK)
	SELECT DISTINCT 
		  CAST(SERVERPROPERTY('MachineName') AS NVARCHAR(128)) --@@SERVERNAME
		, DB_NAME()
		, o.name AS TableName
		, i.name AS IndexName
		, i.type_desc AS IndexType
		, i.index_id
		, o.[object_id]
		, CASE WHEN o.modify_date > o.create_date THEN o.modify_date ELSE o.create_date END AS MostRecentDate
		, CASE WHEN kc.[type] = 'PK' THEN 1 ELSE 0 END AS IsPK
	FROM		sys.indexes i 
	INNER JOIN	sys.objects o ON i.[object_id] = o.[object_id]
	INNER JOIN	sys.index_columns ic ON i.[object_id] = ic.[object_id] AND i.[index_id] = ic.[index_id]
	LEFT  JOIN	sys.key_constraints kc ON i.[object_id] = kc.[parent_object_id] AND i.name = kc.name
	WHERE o.type_desc <> 'SYSTEM_TABLE'
	ORDER BY o.name, i.name

	WHILE EXISTS(SELECT 1 FROM #indexdetails WHERE processed = 0)	-- Still records to process
	BEGIN
		SELECT TOP 1 
			  @ID = ID
			, @DatabaseName = DatabaseName
			, @TableName = TableName
			, @IndexName = IndexName
			, @IndexID = IndexID
			, @ObjectID = [Object_ID]
		FROM #indexdetails
		WHERE processed = 0

		UPDATE #indexdetails 
		SET		processed = 1 ,
				IsSubset = (SELECT ISNULL(MAX(IndexStatus), 0) FROM (
								SELECT CASE WHEN NOT EXISTS(
											SELECT index_column_id, column_id, is_descending_key, is_included_column FROM sys.index_columns WHERE index_id = @IndexID AND [object_id] = OBJECT_ID(@TableName)
											EXCEPT
											(
												SELECT index_column_id, column_id, is_descending_key, is_included_column FROM sys.index_columns WHERE index_id = @IndexID AND [object_id] = OBJECT_ID(@TableName)
												INTERSECT
												SELECT index_column_id, column_id, is_descending_key, is_included_column FROM sys.index_columns WHERE index_id = t2.index_id AND [object_id] = OBJECT_ID(@TableName)
											)
										)
										THEN 1 ELSE 0 END AS IndexStatus	--1 implies "IsContainedIn" and 0 implies "IndependentOf"
										, t2.index_id
								FROM sys.indexes t2
								WHERE t2.[object_id] = OBJECT_ID(@TableName) AND t2.[index_id] <> @IndexID
								) z),
				IsMaster = (SELECT ISNULL(MAX(IndexStatus), 0) FROM (
								SELECT CASE WHEN NOT EXISTS(
											SELECT index_column_id, column_id, is_descending_key, is_included_column FROM sys.index_columns WHERE index_id = @IndexID AND [object_id] = OBJECT_ID(@TableName)
											EXCEPT
											SELECT index_column_id, column_id, is_descending_key, is_included_column FROM sys.index_columns WHERE index_id = t2.index_id AND [object_id] = OBJECT_ID(@TableName)
											)
										AND NOT EXISTS(
											SELECT index_column_id, column_id, is_descending_key, is_included_column FROM sys.index_columns WHERE index_id = t2.index_id AND [object_id] = OBJECT_ID(@TableName)
											EXCEPT
											SELECT index_column_id, column_id, is_descending_key, is_included_column FROM sys.index_columns WHERE index_id = @IndexID AND [object_id] = OBJECT_ID(@TableName)
										)
										THEN 1 ELSE 0 END AS IndexStatus	--1 implies "IsDuplicate" and 0 implies "NotExactMatch"
										, t2.index_id
								FROM sys.indexes t2
								INNER JOIN #indexdetails n ON t2.index_id = n.indexID AND t2.[object_id] = n.[object_id] AND n.IsMaster = 0
								WHERE t2.[object_id] = OBJECT_ID(@TableName) AND t2.[index_id] <> @IndexID 
							) z)
		WHERE ID = @ID;
	
		;WITH stats AS (
			SELECT s.database_id, s.[object_id], s.index_id, SUM(s.user_seeks + s.user_scans + s.user_lookups) AS StatsSum
			FROM sys.dm_db_index_usage_stats s
			WHERE s.database_id = DB_ID()
			GROUP BY  s.database_id, s.[object_id], s.index_id
		)

		UPDATE i
		SET StatsSum = ISNULL(s.StatsSum,0)
		FROM		#indexdetails i
		LEFT JOIN	stats s ON s.database_id = DB_ID(i.DatabaseName) AND s.[object_id] = OBJECT_ID(i.TableName) AND s.index_id = i.IndexID

		UPDATE #indexdetails SET isRedundant = 1 WHERE isSubset = 1 AND ISNULL(statsSum, 0) = 0 AND ID = @ID

	END
	
	SELECT ServerName, DatabaseName, TableName, IndexName, IndexType
	FROM #indexdetails 
	WHERE	isRedundant = 1
	AND		IsPK <> 1
	AND		IsMaster <> 1

END
GO

Code Guts

Lines 1 – 40: The code in the first 40 lines is very straightforward. The table #indexdetails will store all of the information needed during execution of the stored procedure. The variables will be used when iterating over each record in #indexdetails.

Lines 41 – 50: This code causes the stored procedure to return 0 results if it is run within 4 days of the last restart of SQL Server. For the client’s specific needs, this 4 day restriction is sufficient to ensure that we don’t return indexes that are used less frequently but nevertheless are important. Most people may find that this time frame is far too short. If your organization has reporting queries that only hit an index once a month, quarter, or year, this restriction will need to be adjusted accordingly. When you combine the fact that index use stats are reset at the time of a SQL Server restart and the fact that a query may not use an index but once in a great while, it may not be possible to identify unused indexes. This procedure is not used on databases where this is the case.

Lines 52 – 68: This INSERT statement is straight-forward and populates #indexdetails with basic information on every index in the database (excluding those on system tables).

Lines 70 – 132: These lines are for the WHILE loop that does the majority of work in the procedure and the most interesting pieces are discussed in further detail below. This section iterates over every index listed in #indexdetails one record at a time.

Lines 84 – 98: This subquery uses an additional correlated subquery to check if the current index is a subset of another index on the same table. First index_column_id, column_id, is_descending_key, and is_included_column values are obtained from sys.index_columns for the current index. Next these values are Intersected with the same values for every other index on the same table. This is what makes the inner subquery correlated. The first result is then EXCEPTed from the second result set. If nothing is returned from this EXCEPT statement, then the index in question is a subset of another index on the same table. The idea here is to find any index on the same table that has the same set of values for index_column_id, column_id, is_descending_key, and is_included_column as the current index being analyzed. When we encounter one, we know that the current index can be seen as a subset of another index.

Lines 99 – 115: Like the subquery just discussed, this one has a correlated subquery as well. This subquery checks to see if the current index should be considered a “master” index. Here, master index must be defined in the event that N indexes are identical. This will prevent the procedure from returning all of those indexes as redundant, when only N-1 indexes are redundant. That is, if 3 indexes have identical definitions, then only 2 are redundant, not all 3. This subquery uses EXCEPT but not INTERSECT. EXCEPT is used to retrieve index_column_id, column_id, is_descending_key, and is_included_column values that are present for the current index that are not in any of the other indexes on the same table, checking one index at a time. At the same time, the code looks at index_column_id, column_id, is_descending_key, and is_included_column values that are present in the correlated set (minus any indexes already marked IsMaster = 1) that are not present in the result set for the current index. If both EXECPT queries return empty sets, the current index is marked as the master.

Lines 118 – 128: This query uses a CTE and the DMV sys.dm_db_index_usage_stats to SUM the values for user_seeks, user_scans, and user_lookups. This value is stored as StatsSum in #indexdetails.

Line 130: This UPDATE sets the IsRedundant flag for indexes that have been marked IsSubset = 1 and have no combined user_seeks, user_scans, or user_lookups.

Lines 134 – 138: This simple SELECT returns records to the caller of sp_RedundantIndexes. The records that are returned are ones that are marked as redundant but aren’t Primary Keys and aren’t marked as a master index.

Limitations

There are some limitations to this code. The first and foremost limitation concerns the use stats for the index. If an index is used only rarely, but is critical when it does get used, then this code may incorrectly mark it as unused. That said, only redundant AND unused indexes are returned. This reduces the possibility that the procedure returns a needed index. The other limitation here is that this doesn’t look at primary keys. The reason for this is that the client I built this for often times uses a single column as a primary key, and that column is then duplicated in other indexes that index additional columns, with the primary key column being the leading column. These primary keys may be considered redundant indexes, but because they are primary keys, they should not be dropped, as other tables may have foreign keys pointing back to them.

What’s Next

In Redundant Indexes Part 2, I show a simple Powershell script that I use to take action on the indexes returned by sp_RedundantIndexes.