SQL Server indexes

Designing SQL Server non-clustered indexes for query optimization

Non-clustered indexes are bookmarks that allow SQL Server to find shortcuts to the data you’re searching for. Non-clustered indexes are important because they allow you to focus queries on a specific subset of the data instead of scanning the entire table. We’ll address this critical topic by first hitting the basics, such as how clustered indexes interact with non-clustered indexes, how to pick fields, when to use compound indexes and how statistics influence non-clustered indexes.

The basics of non-clustered indexes in SQL Server

A non-clustered index consists of the chosen fields and the clustered index value. If the clustered index is not defined as unique, then SQL Server will use a clustered index value plus a uniqueness value. Always define your clustered indexes as unique — if they are in fact unique — because it will result in a smaller clustered/non-clustered index size. If your unique clustered index consists of an int and you create a non-clustered index on a year column (defined as smallint), then your non-clustered index will contain an int and smallint for every row in the table. The size would increase according to the data types chosen. So the smaller the clustered/non-clustered index data types are, the smaller the resulting index size will be, and the maintenance capacity will increase.

Choosing fields for non-clustered indexes

The first rule is to never include the clustered index key fields in the non-clustered index. The field is already part of the clustered index, so it will always be used for queries. The only time it makes sense to include any clustered index key in a non-clustered index is when the clustered index is a compound index and the query is referencing the second, third or higher field in the compound index.

Assume you have the following table:

ID (identity, clustered unique)

DateFrom

DateTo

Amt

DateInserted

Description

 

 


Clustered and non-clustered indexes in SQL Server

The biggest difference between a clustered index and a non-clustered index is that when you use a clustered index, the section of the table that comprises the index has its data pages linked differently from those data pages comprising a non-clustered index.

btree

 

A “B-Tree” or “Balanced Tree” is the general structure that clustered indexes take.

They’re kind of like a telephone book. Index data pages have pointers to smaller subsets of data, which have pointers to even smaller subsets of data, etc. For instance, when you open the phone book, what do you see in the upper left or right hand corners of the opposing pages? You see the range that the page covers. That’s how B-Trees function.

So what’s the difference here? When a clustered index is applied, the data at the “leaf level” contains the actual data pages where you’ll find the data being searched on. In a non-clustered index, the data pages at the leaf level merely have pointers to more data pages containing the actual data being searched upon. This being the case, the data pages at the leaf level in a clustered index are only ordered one way, and “in order.”

 

For instance, if you have a column with an IDENTITY constraint applied to it as well as a clustered index, the numbers comprising the IDENTITY constraint will always be in order. Yet, that is not to say always contiguous, as you can DELETE rows. But they will always be in order. This makes for very fast searching, especially when using this scenario for something like order or invoice ID’s.

 

Since clustered indexes keep all data within them at the leaf level of the B-Tree, any modifications to the data require a rearranging of the data pages. This means, if you add a clustered index to a table that is heavily inserted, updated, or deleted, you will probably need to rebuild or de-fragment the index more often than when a non-clustered index is used. This is due to all of the data page movement that occurs. Once again, the gain you receive is faster reads of the data, due to the orderly fashion in which the data is laid out. The other major difference is that you only get one clustered index per table. You can apply 249 non-clustered indexes.

 

Clustered indexes are created by default to match the primary key, which is defined on tables in SQL Server. However, you can create a clustered index on any column and then define a primary key on a separate column or columns. At this point, the primary key would be created as a unique non-clustered index. Typically, a clustered index will match the primary key, but not necessarily, so be careful. Given the variety of situations that can arise, I’ll be discussing the clustered indexes themselves, and for now ignore whether you choose to make them primary keys.

Clustered indexes actually hold the row data for SQL Server, so wherever your clustered indexes are stored is also where your data is stored. The clustered indexes are organized into ranges of data. For example, values 1 to 10 may be stored in one range and 90 to 110 in another range. Since clustered indexes are stored as ranges, if you need to do a search on a range for an audit log, it would be more efficient for the clustered index to be based on the date column that would be used to return the date ranges. Non-clustered indexes work better for specific value searches, e.g. “date = DateValue,” rather than range searches, e.g. “date between date1 and date2.”

 

No comments yet

Leave a reply