I have been playing around with the latest SQL Server 2008 CTP while trying to keep both my performance tuning and DBA classes updated with the latest changes. When I review new releases of SQL Server, I tend to evaluate the release by determining which features and functionalities that will help me solve problems that I could not solve or could not solve easily in earlier releases.
One of the problems that did not have easy solutions in earlier versions was the one faced when dealing with large amounts of data. Many times I have incorporated many different strategies to lessen the impact of large amounts of data on queries by trying to lessen the amount of data contained within the indexes on a table. This often involved partitioning the tables in SQL Server 2005 or archiving data in earlier versions. Any method that worked to create a smaller set of active data to be queried during each SELECT statement and a smaller index to be maintained during data modifications.
One of the great new performance improvements in SQL Server 2008 is the new filtered index. Filtered indexes basically allow you to create a filter on an index – think WHERE clause. The index B-Tree will only contain the rows of data that meet the filtering criteria. This allows you to reduce the amount of data contained in an index which means that you are also reducing the data affected in an index by data modifications.
I am going to walk through a very small and simple demo that illustrates the impact of filtered indexes.
First let’s create a demo table with some demo data
SET NOCOUNT ON
--Create demo table
IF OBJECT_ID('TestTable','table') IS NOT NULL
DROP TABLE TestTable
GO
CREATE TABLE TestTable
(colID INT IDENTITY(1,1)
,colDate DATETIME
,colSession INT
,colDesc VARCHAR(100)
)
--Insert data into table
DECLARE @loop INT
SET @loop = 1000
WHILE @loop > 0
BEGIN
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070301',1,'20070301-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070401',1,'20070401-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070501',1,'20070501-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070601',1,'20070601-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070701',1,'20070701-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070801',1,'20070801-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070901',1,'20070901-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071001',1,'20071001-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071101',1,'20071101-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071201',1,'20071201-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080101',1,'20080101-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080201',1,'20080201-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080301',1,'20080301-1' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070301',2,'20070301-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070401',2,'20070401-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070501',2,'20070501-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070601',2,'20070601-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070701',2,'20070701-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070801',2,'20070801-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070901',2,'20070901-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071001',2,'20071001-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071101',2,'20071101-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071201',2,'20071201-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080101',2,'20080101-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080201',2,'20080201-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080301',2,'20080301-2' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070301',3,'20070301-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070401',3,'20070401-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070501',3,'20070501-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070601',3,'20070601-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070701',3,'20070701-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070801',3,'20070801-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20070901',3,'20070901-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071001',3,'20071001-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071101',3,'20071101-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071201',3,'20071201-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080101',3,'20080101-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080201',3,'20080201-3' )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080301',3,'20080301-3' )
SET @loop = @loop - 1
END
Next we will create a clustered Index and a non-filtered nonclustered index
--Create clustered index on table
CREATE CLUSTERED INDEX cl_TestTable_1 ON TestTable(colID)
--Create nonclustered index on table
CREATE NONCLUSTERED INDEX ncl_TestTable_1 ON TestTable(colDate) INCLUDE (colSession)
After creating the indexes, let’s take a look at the number of rows in the indexes
--View number of rows in indexes
SELECT name, i.index_id, [rows]
FROM sys.partitions p
INNER JOIN sys.indexes i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE p.[object_id] = OBJECT_ID('TestTable')
You should get an output similar to the one below
/*
name index_id rows
cl_TestTable_1 1 39000
ncl_TestTable_1 2 39000
*/
Now let’s create a filtered index and look at the number of rows of data
--Create filtered index
CREATE NONCLUSTERED INDEX ncl_TestTable_2 ON TestTable(colDate) INCLUDE (colSession)
WHERE colDate > '12/31/2007 23:59:59'
ncl_TestTable_2 3 9000
Once we are done with creating the infrastructure, let’s look at how they affect performance.
--How does a filter index help SELECTS
--Clear cache for demo
CHECKPOINT
DBCC DROPCLEANBUFFERS
--Get execution plan and IO statistics
SET STATISTICS PROFILE ON
SET STATISTICS IO ON
--Select by date
--This has a bookmark lookup
SELECT * FROM TestTable WITH (INDEX = 2) WHERE colDate > '12/31/2007 23:59:59'
--Table 'TestTable'. Scan count 1, logical reads 19692
--TotalSubtreeCost: 2.11023
--Use filtered index
SELECT * FROM TestTable WITH (INDEX = 3) WHERE colDate > '12/31/2007 23:59:59'
--Table 'TestTable'. Scan count 1, logical reads 18598
-- TotalSubtreeCost: 2.096156
--This does not have a bookmark lookup
SELECT colID, colDate, colSession FROM TestTable WITH (INDEX = 2) WHERE colDate > '12/31/2007 23:59:59'
--Table 'TestTable'. Scan count 1, logical reads 27
-- TotalSubtreeCost: 0.03095978
SELECT colID, colDate, colSession FROM TestTable WITH (INDEX = 3) WHERE colDate > '12/31/2007 23:59:59'
-- TotalSubtreeCost: 0.0168857
--Turn off execution plans
SET STATISTICS PROFILE OFF
SET STATISTICS IO OFF
--Insert new rows of data
--Will affect filtered index
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20080301',3,'20080301-3')
cl_TestTable_1 1 39001
ncl_TestTable_1 2 39001
ncl_TestTable_2 3 9001
--Determine which pages have been changed
SELECT DB_NAME(bd.database_id) AS 'Database Name'
,dtab.object_name AS 'Object Name'
,dtab.index_id AS 'Index number with dirty page'
,bd.page_id AS 'Page ID of dirty page'
,bd.page_type AS 'Type of Page'
,bd.page_level AS 'Page Level'
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN
(
SELECT OBJECT_NAME([object_id]) AS object_name
,index_id ,allocation_unit_id
FROM sys.allocation_units AS au
INNER JOIN sys.partitions AS p
ON au.container_id = p.hobt_id
AND (au.type = 1 OR au.type = 3)
UNION ALL
,index_id, allocation_unit_id
ON au.container_id = p.partition_id
AND au.type = 2
) AS dtab
ON bd.allocation_unit_id = dtab.allocation_unit_id
WHERE bd.is_modified = 1
AND bd.database_id = DB_ID()
AND [object_name] = 'TestTable'
ORDER BY bd.database_id, [Object Name],index_id, page_type
Database Name
Object Name
Index number with dirty page
Page ID of dirty page
Type of Page
Page Level
AdventureWorks2005
TestTable
1
19578
DATA_PAGE
0
2
20033
INDEX_PAGE
3
20120
--Will not affect filtered index
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES ('20071101',3,'20071101-3')
cl_TestTable_1 1 39002
ncl_TestTable_1 2 39002
3012
IAM_PAGE
19928
3398
20002
20001
As you can see, having a filtered index makes a little difference in this small table. The difference you would see will grow as the size of the data grows and as the data distribution changes. Redo the demo but change the filtered index to only include the last month of data instead of the last three months. Redo the demo by adding 1,000,000 rows of data before 12/31/2007. How do these two tests differ from the one I created?
Randy Dyess
We still have spots left in our May SQL Server DBA Bootcamp being delivered in Dallas. Jump on over to our website as sign up today.
We still spots open for Solid Quality Mentor’s SQL Server Performance Tuning Bootcamp to be delivered in Dallas in April.
If you enjoyed this post and want to know more about SQL Server, make sure you sign up for one of Solid Quality Mentor’s SQL Server classes today.
Make sure you sign up for our newsletter.
I know it has been awhile since I wrote my last blog and I have all the common excuses on why I have not written one in almost a month and most of them are just excuses and not really active reasons. I have had two pieces published in the February 2008 issue of SQL Server Magazine (Virtualization of SQL Server 2008 insert and The 10 Things You Need to Know about Data Protection for SQL Server 2008 supplement) as well as two articles in the March 2008 issue of SQL Server Magazine (SQL Server 2008 Boosts Built-in Encryption and another insert: SQL Server 2008 CPU and Memory Highlights). I have also created two articles for the May 2008 issue and the April 2008 issue of TechNet magazine. It is not like I haven’t been writing anything, just not blog postings.
Well it is time to get back on track and start blogging again so I thought I would write about a great little improvement to SQL Server 2008 Query Editor. SQL Server 2005 introduced the ability to save graphical execution plans as *.sqlplan. This ability allows someone to save the plan and send it to someone else so they could review the plan and suggest improvements. SQL Server 2008 adds the ability to right-click on a graphical plan and extract the query from it. This way you can receive a saved plan from someone and then regenerate the query they used for the plan. Might come in handy to find out what the query really looked like and not what the sender thought it looked like.
I was recently looking at some blog posts and noticed a new one out by the SQL Server CAT team titled Top 10 Best Practices for Building a Large Scale Relational Data Warehouse. This short piece has several interesting items in it but one of the top 10 best practices was devoted to deleting data from the data warehouse.
If you read the piece, the deleting data section listed several different ways of deleting large amounts of data from a table in a data warehouse and one of them caught my attention. One of the best practices listed was to create a second table, insert all the data you do not want to delete into the table and then drop the one table and rename the new one. This would allow you to basically “delete” the old data through its non-inclusion in the INSERT statement. My first thought that this would be a hassle as you would have to redo all FK relationships, indexes, constraints, etc on the new table but to tell you the truth, I never thought about doing a large delete this way so I thought I would play with the statement to see if it is any better than the “normal” ways of deleting large amounts of data.
I thought about a few basic tests I could do to establish baselines that I could compare to the “INSERT” delete method. Deleting data with a DELETE statement, a batched DELETE statement, partition switching, creating a new table with SELECT INTO, and then creating a new table and populating it with INSERT SELECT. Now these tests are just basic simple tests with no indexes on the table, no relationships to worry about, no triggers, etc. attached to the table. You will have to “enlarge” these tests to create more realistic tables in a more real world environment.
Note: I am not considering TRUNCATE TABLE as I only want to delete some of the data and not all of it.
Let’s create a master table with 10,000,000 rows of data that we can use for each of our tests
Prep for the tests by creating a master table with 10,000,000 rows of data
--Create master table for demo
IF OBJECT_ID ('drop_test_table') IS NOT NULL
DROP TABLE drop_test_table
CREATE TABLE drop_test_table
col1 INT
,col2 INT
--Add data
SET @loop = 1
WHILE @loop < 10000001
INSERT INTO drop_test_table VALUES (@loop, @loop+1)
SET @loop = @loop + 1
--Clear memory
The first test is to create a table with 10,000,000 rows of data from our master table and then delete 1,000,000 rows of data using a DELETE statement with a WHERE clause. A normal delete statement that is used in thousands of operations around the world.
Test 1: Delete data DELETE statements
--Create table for test 1
IF OBJECT_ID ('drop_test_table_delete') IS NOT NULL
DROP TABLE drop_test_table_delete
CREATE TABLE drop_test_table_delete
--Insert data
INSERT INTO drop_test_table_delete (col1, col2)
(SELECT col1, col2 FROM drop_test_table)
To understand how the DELETE statement affects our environment, let’s trap the TotalSubTreeCost, the amount of time it takes, and the number of IO operations it took to complete the statement.
SET STATISTICS TIME ON
DELETE FROM drop_test_table_delete
WHERE col1 < 1000001
TotalSubtreeCost: 64.93308
Table 'drop_test_table_delete'. Scan count 3, logical reads 1038315, physical reads 0, read-ahead reads 36190, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 9610 ms, elapsed time = 11226 ms.
SET STATISTICS TIME OFF
--Clean up
Test 2: Delete data batched DELETE statements
The second test is to create a table with 10,000,000 rows of data from our master table and then delete 1,000,000 rows of data using a batched DELETE statement with a WHERE clause. Another normal delete statement that is used in thousands of operations around the world when the amount of data that needs to be deleted is large and the database administrator does not want a single large transaction that may hold locks, grow the transaction log, or take a long time to rollback if the operation fails.
--Create table for test 2
--Delete in batches of 10000
--You can adjust the batch size
DELETE TOP (10000) FROM drop_test_table_delete
GO 100
You will see a different execution plan for each 100 executions of the DELETE statement
TotalSubtreeCost: Each execution is approximately 0.6871661
The TotalSubtreeCost after adding TotalSubtreeCost from all 100 executions is around:
68.71661
IO
Each execution had different IO patterns as would be expected
The total IO after adding IO from all 100 executions is around:
3,050,000
Time
The total time after adding time from all 100 executions is around:
112 seconds
Test 3: Delete data through partition switching
The third test is a new way to delete large amounts of data from a table in SQL Server 2005. This method involves creating the table as a partitioned table and then moving the partition that contain the unwanted data to a sacrifice table that will be dropped with a DROP TABLE command which will not create a large transaction or put a lot of data into the transaction log.
IF EXISTS (SELECT * FROM sys.partition_schemes WHERE [name] = 'psPartition')
DROP PARTITION SCHEME psPartition
IF EXISTS (SELECT * FROM sys.partition_functions WHERE [name] = 'pfPartition')
DROP PARTITION FUNCTION pfPartition
--Create partition function for date
CREATE PARTITION FUNCTION [pfPartition] (INT)
AS RANGE LEFT FOR VALUES ( 1000000, 2000000, 3000000, 4000000, 5000000, 6000000, 7000000, 8000000, 9000000, 10000000)
--Create partition scheme for date
CREATE PARTITION SCHEME psPartition
AS PARTITION [pfPartition]
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY]
,[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY])
--Create partition table
IF OBJECT_ID ('drop_test_table_partition') IS NOT NULL
DROP TABLE drop_test_table_partition
CREATE TABLE drop_test_table_partition
) ON psPartition(col1)
--Create table for switch
IF OBJECT_ID ('drop_test_table_partition_new') IS NOT NULL
DROP TABLE drop_test_table_partition_new
CREATE TABLE drop_test_table_partition_new
INSERT INTO drop_test_table_partition (col1, col2)
To understand how the partition switching operation affects our environment, let’s trap the TotalSubTreeCost, the amount of time it takes, and the number of IO operations it took to complete the statement.
--Switch the partition
ALTER TABLE drop_test_table_partition SWITCH PARTITION 1 TO drop_test_table_partition_new
CPU time = 0 ms, elapsed time = 147 ms
--Drop the table
CPU time = 0 ms, elapsed time = 79 ms.
--Clean up database
Test 4: Delete data through new table (SELECT INTO)
The fourth test is based on the best practice I read from the Microsoft article. In this test, I wanted to use SELECT INTO to create the new table as SELECT INTO usually takes less system resources to create versus using an INSERT INTO statement.
--Create table for test 4
IF OBJECT_ID ('drop_test_table_new') IS NOT NULL
DROP TABLE drop_test_table_new
CREATE TABLE drop_test_table_new
INSERT INTO drop_test_table_new (col1, col2)
To understand how the delete operation affects our environment, let’s trap the TotalSubTreeCost, the amount of time it takes, and the number of IO operations it took to complete the statement.
IF OBJECT_ID ('new_table_for_select_into') IS NOT NULL
DROP TABLE new_table_for_select_into
SELECT * INTO new_table_for_select_into FROM drop_test_table_new
WHERE col1 > 1000000
TotalSubtreeCost: 121.9906
Table 'drop_test_table_new'. Scan count 1, logical reads 38315, physical reads 0, read-ahead reads 35105, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 8689 ms, elapsed time = 9965 ms.
CPU time = 16 ms, elapsed time = 11 ms.
EXEC sp_rename 'new_table_for_select_into' , 'drop_test_table_new'
0.00328328
0.00328338
0.00657236
0.003291317
0.009868517
0.006578877
logical reads round 10
total 1000 ms
Test 5: Delete data through new table (insert INTO)
The last test is the INSERT INTO test. This would be a normal method to replicate the best practice listed by Microsoft.
--Create table for test 5
IF OBJECT_ID ('drop_test_table_insert_into') IS NOT NULL
DROP TABLE drop_test_table_insert_into
CREATE TABLE drop_test_table_insert_into
INSERT INTO drop_test_table_insert_into (col1, col2)
(SELECT col1, col2 FROM drop_test_table_new WHERE col1 > 1000000)
TotalSubtreeCost: 120.7378
Table 'drop_test_table_insert_into'. Scan count 0, logical reads 9034482, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'drop_test_table_new'. Scan count 1, logical reads 38315, physical reads 0, read-ahead reads 35086, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 32417 ms, elapsed time = 38687 ms.
CPU time = 16 ms, elapsed time = 11 ms
EXEC sp_rename 'drop_test_table_select_into' , 'drop_test_table_new'
--When you are done, clean up your database
If you review the findings below, you will find that the partition switch method is the clear winner when it comes to dropping large amounts of data. Of course your tables need to be partitioned in the first place and you cannot have any non-aligned indexes on the partitioned table, but after you work through that switching is not only easy, it is fast and takes very little system resources. After that, according to my opinions on what is the best resource usage, test 3 which used the SELECT INTO.
The next three in order where Test 1, the regular DELETE, test 2, the batched DELETE, and then test 5, the INSERT INTO.
Another thing you will notice about the findings and test is that I did not take into account the resources it would take to recreate the normal indexes, relationships, triggers, constraints, etc that are usually found on normal real-world tables. You can create your own tests in your environment to see which method is best for you.
Test 1
TotalSubtreeCost: 64
IO: 1,074,000 logical reads
Time: 112 seconds
Test 2
TotalSubtreeCost: 69
IO: 3,050,000 logical reads
Test 3
TotalSubtreeCost: 0
IO: 0 logical reads
Time: 200 milliseconds
Test 4
TotalSubtreeCost: 122
IO: 73,000 logical reads
Time: 10 seconds
Test 5
TotalSubtreeCost: 121
IO: 9,072,000 logical reads
Time: 39 seconds
If you enjoyed this post and want to know more about SQL Server, make sure you sign up for one of Solid Quality Mentor’s SQL Server classes today. This material and demo is an extract of the discussions that take place on table partitioning during the SQL Server DBA Bootcamp.
We still have spots left in our March SQL Server DBA Bootcamp being delivered in Dallas. Jump on over to our website as sign up today.
Make sure you sign up for our newsletter .
In the last post on this subject I talked about table partitioning and aligned indexes. This time I would like to discuss table partitioning and non-aligned indexes. But first I wanted to give you a little code that would answer the very last pieces of the summary in the last post. In the summary I asked you to take a look at the indexes we created to find more information about them. If you looked at the sys.index_columns catalog view, you would have noticed that the nonclustered, aligned index also included the partitioning column.
--Review the indexes just created
SELECT i.[name] AS 'Index Name'
,i.index_id AS 'Index ID'
,i.type_desc AS 'Index Type'
,ic.index_column_id AS 'Column'
,c.[name] AS 'Column Name'
,ic.key_ordinal AS 'Key Column'
,ic.is_included_column AS 'INCLUDE Column'
,ic.partition_ordinal AS 'Partitioning Column'--partitioning column added to index or part of key
FROM sys.indexes i
INNER JOIN sys.index_columns ic
ON i.[object_id] = ic.[object_id]
AND i.index_id = ic.index_id
INNER JOIN sys.columns c
ON ic.[object_id] = c.[object_id]
AND ic.column_id = c.column_id
WHERE i.[object_id] = OBJECT_ID('partitionDate')
ORDER BY i.name, i.index_id,ic.index_column_id
My output
Index Name Index ID Index Type Column Column Name Key Column INCLUDE Column Partitioning Column
cl_partitionDate_colDate 1 CLUSTERED 1 colDate 1 0 1
ncl_partitionDate_colSession_1 2 NONCLUSTERED 1 colSession 1 0 0
ncl_partitionDate_colSession_1 2 NONCLUSTERED 2 colDesc 0 1 0
ncl_partitionDate_colSession_1 2 NONCLUSTERED 3 colDate 0 0 1
If you look at the reformatted output, you will notice that the nonclustered index has the one index key we defined (colSession), one INCLUDED Column (colDesc), and the partition column (ColDate). SQL Server added the colDate to the index in order for the index to be able to eliminate partitions during queries.
Let’s move forward and talk about non-aligned indexes. A non-aligned index is one that does not share the same partitioning design as the table which contains the data. You achieve non-aligned indexes by creating the index and using a different function and scheme when doing so. While there are several disadvantages to having non-aligned indexes on a partitioned table, there are several advantages that will be explained in the third part of this series when we discuss partitioned indexes and performance.
Note: Having non-aligned indexes on a table will prevent you from switching partitions from one table to another.
Let’s take a look at how to create a non-aligned index.
--Create table on main DateRange partition
IF OBJECT_ID ('partitionDate') IS NOT NULL
DROP TABLE partitionDate
CREATE TABLE partitionDate
(colDate DATETIME
SET @loop = 5000
INSERT INTO partitionDate VALUES ('20060301',1,'20060301-1' )
INSERT INTO partitionDate VALUES ('20060401',1,'20060401-1' )
INSERT INTO partitionDate VALUES ('20060501',1,'20060501-1' )
INSERT INTO partitionDate VALUES ('20060601',1,'20060601-1' )
INSERT INTO partitionDate VALUES ('20060701',1,'20060701-1' )
INSERT INTO partitionDate VALUES ('20060801',1,'20060801-1' )
INSERT INTO partitionDate VALUES ('20060901',1,'20060901-1' )
INSERT INTO partitionDate VALUES ('20061001',1,'20061001-1' )
INSERT INTO partitionDate VALUES ('20061101',1,'20061101-1' )
INSERT INTO partitionDate VALUES ('20061201',1,'20061201-1' )
INSERT INTO partitionDate VALUES ('20070101',1,'20070101-1' )
INSERT INTO partitionDate VALUES ('20070201',1,'20070201-1' )
INSERT INTO partitionDate VALUES ('20070301',1,'20070301-1' )
INSERT INTO partitionDate VALUES ('20060301',2,'20060301-2' )
INSERT INTO partitionDate VALUES ('20060401',2,'20060401-2' )
INSERT INTO partitionDate VALUES ('20060501',2,'20060501-2' )
INSERT INTO partitionDate VALUES ('20060601',2,'20060601-2' )
INSERT INTO partitionDate VALUES ('20060701',2,'20060701-2' )
INSERT INTO partitionDate VALUES ('20060801',2,'20060801-2' )
INSERT INTO partitionDate VALUES ('20060901',2,'20060901-2' )
INSERT INTO partitionDate VALUES ('20061001',2,'20061001-2' )
INSERT INTO partitionDate VALUES ('20061101',2,'20061101-2' )
INSERT INTO partitionDate VALUES ('20061201',2,'20061201-2' )
INSERT INTO partitionDate VALUES ('20070101',2,'20070101-2' )
INSERT INTO partitionDate VALUES ('20070201',2,'20070201-2' )
INSERT INTO partitionDate VALUES ('20070301',2,'20070301-2' )
INSERT INTO partitionDate VALUES ('20060301',3,'20060301-3' )
INSERT INTO partitionDate VALUES ('20060401',3,'20060401-3' )
INSERT INTO partitionDate VALUES ('20060501',3,'20060501-3' )
INSERT INTO partitionDate VALUES ('20060601',3,'20060601-3' )
INSERT INTO partitionDate VALUES ('20060701',3,'20060701-3' )
INSERT INTO partitionDate VALUES ('20060801',3,'20060801-3' )
INSERT INTO partitionDate VALUES ('20060901',3,'20060901-3' )
INSERT INTO partitionDate VALUES ('20061001',3,'20061001-3' )
INSERT INTO partitionDate VALUES ('20061101',3,'20061101-3' )
INSERT INTO partitionDate VALUES ('20061201',3,'20061201-3' )
INSERT INTO partitionDate VALUES ('20070101',3,'20070101-3' )
INSERT INTO partitionDate VALUES ('20070201',3,'20070201-3' )
INSERT INTO partitionDate VALUES ('20070301',3,'20070301-3' )
--Create partition function for session
CREATE PARTITION FUNCTION [pfSessionRange] (INT)
AS RANGE RIGHT FOR VALUES (1, 2, 3);
--Create partition scheme for session
CREATE PARTITION SCHEME psFunctionRange
AS PARTITION [pfSessionRange]
TO ([PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY]);
--Create new non-aligned index
CREATE NONCLUSTERED INDEX ncl_partitionDate_colSession_21 ON partitionDate(colSession) INCLUDE (colDesc)
ON psFunctionRange (colSession)
Once we create the index, let’s look at the index information using the code from above. You will notice that the partitioning column is now the index key column.
ncl_partitionDate_colSession_21 2 NONCLUSTERED 1 colSession 1 0 1
ncl_partitionDate_colSession_21 2 NONCLUSTERED 2 colDesc 0 1 0
What does the nonclustered index look like when the table has been partitioned but no clustered index exists? Recreate the table from above but this time partition the table with the same format as before.
--Create main partition function for date
CREATE PARTITION FUNCTION [pfDateRange] (DATETIME)
AS RANGE RIGHT FOR VALUES ( '20060301', '20060401',
'20060501', '20060601', '20060701', '20060801',
'20060901', '20061001', '20061101', '20061201','20070101','20070201','20070301');
CREATE PARTITION SCHEME psDateRange
AS PARTITION [pfDateRange]
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY]);
) ON psDateRange(colDate)
Now, recreate the non-aligned index.
Take a look at the non-aligned again. You will notice that the heap is partitioned by colDate but the non-aligned index still only has the two columns we defined in the CREATE INDEX statement, the partition column of the table is not included in the index since the index key is the partition column of the index.
Note: This will stay the same even if you created a clustered index on the table.
I know, this post doesn’t have a lot of great information in it but I wanted to use it to set up the next post in this series. The next post will cover how aligned and non-aligned indexes affect the performance of SELECT queries against the table and its indexes.
As many of you know, SQL Server 2005 introduced the ability to partition data at the table level. No longer did we have to create multiple tables and tie them back together through a partition view and check constraints on a partitioning column. In SQL Server 2005, you can create a partitioning function and partition schema and then create a table with multiple partitions. The table can be segmented into as many as 1,000 partitions but create a single table name that can be used for your queries.
Partitioned tables can offer many advantages in both performance and manageability as the table can participate in joins on partitions with other tables, have the ability to switch a large amount of data to another table without putting pressure on the transaction log, or by allowing each partition to be placed on its own drive array. Another advantage that many overlook is the ability to adjust your indexing strategy slightly to take advantage of partitions.
By default, indexes are partitioned exactly like the table. If you do not specify different partitioning functions or schema, the index will simply use the one that you created for the table. This will cause the index to be partitioned exactly like the table. You can create the indexes with different partitioning functions or schemas that are exactly alike the ones you used for the table if you wished. Either way, you are creating what is called an aligned index. Aligned indexes have individual BTrees for each table partition and the data in the individual BTrees will be aligned to the data in the table partitions.
Note: There are some restrictions you have to follow to get the indexing partitioning to work. Unique indexes (clustered or nonclustered) must have the partitioning column as part of the index key. A clustered index must contain the partitioning key. SQL Server will add the partitioning column to the clustered index key if the clustered index is created as non-unique. Nonclustered, non-unique indexes will have the partitioning column added as an INCLUDED column if it is not part of the index key.
Aligned indexes are usually the way most people will create their indexes on partitioned tables as one of the key benefits to partitioning is the ability to move (switch) a partition to another table and in order to switch partitions all the indexes on the table must be aligned. Switching a partition to another table is a very quick way to move millions of rows of data to another table in order to drop the millions of rows with a simple DROP TABLE command. This will allow you to clean out tables without having to take the hit of a DELETE statement against the main table.
The code to create a partition table and aligned indexes is actually very simple although you have to create the table on the partitioning schema which means that existing tables have to be renamed, recreated on the partitioning schema with the partitioning function, and then all the data moved back into the new partitioned table with an INSERT command. Be nice if they would give us the ability to simply ALTER existing tables when we want to start partitioning them beyond the single partition all SQL Server 2005 tables are created with.
The following code will partition a table into months. First we have to create a function that will be the basis on how we are going to partition our data.
AS RANGE RIGHT FOR VALUES ( '20060301', '20060401', '20060501', '20060601', '20060701', '20060801', '20060901', '20061001', '20061101', '20061201', '20070101', '20070201', '20070301');
This particular function will partition our data into 14 different partitions. You may only see 13 named months in the functions, but we have to handle some unnamed months as well.
Partitioned functions are created either as LEFT or RIGHT functions. What this means is how the functions will handle data that falls into ranges outside the ones we explicitly state. What do we do with data that has a date before 3/1/2006 or with data that is after 3/32/2007? When we create a partition function, SQL Server actually creates a “hidden” partition. This hidden partitioned, it is not really hidden at all as you can easily see it through the catalog views and DMVs, handles data that falls outside of our named range on one side of the data. What does this mean? When we create LEFT partitions, it means that the hidden partition is created to the right of our named range. In our case, all dates that are greater than 3/31/2007 will be placed into the hidden partitioned. Dates that happen before 3/1/2006 will be placed into the same partition as 3/2006 data. With a RIGHT function, the one we have, the hidden partition is created on the left and data with dates before 3/1/2006 will be placed in the hidden partition while data with dates after 3/31/2007 will be placed into the same partition as 3/2007 data.
Once we figure out where our data will be placed, we need to create a partitioning schema to specify where each partition will reside on our disk subsystem. This is accomplished with the use of filegroups for each partition. The partitioning schema is flexible in allowing you to use a single filegroup for all the partitions, although you will lose some performance benefits of partitioning if you do this, or you can specify a different filegroup for each partition. You can even mix and match filegroups as much as you want as long as each partition named in the function, including the hidden one, has a filegroup to be placed on.
--Each partition will go onto the same filegroup
Now all we have to do is create the table and put data into it.
INSERT INTO partitionDate
VALUES ('20060301',1,'20060301-1' )
VALUES ('20060401',1,'20060401-1' )
VALUES ('20060501',1,'20060501-1' )
VALUES ('20060601',1,'20060601-1' )
VALUES ('20060701',1,'20060701-1' )
VALUES ('20060801',1,'20060801-1' )
VALUES ('20060901',1,'20060901-1' )
VALUES ('20061001',1,'20061001-1' )
VALUES ('20061101',1,'20061101-1' )
VALUES ('20061201',1,'20061201-1' )
VALUES ('20070101',1,'20070101-1' )
VALUES ('20070201',1,'20070201-1' )
VALUES ('20070301',1,'20070301-1' )
VALUES ('20060301',2,'20060301-2' )
VALUES ('20060401',2,'20060401-2' )
VALUES ('20060501',2,'20060501-2' )
VALUES ('20060601',2,'20060601-2' )
VALUES ('20060701',2,'20060701-2' )
VALUES ('20060801',2,'20060801-2' )
VALUES ('20060901',2,'20060901-2' )
VALUES ('20061001',2,'20061001-2' )
VALUES ('20061101',2,'20061101-2' )
VALUES ('20061201',2,'20061201-2' )
VALUES ('20070101',2,'20070101-2' )
VALUES ('20070201',2,'20070201-2' )
VALUES ('20070301',2,'20070301-2' )
VALUES ('20060301',3,'20060301-3' )
VALUES ('20060401',3,'20060401-3' )
VALUES ('20060501',3,'20060501-3' )
VALUES ('20060601',3,'20060601-3' )
VALUES ('20060701',3,'20060701-3' )
VALUES ('20060801',3,'20060801-3' )
VALUES ('20060901',3,'20060901-3' )
VALUES ('20061001',3,'20061001-3' )
VALUES ('20061101',3,'20061101-3' )
VALUES ('20061201',3,'20061201-3' )
VALUES ('20070101',3,'20070101-3' )
VALUES ('20070201',3,'20070201-3' )
VALUES ('20070301',3,'20070301-3' )
At this point, we can create our indexes.
--Create clustered index on partitionDate
CREATE CLUSTERED INDEX cl_partitionDate_colDate ON partitionDate(colDate)
--Create nonclustered index on partitionDate
CREATE NONCLUSTERED INDEX ncl_partitionDate_colSession_1 ON partitionDate(colSession) INCLUDE (colDesc)
Now that we have data and indexes, let’s look at how the indexes have been created by looking at the partitioning catalog views and DMVs.
--View partition information
SELECT partition_id
,OBJECT_NAME(object_id) AS 'Function name'
,index_id
,partition_number --1 is the hidden partition for RIGHT partitions
,hobt_id --Heap or BTree ID (same as the partition_id for aligned indexes)
,rows --Number of rows in each partition
FROM sys.partitions
WHERE [object_id] = (object_id('partitionDate'))
--Partition Information
--Some of this is redundant with the above catalog view
SELECT 'Partition Information'
SELECT * FROM sys.dm_db_partition_stats
--Let's look at some different information
SELECT pf.[name]
, pf.[type] --What type of partitioning function
, pf.type_desc --What type of partitioning function
, pf.fanout --How many partitions do we have
, pf.boundary_value_on_right --1 for RIGHT, 0 for LEFT
, pf.create_date
, pf.modify_date
, rv.boundary_id
, rv.[value]
, t.[name]
, s.[name]
, pp.max_length
, pp.[precision]
, pp.scale
, pp.collation_name
, t.is_nullable
, t.is_user_defined
, t.is_assembly_type
FROM sys.partition_functions pf
INNER JOIN sys.partition_parameters pp
ON pf.function_id = pp.function_id
INNER JOIN sys.types t
ON pp.system_type_id = t.system_type_id
INNER JOIN sys.schemas s
ON t.schema_id = s.schema_id
INNER JOIN sys.partition_range_values rv
ON pp.function_id = rv.function_id
AND pp.parameter_id = rv.parameter_id
WHERE pf.name = 'pfDateRange'
ORDER BY pf.[name], rv.boundary_id
Now that we have looked at some of the basic information about our table and partitions, play with the above code to see how you can partition the table in different ways. Try looking at the columns of the indexes we created, use the index system catalog views to look at the columns of the nonclustered index.
In the next blog post, I will take this one step further and talk about non-aligned indexes and why we may what to create them on our tables.
Don't miss our SQL Server Performance Tuning Bootcamp in Dallas, TXMarch 10-14, 2008
Register Now for this SQL Server Performance Tuning Bootcamp taught by Randy Dyess5-Day Bootcamp only $2,700! *with coupon
Course Information:Need advanced level performance tuning information. Gone through the DBA courses but they didn’t go far enough? Solid Quality Mentors will be giving a 5 day advanced level performance tuning bootcamp for SQL Server 2005 and 2008 the week of March 10, 2008 in the Galleria mall area of North Dallas and Addison.
This course is designed to give attendees a deep level look at performance tuning methodologies and capabilities found in SQL Server 2005 and SQL Server 2008. The bootcamp is designed to prepare any SQL Server DBA to improve the performance of their database environment through a practical approach and hands on labs.
Register online at www.solidq.com or contact us at info@solidq.com if you have any further questions.
GROUP DISCOUNTS AVAILABLE AND COUPON CODE BELOW.
"Solid Quality Mentors is the trusted global provider of advanced education and solutionsfor the entire Microsoft database platform."
A Microsoft Gold Certified Partner
A configuration option in SQL Server that is often overlooked or misapplied is the Query Wait configuration option. This option determines how long a query will wait for a resource (memory) with the default value of -1. The default value means that a query will wait for approximately 25 times the estimated cost of the query before it times out while waiting for memory. An example would be a query with an estimated cost of 10 would wait for 250 seconds before timing out, slightly over four minutes.
Note: If the estimated cost is below 1 then the query will wait for 25 seconds before timing out
It is easy to determine what your value is for this option, chances are it is at the default and most people do not interact with this configuration option very often.
--Review the current query wait setting (2005 and 2008)
SELECT * FROM sys.configurations
WHERE configuration_id = 1541
--Review the current query wait setting (2000)
EXEC sp_configure 'show advanced',1
RECONFIGURE
EXEC sp_configure 'query wait'
So why would the option have a different setting or why would you need to set the option to a different value considering that most complex queries have estimated costs over 10 that would have then waiting for over four minutes for memory? Well sometimes a query timeout happens. Queries who are often large consumers of memory (queries with hashing and sorting) or queries being parallelized are perfect candidates to time out due to lack of memory resources.
How do we know that queries are timing out? Usually DBAs start getting indications of query timeouts by seeing the following error in their error log:
Error 8645: A time out occurred while waiting for memory resources to execute the query. Re-run the query.
You may often get other conditions due to query timeouts that are a little harder to track back to the query timeout. What are some of these conditions?
· New users receiving “login failed” error messages
· User are disconnected with various error messages
· You see very high CPU usage
· You notice processes with wait types of 0x40, 0x0040 and last wait types of RESOURCE_SEMAPHORE
· System Monitor object SQLServer:Memory Manager has non zero values for grants pending
· Profiler or SQL Trace reports “Execution Warnings” events which include “Wait For Memory” or “Wait For Memory Timeout” in the text
Since the first few conditions on that list are generic enough we probably will not think of looking for query timeouts when there are a 100 over things that cause those conditions as well and the last few conditions actually require us to be trapping something before it occurs, chances are we are only going to find out after getting the error message in our error logs. But what if you wanted to be pro-active and trap for the query timeout in case the error is not seen in the error log?
As listed above there are several ways to trap for query timeouts. The System Monitor (Perfmon to us non Microsoft folks) SQLServer:Memory Manager counters Memory Grants Pending and Memory Grants Outstanding are good choices to have in your performance monitoring solution. SQL Server Profiler or SQL Trace can be used by including the Execution Warnings event, this is found under the Errors and Warnings event category. Once you are capturing this, you will need to scan the text output of this event to search for “Wait For Memory” or “Wait For Memory Timeout” in the text.
Want a SQL Server method to find out if you have query timeouts without waiting for the error or running another application like System Monitor or Profiler? There are several ways to find these query timeouts using T-SQL and SQL Server’s catalog views and dynamic management views.
If you are using SQL Server 2000, you really only have one option. This is to review the output of the dbo.sysprocesses system table.
--Review output of dbo.sysprocesses
SELECT spid
,dbid
,uid
,status
,sql_handle
,waittype
,waittime
,lastwaittype
,cmd
,hostname
,program_name
,nt_domain
,nt_username
,loginame
FROM master.dbo.sysprocesses
WHERE waittype IN (0x040, 0x0040)
AND lastwaittype = 'RESOURCE_SEMAPHORE'
--Review the text of each process that in the output list
SELECT * FROM ::fn_get_sql(0x0100010094BD2010C8F61E1E0000000000000000)
If you are on SQL Server 2005 or 2008, you can get more information by using a combination of system catalog views and dynamic management views and functions.
--Replacement for master.dbo.sysprocesses
SELECT r.session_id -- new column for SPID
,r.database_id
,r.user_id
,r.status
,st.text
,r.wait_type
,r.wait_time
,r.last_wait_type
,r.command
,es.host_name
,es.program_name
,es.nt_domain
,es.nt_user_name
,es.login_name
,mg.dop --Degree of parallelism
,mg.request_time --Date and time when this query requested the memory grant.
,mg.grant_time --NULL means memory has not been granted
,mg.requested_memory_kb --Total requested amount of memory in kilobytes
,mg.granted_memory_kb --Total amount of memory actually granted in kilobytes. NULL if not granted
,mg.required_memory_kb --Minimum memory required to run this query in kilobytes.
,mg.query_cost --Estimated query cost.
,mg.timeout_sec --Time-out in seconds before this query gives up the memory grant request.
,mg.resource_semaphore_id --Nonunique ID of the resource semaphore on which this query is waiting.
,mg.wait_time_ms --Wait time in milliseconds. NULL if the memory is already granted.
,CASE mg.is_next_candidate --Is this process the next candidate for a memory grant
WHEN 1 THEN 'Yes'
WHEN 0 THEN 'No'
ELSE 'Memory has been granted'
END AS 'Next Candidate for Memory Grant'
,rs.target_memory_kb --Grant usage target in kilobytes.
,rs.max_target_memory_kb --Maximum potential target in kilobytes. NULL for the small-query resource semaphore.
,rs.total_memory_kb --Memory held by the resource semaphore in kilobytes.
,rs.available_memory_kb --Memory available for a new grant in kilobytes.
,rs.granted_memory_kb --Total granted memory in kilobytes.
,rs.used_memory_kb --Physically used part of granted memory in kilobytes.
,rs.grantee_count --Number of active queries that have their grants satisfied.
,rs.waiter_count --Number of queries waiting for grants to be satisfied.
,rs.timeout_error_count --Total number of time-out errors since server startup. NULL for the small-query resource semaphore.
,rs.forced_grant_count --Total number of forced minimum-memory grants since server startup. NULL for the small-query resource semaphore.
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions es
ON r.session_id = es.session_id
INNER JOIN sys.dm_exec_query_memory_grants mg
ON r.session_id = mg.session_id
INNER JOIN sys.dm_exec_query_resource_semaphores rs
ON mg.resource_semaphore_id = rs.resource_semaphore_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle)st
--Is anything timing out
SELECT * FROM sys.dm_exec_query_optimizer_info
WHERE counter = 'timeout'
This combination of dynamic management views in SQL Server 2005 or 2008 can give you a much better description of what is going on and hoping help you troubleshoot your query timeouts.
We still spots open for Solid Quality Mentor’s SQL Server Performance Tuning Bootcamp to be delivered in Dallas in March.
If you enjoyed this post and want to know more about SQL Server, make sure you sign up for one of Solid Quality Mentor’s SQL Server classes today. (Ask about our new upgrade class which should be ready by the end of Jan. 2008. This course will cover upgrading to both SQL Server 2005 and 2008 from earlier versions).
I just wanted to send a quick announcement on two new pieces that I completed at the end of last year that are just now being published. One is the first of a series of 8 videos I recorded for SQL Server Magazine and DELL on various SQL Server 2008 topics. This one is over SQL Server 2008 security. The second is an article I authored for SQL Server Magazine and DELL on virtualization of SQL Server 2008.
Virtualization of SQL Server 2008
How Can I Secure Data? Security in SQL Server 2008
As promised, I will be putting out some more technical blogs as well as a few more technical videos for JumpStartTV.com.
Still spots open for Solid Quality Mentor’s SQL Server Performance Tuning Bootcamp to be delivered in Dallas in March.
How to Guide: SQL Server 2005 Consolidation
As organizations around the world increase the number of applications that collect and utilize data, they are increasing the number of servers that must be used for both the collection and storage of that data. Data centers designed to hold dozens of servers are now holding hundreds. System administrators have grown in number and are faced with the problem of running these hundreds of servers in data centers with inadequate power and inadequate cooling. As these administrators face this surge of new hardware, management faces the ever increasing costs of running the data centers. Power costs, hardware costs, and personnel costs are rising and straining budgets. To overcome this budget-bursting surge of equipment and manpower, server manufacturers have created larger and larger servers capable of handling the workload of multiple smaller servers. System administrators and management are now finding that a new process—consolidation—has come into place that will allow the replacement of multiple, smaller machines with larger more powerful servers. Consolidation is the grouping of multiple applications or databases on one server. Whether this consolidation takes place through the virtualization of servers—multiple servers running independently on a host server—or the consolidation is designed to have all applications or databases under the same operating system, consolidation allows the use of one larger machine in the place of multiple machines and helps data centers reduce the number of servers they have to support.
Download the white paper
How to Guide: SQL Server 2005 Clustering
This SQL Server 2005 Magazine white paper is written for a technical audience that needs to understand failover clustering, and that wants to know how SQL Server 2005 is implemented on a failover cluster, how to install and configure SQL Server 2005 for failover clustering, and best practices for SQL Server 2005 clustering. As SQL Server 2005 environments move from smaller installations to larger mission-critical enterprises, the need for the database environment to be highly available becomes more apparent. SQL Server 2005 has many different mechanisms to achieve high availability. But one of the most commonly used methods to achieve a highly available mission-critical database environment is SQL Server 2005’s ability to make use of clustered environments
I will get back to you shortly with some new technical posts.
We still have spots left in our March SQL Server DBA Bootcamp (http://learning.solidq.com/na/CourseDetail.aspx?CourseScheduleId=272). Jump on over to our website as sign up today.
If you enjoyed this post and want to know more about SQL Server, make sure you sign up for one of Solid Quality Mentor’s (http://learning.solidq.com/na/Default.aspx) SQL Server classes today. (Ask about our new upgrade class which should be ready by the end of Jan. 2008. This course will cover upgrading to both SQL Server 2005 and 2008 from earlier versions).
Make sure you sign up for our newsletter (http://www.solidq.com/na/Newsletter.aspx).
I’m going to start calling these little blog postings something like: Did You Know? Often when I go to client’s sites for performance tuning engagements, I see that they have set the Affinity Mask to leave 1 CPU free for the OS and other applications while specifying all the others for SQL Server. Now there is quite a few reasons to use the CPU Affinity Mask, but setting aside a CPU for the OS and other applications is usually not one of them. I know there are exceptions, but SQL Server and most other well written Microsoft applications can share CPUs on a server without too many issues. I came across this topic when reviewing the Performance Tuning Bootcamp for changes to SQL Server 2008 and I thought I would write up a little blog posting on why setting the Affinity Mask can hurt performance and you should only do it when absolutely necessary.
SQL Server creates its own scheduler for each CPU on the server and uses those schedulers to handle the scheduling of its tasks instead of using Windows schedulers (it is a bit more complex than that but this is a short post). The reason it creates its own set of schedulers is that the Windows schedulers are pre-emptive (threads are given a priority and a time slice). Threads running under Windows schedulers can be bumped off the scheduler if another thread with a higher priority needs to run (the thread can keep getting bumped off but Windows will start raising its priority level so eventually it will run) or a thread will get bumped off the scheduler once it has reached it allotted time slice. Either way, having a database thread get bumped off a scheduler is not desirable since database threads are usually holding locks at some level. Now, SQL Server schedulers are cooperative. They will not bump off other threads due to priority levels and they are allowed to run until they need to wait for a resource. This allows a thread to do as much work as possible and hold the locks for shorter periods of time.
So what does this have to do with Affinity Mask? By default, schedulers created in SQL Server are not bound to a particular Windows CPU. The threads running on a scheduler can be moved around to different CPUs if needed. Once you set Affinity Mask, the schedulers become hard bound to a Windows scheduler. This means that once a thread starts running on an affinitized scheduler it will not be moved to another CPU if that CPU becomes overworked with other threads. This can have a dramatic effect of the overall performance of processes running on SQL Server as a thread may have to wait to finish executing if the CPU that its scheduler is affinitized too is currently running many other threads.
I will admit there are several cases when you need to set Affinity mask, but you should try everything you can before you actually do so. Cases when you might need it are usually reserved to having multiple instances of SQL Server on the same server and you want to make sure that the processes executing in one instance do not take up all the CPU resources and starve the processes in the other instances from being executed quickly. Think of an instance executing very large report queries that are consuming 100% of your CPUs. The second instance needs to run a very quick update query for a user on the web. This user may actually have to sit and wait on the report queries to leave some CPU available for it.
In order to get around the problem of hard binding, there are several things we can do. First, is not to set Affinity Mask unless we absolutely need it. Second is to use the Windows 2003 Resource Manager to assign CPU resources to individual SQL Server executables. This is a great tool for reserving CPU resources so all applications on your server can execute. Third is to use Soft-NUMA in SQL Server 2005 or 2008 and assign only some of the CPUs to the NUMA nodes on each instance. The use of NUMA nodes will restrict the number of CPUs in each node without hard-binding the schedulers. The fourth thing you might try is to utilize SQL Server trace flag 8002. This trace flag will still respect the Affinity Mask but will not bind the schedulers to particular CPUs. This allows threads from the schedulers to utilize all the CPUs in the affinity mask.
All of the information I posted above is a bit more complicated that this little blog posting could handle but once you understand the relationship between Windows schedulers and SQL Server schedulers and what binding is, you may be able to understand some of the performance issues you may face after setting the Affinity Mask.
We still have spots left in our March SQL Server Performance Tuning Bootcamp. Jump on over to our website as sign up today.