Monday 30 May 2011

How to resolve a deadlock


SQL Server Technical Bulletin

Topic covered in this issue: How to resolve a deadlock

Goal

To identify, to troubleshoot, and to recommend a solution for resolving a deadlock.

Introduction

This article examines a deadlock situation and provides steps for resolving the deadlock. Each deadlock may be different and can be caused by several different environment variables. The information provided in this article can help you identify and resolve a deadlock.

Case study

In a case study, we examine a 911 system that has six operators. During peak activity, the Microsoft Visual Basic front-end application they are using experiences broken connections. Because of the broken connections, the operators must re-input data. For a 911 system that operates 24 hours a day, seven days a week, this behavior is unacceptable.

What is a deadlock?

A deadlock occurs when two system server process IDs (SPIDs) are waiting for a resource and neither process can advance because the other process is preventing it from getting the resource.

The lock manager’s thread checks for deadlocks. When a lock manager’s deadlock detection algorithm detects a deadlock, the lock manager chooses one of the SPIDs as a victim. The lock manager initiates a 1205 error message that is sent to the client, and the lock manager kills the SPID. Killing the SPID frees the resources and allows the other SPID to continue. Killing the SPID that is the deadlock victim is what causes the broken connection that the Visual Basic front-end application experiences.

In a well designed application, the front-end application should trap for the 1205 error, reconnect to SQL Server, and then re-submit the transaction.

Although deadlocks can be minimized, they cannot be completely avoided. That is why the front-end application should be designed to handle deadlocks.

How to identify a deadlock

Step 1

To identify a deadlock, you must first obtain log information. If you suspect a deadlock, you must gather information about the (SPIDs) and the resources that are involved in the deadlock. To do this, add the -T1204 and the -T3605 startup parameters to SQL Server. To add these two startup parameters, follow these steps:
Start SQL Server Enterprise Manager.
Select, and then right-click the server.
Click Properties.
Click Startup Parameters.
In the Startup Parameters dialog box, type -T1204 in the Parameters text box, and then click Add.
In the Parameters text box, type -T3605, and then click Add.
Click OK.

The startup parameters will take effect when SQL Server is stopped and then re-started.

The -T1204 startup parameter collects information about the process and the resources when the deadlock detection algorithm encounters a deadlock. The -T3605 startup parameter writes this information to the SQL Server error logs.

The -T1205 startup parameter collects information every time that the deadlock algorithm checks for a deadlock, not when a deadlock is encountered. You do not have to use the -T1205 startup parameter.

If you do use the -T1205 startup parameter, the following is a sample of the output that will be in the SQL Server error log:
2003-05-14 11:46:26.76 spid4     Starting deadlock search 1
2003-05-14 11:46:26.76 spid4     Target Resource Owner:
2003-05-14 11:46:26.76 spid4      ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x43CAB580) Value:0x42bdf340
2003-05-14 11:46:26.76 spid4      Node:1       ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x43CAB580) Value:0x42bdf340
2003-05-14 11:46:26.76 spid4     
2003-05-14 11:46:26.76 spid4     End deadlock search 1 ... a deadlock was not found.
2003-05-14 11:46:26.76 spid4     ----------------------------------
2003-05-14 11:46:31.76 spid4     ----------------------------------
2003-05-14 11:46:31.76 spid4     Starting deadlock search 2


Sometimes, you might not be able to stop and re-start SQL Server. In that case, you can use Query Analyzer to run the following command to enable the deadlock trace flags.

Note This way you can gather information about the deadlocks immediately. The "-1" indicates all SPIDs.
dbcc traceon (1204, 3605, -1)
go
dbcc tracestatus(-1)
go

Step 2

Next, you must collect a SQL Profiler trace. If you turn on the deadlock trace flag, you will get most of the required information, but not always. For example, in a case study the trace flag output identified that a sp_cursoropen system stored procedure and an "UPDATE tblQueuedEvents set notifyid = 3, ResynchDate" statement were involved in a deadlock. Unfortunately, you do not know the definition of the sp_cursoropen system stored procedure. You also do not have the complete UPDATE statement because it was truncated.

SQL Profiler can obtain the full statements in addition to the execution plans of the statements. A SQL Profiler trace also has a lock event for "deadlock" and for "deadlock chain." "Deadlock" corresponds to the -T1204 flag, and "deadlock chain" corresponds to the -T1205 flag. Turning on the deadlock trace flags and running a SQL Profiler trace during the occurrence of a deadlock should provide you the data that you must have to troubleshoot a deadlock. In this case, and in others, running SQL Profiler changes the timing of execution enough to prevent the deadlock. Therefore, you will typically capture the deadlock information with the trace flags, and then you run SQL Profiler.

Troubleshooting a deadlock

After a deadlock occurs, you can gather information about the deadlock by using the sqldiag utility and by using SQL Profiler. In the output of the SQLDiag.txt file, look for a "Wait-for-graph" entry. A "Wait-for graph" entry indicates that a deadlock was encountered.

The following is a sample of the output that you might see in the SQL Server error log when you use the -T1205 startup parameter.
2003-05-05 15:11:50.80 spid4    Wait-for graph
2003-05-05 15:11:50.80 spid4    Node:1
2003-05-05 15:11:50.80 spid4    ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x33AE1538) Value:0x193
2003-05-05 15:11:50.80 spid4    Victim Resource Owner:
2003-05-05 15:11:50.80 spid4    ResType:LockOwner Stype:'OR' Mode: X SPID:60 ECID:0 Ec:(0x1F1BB5B0) Value:0x193
2003-05-05 15:11:50.80 spid4    Requested By: 
2003-05-05 15:11:50.80 spid4    Input Buf: RPC Event: sp_cursoropen;1
2003-05-05 15:11:50.80 spid4    SPID: 55 ECID: 0 Statement Type: EXECUTE Line #: 1
2003-05-05 15:11:50.80 spid4    Owner:0x1937f2a0 Mode: S        Flg:0x0 Ref:1 Life:00000000 SPID:55 ECID:0
2003-05-05 15:11:50.80 spid4    Grant List 0::
2003-05-05 15:11:50.80 spid4    KEY: 8:1653632984:2 (da00ce043a9e) CleanCnt:1 Mode: U Fl ags: 0x0
2003-05-05 15:11:50.80 spid4    Node:2
2003-05-05 15:11:50.80 spid4    ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x33AE1538) Value:0x193
2003-05-05 15:11:50.80 spid4    Requested By: 
2003-05-05 15:11:50.80 spid4    Input Buf: Language Event: Update tblQueuedEvents Set NotifyID = 2, ResynchDate
2003-05-05 15:11:50.80 spid4    SPID: 60 ECID: 0 Statement Type: UPDATE Line #: 1
2003-05-05 15:11:50.80 spid4    Owner:0x1936e420 Mode: X        Flg:0x0 Ref:0 Life:02000000 SPID:60 ECID:0
2003-05-05 15:11:50.80 spid4    Grant List 0::
2003-05-05 15:11:50.80 spid4    KEY: 8:1653632984:1 (2d018af70d80) CleanCnt:1 Mode: X Flags: 0x0


In the "Wait-for-graph" entry, you have Node 1 and Node 2. In each node, you have a grant section and a request section. The grant section is the "Grant List", and the request section is the "Request By."
In each node, you can identify the following:
The SPID.
The command the SPID was executing.
The resource.
The lock mode on the resource.

For example, in Node 1, the Grant List, SPID 55 had been granted an update lock, Mode: U, on resource KEY: 8:1653632984:2. 8=DBID, 1653632984=ObjectID, and 2=Indid. To obtain the database identification number, run the sp_helpdb stored procedure. To obtain the table, run the following code:
select * from sysobjects where id = 1653632984


To obtain the index, run the following code:
select * from sysindexes where indid = 2 and id = 1653632984

If IndexId is equal to 2, you know the index is a nonclustered index. The command that SPID 55 was executing was the sp_cursoropen stored procedure.

In Node 2, the Grant List, SPID 60 has been granted an exclusive lock, Mode: X, on resource KEY: 8:1653632984:1. 8=DBID, 1653632984=ObjectID, 1=Indid. This is on the same table but index 1 is the clustered index. The command that SPID 60 was executing was:
Update tblQueuedEvents Set NotifyID = 2, ResynchDate

An IndexId that is equal to 1 is a clustered index.

An IndexId that is equal to 2 is a nonclustered index.

Note Deadlocks are very time sensitive.

Next, in Node 1, Request By, SPID 55 requested a shared lock, Mode: S, on IndexId=1. In Node 2, Request By, SPID 60 requested an exclusive lock, Mode: X, on IndexId=2. Because these lock requests occur at the same time, the deadlock occurs. Each SPID’s granted locks are preventing the requested locks from continuing.

The following table shows the lock compatibility chart. For more information about lock compatibility, see the "Lock Compatibility" topic in SQL Server 2000 Books Online.

Lock compatibility chart
Requested mode
IS

S

U

IX

SIX

X
Intent shared (IS)
Yes

Yes

Yes

Yes

Yes

No
Shared (S)
Yes

Yes

Yes

No

No

No
Update (U)
Yes

Yes

No

No

No

No
Intent exclusive (IX)
Yes

No

No

Yes

No

No
Shared with intent exclusive (SIX)
Yes

No

No

No

No

No
Exclusive (X)
No

No

No

No

No

No


Next, by looking at the output, you identify ObjectId 1653632984 as the tblQueuedEvents table, and you obtain a sp_help stored procedure output for the table. There were two indexes on the table. The two indexes were ix_tblQueuedEvents and PK_tblQueuedEvent. ix_tblQueuedEvents is a clustered index on ResynchDate, and PK_tblQueuedEvent is a primary key, unique nonclustered index on EventSID.

The SQL Profiler trace was not able to capture the deadlock occurrence. Remember, deadlocks are very time dependent. The overhead of SQL Profiler probably added some time to the execution of one of the processes and that prevented SQL Profiler from getting in a deadlock situation. However, it did provide information that you can use to troubleshoot the issue. You found the full update tblQueuedEvents statement to be similar to the following:
Update tblQueuedEvents Set NotifyID = 2, ResynchDate = '5/7/2003 10:44:16' where eventSID = 73023
You also found the execution plan. You still do not have the full sp_cursoropen stored procedure statement, but you do have enough information to recommend a solution that will resolve the deadlock.

Here is the execution plan.

Note This particular execution plan is read right to left and bottom to top.
StmtText                                                      
                                                                                                          
                                  
                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
                               
Update tblQueuedEvents Set NotifyID = 2, ResynchDate = '5/7/2003 10:44:16'
                               where eventSID = 73023                                                     
                                  
                                                   
|--Clustered Index
                               Update(OBJECT:([SOTS].[dbo].[tblQueuedEvents].[ix_tblQueuedEvents ]),
                               SET:([tblQueuedEvents].[NotifyID]=[@1],
                               [tblQueuedEvents].[ResynchDate]=[Expr1004]))  
     |--Top(1)                                                                 
                                                                                                          
                                  
                                              
           |--Compute Scalar(DEFINE:([Expr1004]=Convert([@2])))                
                                                                                                          
                                  
                                               
                 |--Index
                               Seek(OBJECT:([SOTS].[dbo].[tblQueuedEvents].[PK_tblQueuedEvents]),
                               SEEK:([tblQueuedEvents].[EventSID]=[@3]) 

Recommend a solution to resolve the deadlock

Note that the UPDATE statement is performing a "clustered index update" on the clustered index. Therefore, the nonclustered index and the clustered index must both be updated. The clustered index is ix_tblQueuedEvents and the nonclustered index is PK_tblQueuedEvents. To perform the updates, the UPDATE statement must obtain exclusive locks on both indexes. These two indexes are the indexes that are involved in the deadlock. From reviewing SQL Profiler traces, you did not see any queries that used the ResynchDate in the WHERE clause. All the statements were very specific, and they used the EventSID in the WHERE clause. A better choice of a clustered index would be EventSID. With this information and a discussion with the customer, we found that the ResynchDate index was old, and it was not necessary. We recommended that the customer drop the ix_tblQueuedEvents index on ResynchDate, and that they make PK_tblQueuedEvent a clustered index. This resolved the deadlock situation.

This is only one example of a deadlock case that involves locks. Deadlocks can also involve parallelism and involve threads. They can involve one, two, three, or more SPIDs, and resources. With any deadlock case, you must obtain the –T1204 startup parameter output, and the SQL Profiler trace to identify, to troubleshoot, and to resolve the deadlock. Your deadlock situation will involve different processes and resources. Therefore, solutions will vary from case to case. Typical methods you can use to resolve deadlocks include:
Adding and dropping indexes.
Adding index hints.
Modifying the application to access resources in a similar pattern.
Removing activity from the transaction like triggers. By default, triggers are transactional.
Keeping transactions as short as possible.

No comments:

Post a Comment