Orphaned MSDTC transactions (-2 spids)

Page content

SPIDS with a value of -2

Any dba looking after a server that takes part in a distributed transaction will probably have come across situations where a SPID that does not show up in master..sysprocesses or sys.dm_exec_sessions (depending on your version of SQL Server) ends up blocking other users. Looking at sp_lock or the syslockinfo table or sys.dm_tran_locks would show resources being locked by a SPID with a value of -2. SPIDs with a negative number cannot be killed in SQL Server.

Often, especially in older (pre version 7) versions of SQL Server the only option was to restart SQL Server to clear those locks. It’s partly because of this (and also because Sybase still requires a restart for orphaned transactions) that the myth persists that SQL Server needs to be restarted.

SPIDs with a value of -2 are orphaned distributed transactions.

What’s an orphaned distributed transaction?

In a nutshell, it’s a distributed transaction for which the transactional state is unknown. A distributed transaction is a database transaction that involves more than one database system (usually) located on different servers.

In order to maintain transactional consistency these transactions need to be coordinated, as we can’t have one database involved in the transaction committing the data it has been sent, whilst another database fails to commit the data it has been sent; either all the data is successfully committed by all the databases involved in the transaction, or the entire transaction is rolled back and no data gets committed by any database.

The process responsible for coordinating these transactions is the Microsoft Distributed Transaction Coordinator, or MSDTC.

You can get to it via Start > Run and typing dcomcnfg.

This brings up the Component Services plugin. Click on Component Services > Computers > My Computer.

Under My Computer, click on the Distributed Transaction Coordinator folder. Under Local DTC you will find the following information about MSDTC transactions:

  • Transaction List lists active distributed transactions
  • Transaction Statistics reports summary statistics for previous MDTC transactions

Both items provide useful information, but in the context of this blog, it’s the Transaction List that we’re interested in. This lists all the currently active transactions that have not been successfully completed, or been successfully rolled back. The status column lists what MSDTC regards as the state of the transaction, and the Unit of Work ID column with those strange GUID type values displays a unique identifier that is assigned to that transaction; this is a very useful value to have as it will allow any MSDTC transactions executing within SQL Server, including transactions that MSDTC has lost control of and thus have a SPID value of -2 to be killed by specifying the Unit of Work (UoW) ID in place of the SPID number.

If the status of the transaction is ‘Active’ then the transaction is currently executing. Figure 1 below, shows an active distributed transaction when viewed in the MSDTC console (the screenshot was taken from a Windows Vista host):

Fig 1: The MSDTC console

The problems arises when, for whatever reason, MSDTC loses track of one of these transactions. The transaction will then be marked as in-doubt. If this happens, SQL Server will honour the locks the SPID bound to that transaction has left, but the SPID itself will be assigned a negative value of -2.

This means it will no longer appear as an internal process, but will still cause blocking if the original transaction was accessing tables at the time it went awol, as the locks will have been retained. This is because SQL Server does not know whether it can commit or roll back the transaction as MSDTC has control of distributed transactions.

Sometimes this can happen because the MSDTC service (Distributed Transaction Coordinator) has been terminated, or is in a hung state. When MSDTC is restarted it will go through its own internal transaction log and try and regain control of transactions that were left open in order to commit them or roll them back. Usually, however, it’s because one or more of the applications involved in the distributed transaction has mis-behaved in some way and has hung or crashed, and the only option there is to try and restart the application(s) concerned and hope that it has a recovery mechanism to deal with any MSDTC transactions that were left unresolved.

From time to time, therefore, MSDTC will not be able to regain control of a transaction and it’s status will be marked as ‘in-doubt’ in MSDTC coordinator.

Once upon a time (well, before SQL Server 6.5 SP5 at any rate) the only way to resolve this was to restart SQL Server, and because this was once the only way to deal with orphaned SPIDs, it’s become something of a myth that it remains the only way to deal with the issue, as a mechanism to kill distributed transactions was incorporated into SQL Server in SP5 of SQL Server 6.5: http://support.microsoft.com/default.aspx/kb/195542.

The Unit of Work (UoW) ID

The UoW ID is a 24 character GUID assigned to each transaction issued by MSDTC, and it is the UoW ID that is used to identify and kill orphaned MSDTC transactions in SQL Server.

Killing an MSDTC transaction

So, you’ve come up against an MSDTC transaction that the MSDTC service has been unable to successfully commit or rollback.

This transaction has now left locks on your data that are associated with an orphaned SPID and you’ve got no option but to kill that MSDTC transaction.

You first need to retrieve the UoW ID. This should be displayed in the MSDTC console, but to verify it is the correct ID, you can interrogate the syslockinfo system view. Specifically the req_transactionUoW column. The req_spid column in syslockinfo hosts the SPID number. So, to pull out the UoW ID for an orphaned MSDTC transaction you just need to run the following query:

select req_transactionUoW as [UoW ID] from syslockinfo where req_spid = -2

The above will work in SQL Server 2000 onward, but from SQL Server 2005, it’s more politically correct to use the DMVs, so the table and column names have changed slightly:

select request_owner_guid as [UoW ID] from sys.dm_tran_locks where request_session_id = -2

Then to kill it:

kill {'UoW ID'}

where {UoW ID} is the result of the above select statement.

However, this must only be done as an absolute last resort when other methods, such as e.g. restarting the applications/processes involved in that distributed transaction do not resolve the problem. At the end of the day, those locks will have been placed for a reason, so you don’t know if the consistency of the data is now compromised especially if syslockinfo/sys.dm_tran_locks shows the MSDTC transaction(s) concerned have placed exclusive locks on objects

The default behaviour for handling in-doubt MSDTC transactions can be altered from the default of doing nothing and letting the DBA handle it, to either abort or commit the transaction automatically.

It is configured by specifying the server configuration option ‘in-doubt xact resolution’ as documented in BOL. Check the documentation on this carefully, particularly the note on ensure that all servers involved in the transaction have the same setting for this option.

[EDIT 2013-09-19: Read this cautionary tale before even thinking about terminating MSDTC transactions]

Troubleshooting Problems with MSDTC

New DTC Functionality in Windows Server 2003 Service Pack 1

Enable Firewall Exceptions for MS DTC