This past summer, once the vast majority of big application enhancements were in after our massive re-platforming project (SQL Server backend), we were able to get some Dev resources focused on day-to-day performance issues, including some frequent deadlocks (a few more than I'd like to admit!) and some long-running transactions. While the application had logic to retry the transaction so processing would continue, this is not something you want as a regular occurrence.
This has been an ongoing effort between dev & DBA, and we’ve used several different tools & methods to improve overall system performance:
- Adding indexes
- Adding indexes to temp tables within stored procedures
- Converting large Entity Framework queries to stored procedures
- Implementing a caching solution (Redis) for a portion of the application
- Rewriting Stored Procedure & View logic to reflect usage patterns in the application
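As an illustration of the temp-table indexing item above, here is a minimal sketch; the table and column names are invented for the example and don't come from the real procedures:

```sql
-- Inside a stored procedure: build a temp table of per-item rollups,
-- then index it before joining it back to a large permanent table.
-- Names here are illustrative, not the application's actual schema.
CREATE TABLE #temprollup (
    ItemID      int   NOT NULL,
    PerItemCost money NOT NULL
);

INSERT INTO #temprollup (ItemID, PerItemCost)
SELECT ItemID, AVG(Cost)
FROM dbo.OrderItems
GROUP BY ItemID;

-- An index on the join column lets the optimizer seek instead of scan
-- when #temprollup is joined back to OrderItems.
CREATE CLUSTERED INDEX cx_temprollup ON #temprollup (ItemID);
```

Creating the index after the insert keeps the load fast and gives the index accurate statistics for the join that follows.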
With all the tuning we’ve done, we still have one stored procedure that is occasionally involved in deadlocks. This stored procedure has had a lot of care and feeding: I was able to leverage not only indexes on temp tables for performance but also an application logic rewrite, which resulted in a 95% decrease in the procedure’s average execution time. Deadlocks were further reduced, but they still occur when we hit a particular race condition. This has become my white whale.
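For anyone chasing a similar white whale: one general way to see exactly which statements and indexes are colliding (this is a standard troubleshooting query, not anything specific to this application) is to pull the deadlock graphs SQL Server already captures in the system_health Extended Events session:

```sql
-- Pull xml_deadlock_report events from the system_health ring buffer.
-- Each row is a deadlock graph you can save as .xdl and open in SSMS.
SELECT xed.event_data.query('(event/data/value/deadlock)[1]') AS DeadlockGraph
FROM (
    SELECT CAST(st.target_data AS xml) AS target_data
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS st
        ON s.address = st.event_session_address
    WHERE s.name = 'system_health'
      AND st.target_name = 'ring_buffer'
) AS tab
CROSS APPLY tab.target_data.nodes(
    'RingBufferTarget/event[@name="xml_deadlock_report"]'
) AS xed(event_data);
```

The deadlock graph shows the victim, the winner, and which index each session was holding and waiting on, which is what points you at a pattern like the one below.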
I was again inspired by Kendra Little’s Troubleshooting Blocking and Deadlocks for Beginners course on https://sqlworkbooks.com/ (even if you are not a beginner, she has a great way of looking at concepts that, for me at least, gives a different perspective on approaching problems). An example she gives about deadlocks is that it’s like two trucks heading toward each other on a narrow road: there isn’t room to pass. In her example, she created an alternative route for a read query that was being blocked by an update.
The trick with my deadlock was that it involved the primary key and the clustered key, which on this table happen to be two different keys. My victim was consistently using the primary key for the lookup but then needed to update the clustered key. My killer was using the clustered key for both the lookup and the update. So my thought was: can I tell my stored procedure to use the same path as the killer, based on my knowledge of my data setup? Share the road and drive in the same order? (Maybe it’s more like the circular driveway at a hotel, where everyone should enter and exit the same way.)
To visualize, here is an example table, “OrderItems”, to illustrate. The yellow column, ItemID, is my primary key. The red columns, OrderID & ItemNumber, are my clustered key, on which the table is organized. In SQL Server, clustered indexes sort and store the data rows in the table based on their key values (see MSDN Clustered and Nonclustered Indexes Described for more details). Generally the clustered index and primary key would be the same, but in our application we refer to the items in the context of their orders.
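A minimal sketch of what that table layout might look like; the column types and constraint names are assumptions for illustration:

```sql
-- Hypothetical DDL for the example OrderItems table: the primary key
-- (ItemID) is declared NONCLUSTERED, and a separate unique clustered
-- index organizes the rows by (OrderID, ItemNumber).
CREATE TABLE dbo.OrderItems (
    ItemID     int   NOT NULL,
    OrderID    int   NOT NULL,
    ItemNumber int   NOT NULL,
    Cost       money NULL,
    Taxable    bit   NULL,
    Quantity   int   NULL,
    CONSTRAINT PK_OrderItems PRIMARY KEY NONCLUSTERED (ItemID)
);

CREATE UNIQUE CLUSTERED INDEX CIX_OrderItems
    ON dbo.OrderItems (OrderID, ItemNumber);
```

With this layout, a lookup on ItemID seeks the nonclustered primary key and then follows it to the clustered index, while a lookup on OrderID can seek the clustered index directly: two different roads to the same rows.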
My “victim” example (the leading UPDATE line restores the full statement):
UPDATE oi
SET cost = peritemcost
FROM orderItems oi JOIN #temprollup tr ON oi.itemid = tr.itemid;
My “killer” example (again with the leading UPDATE line restored):
UPDATE OrderItems
SET ItemNumber = @ItemNumber, Cost = @Cost, Taxable = @Taxable, Quantity = @Quantity, ...
WHERE ItemID = @ItemID;
Why is my “killer” using the clustered index rather than the primary key for its lookup? In this case, every column is potentially updated, including the clustered key columns. My victim uses the primary key for its lookup.
In testing, adding the clustered key’s leading column (OrderID) to the WHERE clause prompted my stored procedure to use the clustered index for the lookup & update.
My “victim” update, revised (one option; another would have been to keep the OrderID column in the temp table and include it in the join, but this was the quickest to illustrate):
UPDATE oi
SET cost = peritemcost
FROM orderItems oi JOIN #temprollup tr ON oi.itemid = tr.itemid
WHERE OrderID = @OrderID;
While I need to do further testing & load testing, this looks to be a promising start! (My stored procedure change also reduced that step from ~13 ms to ~5 ms, removed a work table, and reduced logical reads on my “OrderItems” table from 634 to 588 on my test set.)
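For reference, logical-read and timing numbers like those above are the kind SET STATISTICS IO and SET STATISTICS TIME report. A minimal sketch of that style of measurement (assuming a session where @OrderID is declared, as it would be inside the procedure):

```sql
-- Measure I/O and elapsed time for the revised statement.
SET STATISTICS IO, TIME ON;

UPDATE oi
SET cost = peritemcost
FROM orderItems oi JOIN #temprollup tr ON oi.itemid = tr.itemid
WHERE OrderID = @OrderID;

SET STATISTICS IO, TIME OFF;
-- The Messages pane reports logical reads per table and elapsed ms,
-- which is how before/after comparisons like 634 vs. 588 are made.
```

Comparing those per-table logical reads before and after a change is a quick sanity check that a plan change actually helped.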