Upon encountering stuck transactions in the SQL server (2014) DB, we had to perform some intensive in-depth analysis to uncover the source of the problem.
Since most of the transactions were NHibernate transactions, it was very difficult to find the origin of the problem; the real challenge was to find the problematic code that opened the transactions and to understand why they were not committed or rolled back.
To get to the root of the issue, we decided to take two different approaches:
We eventually managed to trace a set of DB transactions that got stuck, and armed with this information, we soon uncovered the code flow. We also added logs to help us dig out that problematic code flow. Since there were many nested functions, we added logs like a binary search (which found the problematic section and from there we drilled down with new logs).
Our never-ending efforts to dig out the exact cause for the stuck transactions even led us to invite a Microsoft expert in to analyze the code flow.
Thanks to some pretty intensive analysis, we finally found a problematic line of code where the code was stuck (due to no exception). It was actually a Microsoft.mshtml library that worked in a single thread manner (with a COM object). When one of the threads got stuck (because of a problematic HTML string input), the whole system got stuck (single thread object).
The solution was to replace the library with a newer, multi-threaded library, namely HtmlAgilityPack. With the new library, we’re able to ensure that if one thread becomes problematic, it will not affect the other threads. In addition, this is a newer library that provides much more flexibility and validations that can work with HTML inputs.
Rafi Daskalo is a director of R&D at Galil Software.