A rather interesting experience today at one of my customers. Over the weekend, they upgraded to SharePoint 2010 Service Pack 2. At first everything looked just fine, but then the helpdesk calls started coming in: SharePoint was slow. Now we all know that perceived performance is a very subjective thing, but once we logged on ourselves we noticed a very significant delay. After a few minutes, our site became totally unresponsive but the sites in another web application were just fine.
We logged on to the server only to find the CPU at 100% - all of the time. One particular application pool was consuming around 95% CPU. To find out which exact application pool we were dealing with, we issued the following command:
appcmd list wp
That lists every running worker process with its process ID and the name of its application pool. It was our SharePoint application pool, as we suspected. The customer uses several application pools that all run under the same process identity, so it took us a few minutes to double-check that we were targeting the right one.
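When several pools share one identity, the same mapping can also be read from the w3wp.exe command line, which carries the pool name in its -ap argument. The following is a minimal sketch rather than what we ran that day: the function names are made up for illustration, and reading the command line of another account's process normally requires an elevated prompt.

```powershell
# Hypothetical helper: pull the application pool name out of a w3wp.exe
# command line, which looks roughly like:
#   c:\windows\system32\inetsrv\w3wp.exe -ap "SharePoint - 80" -v "v2.0" ...
function Get-AppPoolNameFromCommandLine {
    param([string]$CommandLine)
    if ($CommandLine -match '-ap\s+"([^"]+)"') {
        return $Matches[1]
    }
}

# Hypothetical helper: list every w3wp.exe with its PID and pool name.
# Uses WMI (Win32_Process), so it only works on Windows.
function Get-WorkerProcessPool {
    Get-WmiObject Win32_Process -Filter "Name = 'w3wp.exe'" | ForEach-Object {
        New-Object PSObject -Property @{
            ProcessId = $_.ProcessId
            AppPool   = Get-AppPoolNameFromCommandLine $_.CommandLine
        }
    }
}

# Usage on a web server, from an elevated PowerShell prompt:
#   Get-WorkerProcessPool | Sort-Object AppPool
```

The process ID it returns is the one to look for in Task Manager or Process Explorer.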
Next we opened up Process Explorer and examined the problematic w3wp.exe. We noticed that 2 threads were consuming about 90% of all process resources but we couldn’t determine what exactly they were doing.
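For a rough first look without Process Explorer, the threads of a process can be ranked by accumulated CPU time from PowerShell. This is only a sketch under a few assumptions: the function name is invented here, the figures are totals since each thread started rather than current load, and, like Process Explorer in our case, it shows which threads are busy but not what they are doing.

```powershell
# Hypothetical helper: return the threads of a process that have used
# the most CPU time so far (cumulative, not a live percentage).
function Get-BusiestThreads {
    param(
        [int]$ProcessId,
        [int]$Count = 5
    )
    $process = Get-Process -Id $ProcessId
    $process.Threads |
        Sort-Object TotalProcessorTime -Descending |
        Select-Object -Property Id, TotalProcessorTime -First $Count
}

# Usage, with 1234 standing in for the PID of the suspect w3wp.exe:
#   Get-BusiestThreads -ProcessId 1234
```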
The odd thing was that this application pool was consuming the same amount of resources on every web server in our farm. So we were pretty sure it was not related to a specific server. A couple of IIS resets and application pool recycles later, we were ready to bring in the big guns.
DebugDiag to the rescue
One of the most powerful tools to research such a situation is DebugDiag. It allows you to inspect processes, create dumpfiles and analyze problems. Creating dumpfiles can be done manually, but also automatically in case of certain exceptions. One of the cooler things is that it is SharePoint aware. It even has SharePoint-specific analysis possibilities:
We fired up DebugDiag, dumped the w3wp process and looked at the output. This is what we found in the beginning of the generated report:
Notice how the first two threads consume the bulk of the CPU resources for this process.
Let’s take thread ID 51 as an example. A little further down, we can see which request was generating all this work:
It appears to be a request for an InfoPath form (displayifs.aspx). Let’s look at the stack trace for thread 51 a bit closer:
The stack trace clearly indicates InfoPath Forms Services again, so we were pretty sure we were onto something. We dumped and analyzed the process a few times, on different web servers only to come to the same conclusion: when someone uses an InfoPath form the application pool goes wild.
The first question you have to ask is: what did we change? In our case that change was clear: SharePoint 2010 Service Pack 2. After a lot of searching around we found the following support article:
Although it didn’t mention anything about bringing a complete application pool to a halt and it seemed only partly related to our problems, we decided to give it a try. We didn’t have to install anything because Service Pack 2 includes the mentioned fix. As per the article, we issued the following command:
$f = Get-SPInfoPathFormsService
$f.Properties.Add("AllowEventPropagation", $false)
$f.Update()
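One caveat with the command above: Properties is a hashtable, and Add() throws if the key already exists, for example when the command is run a second time. A slightly more forgiving variant assigns through the indexer instead. This is a sketch, not the text of the support article, and the function name is made up for illustration.

```powershell
# Hypothetical helper: set a property on the forms service (or any object
# exposing a Properties hashtable and an Update() method).
function Set-FormsServiceProperty {
    param(
        $Service,
        [string]$Name,
        $Value
    )
    # Indexer assignment adds the key or overwrites an existing one,
    # whereas Hashtable.Add() throws when the key is already present.
    $Service.Properties[$Name] = $Value
    $Service.Update()
    # Hand back the stored value so the caller can confirm it.
    return $Service.Properties[$Name]
}

# Usage in the SharePoint 2010 Management Shell:
#   Set-FormsServiceProperty (Get-SPInfoPathFormsService) "AllowEventPropagation" $false
```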
A bit to our surprise I must say, the CPU pressure went away almost immediately! Main lesson? Learn how to use DebugDiag!