Sunday, May 8, 2011

Contextual Thread dumps

Due to some business policy changes, we recently started seeing changes in the usage pattern of our application, which led to unexplained spikes on app nodes. These spikes were temporary, and by the time we went to take jstacks they had often disappeared. So we configured a Quartz job to take a jstack every 5 minutes and dump it into a folder, keeping the last 500 copies. (I wrote a Quartz job instead of a cron job because cron has to be configured manually on each node, and with tons of nodes, ops was always missing or misconfiguring it.) That way I can go back and correlate what was going on in Tomcat at the time of a spike. I had to get lucky for a spike to coincide with a run of the Quartz job, but I mostly was, since most spikes spanned 3-5 minutes.

From those thread dumps I can figure out what was going on, for example how many threads are doing "searches" vs. how many are serving WebDAV requests or adding files. But one question kept coming up: which customers are doing them? For example, if we saw 50 threads doing WebDAV PROPFINDs, it would be good to know whether most of these requests were coming from the same customer or from different customers. So I added a servlet filter that appends the customer's domain name to the thread name for each incoming request. This helped me find issues much faster, because I no longer needed to dig through the logs to work out what a thread was doing at that time. I found lots of patterns just by adding contextual information to threads. Below is how the information appears in the thread name.


   at java.lang.Thread.dumpThreads(Native Method)
   at java.lang.Thread.getAllStackTraces(
   at com.sslinc.infrastructure.perf.ThreadDumpBean.<init>(
   at org.apache.jsp.static_.admin.threadDump_jsp._jspService(
   at org.apache.jasper.runtime.HttpJspBase.service(
   at org.apache.tomcat.util.threads.ThreadPool$


   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(
   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(
   at java.util.concurrent.LinkedBlockingQueue.take(
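The post doesn't include its filter or its Quartz job. What follows is a minimal sketch of both ideas using only the JDK, under stated assumptions, and it is not the author's code.

- The hypothetical `withCustomerContext` helper stands in for the servlet filter. In a real `javax.servlet.Filter`, `doFilter` would wrap `chain.doFilter(request, response)` in it, taking the domain from the request however the app identifies customers.
- `writeDump` builds a jstack-like dump (without jstack's lock details) from `Thread.getAllStackTraces()`, the same call visible in the first trace above, rather than shelling out to jstack.
- `startPeriodicDumps` swaps Quartz for a `ScheduledExecutorService`.
- The class name, the `threaddump-` file prefix, and the `[customer=...]` tag format are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper class; names and formats are illustrative assumptions.
final class ContextualThreadDumps {

    private static final AtomicLong SEQ = new AtomicLong();

    // Runs `work` with the customer domain appended to the current thread's name,
    // then restores the original name (pooled request threads get reused).
    public static <T> T withCustomerContext(String customerDomain, Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        String originalName = current.getName();
        current.setName(originalName + " [customer=" + customerDomain + "]");
        try {
            return work.call();
        } finally {
            current.setName(originalName);
        }
    }

    // Writes a dump of all live threads (name, state, frames) into `dir`
    // and deletes the oldest dump files so that at most `keep` remain.
    public static Path writeDump(Path dir, int keep) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("   at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        Files.createDirectories(dir);
        // Fixed-width timestamp + sequence number, so names sort chronologically.
        String fileName = String.format("threaddump-%013d-%06d.txt",
                System.currentTimeMillis(), SEQ.incrementAndGet());
        Path file = dir.resolve(fileName);
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));

        List<Path> dumps = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "threaddump-*.txt")) {
            for (Path p : stream) {
                dumps.add(p);
            }
        }
        Collections.sort(dumps); // oldest first
        for (int i = 0; i < dumps.size() - keep; i++) {
            Files.deleteIfExists(dumps.get(i));
        }
        return file;
    }

    // Stand-in for the Quartz job: dump every `periodMinutes` minutes, keeping `keep` files.
    public static ScheduledExecutorService startPeriodicDumps(Path dir, int keep, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                writeDump(dir, keep);
            } catch (IOException e) {
                e.printStackTrace(); // swallow so one failed dump doesn't cancel later runs
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Restoring the name in `finally` matters because Tomcat reuses pool threads. Without it, one request's customer would linger on the thread and show up in later dumps. Catching `IOException` inside the scheduled task matters too, because `scheduleAtFixedRate` suppresses all subsequent runs once a task throws. Calling `startPeriodicDumps(someDumpDir, 500, 5)` would approximate the 5-minute, 500-copy setup described above.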
