By Tomas Hurka and Jaroslav Bachorik (NetBeans Profiler engineers), May 2006
Tomas Hurka : How would you describe your current role in the Java Performance team?
Currently, I'm the lead of the performance team working on GlassFish, our open source application server. I oversee all of the team's performance work, and then I get to dive into areas as they come up. So I've been spending a lot of time on the performance of Grizzly, the NIO HTTP connector, which we've managed to make just as performant as (in fact, slightly more performant than) our previous C-based HTTP connector. It's great to see a result like that; the bias that C will always be faster than the Java programming language is still out there, and this is an area in particular where it's been almost universally accepted that Java wouldn't cut it for performance. But it has (and really, it's all due to the superhuman effort on that task by Jeanfrancois Arcand, the architect of Grizzly).
Jaroslav Bachorik : You've been involved in quite a few projects at Sun. Which were the most interesting?
Fifteen years ago, I was writing device drivers for our ISVs, and at the time, that was incredibly interesting. Now I'm totally infatuated with getting the performance of our free application server equal to that of appservers selling for US $10,000 per CPU. That's the cool thing about my job here; I've almost always been really interested in what I was doing. But I really loved the early days of Java technology, when I was in a group that would go around and give one-day seminars on the Java programming language. There was a lot for us to learn as well as teach, and it was great to see the pent-up frustration of all these C++ programmers just dissipate as they learned about Java.
Tomas Hurka : What are some of the most challenging issues you have had to solve so far?
Technically, or politically? Frankly, it's often a lot harder getting engineers (and their managers) to pay attention to performance and to accept the fact that performance is a feature. Features are fun to work on. Performance -- eh, we'll just get faster hardware.
Technically, the hardest issues always involve integrating unknown code into a project. That's particularly true for my group: we don't necessarily work on any one aspect of the appserver, but we have to find bottlenecks in all of it.
Jaroslav Bachorik : What are you currently working on?
Right now, we're looking at performance of the Enterprise JavaBeans (EJB) 3.0 reference implementation. It's really interesting; from a developer's perspective, it's easier to write EJB 3.0 beans (and standalone persistent entities) than EJB 2.1 session or entity beans. So we'd hope the resulting code would be faster too. But the annotation syntax hides a lot of the complexity of the resultant code. And the persistence model is very new to most of us (though of course, not to Hibernate or TopLink developers). So it's an area rich with possibilities and expectations.
Tomas Hurka : It's been said that every large-scale project has performance-related issues. From your experience, what would you describe as common situations leading to performance problems?
Overcomplexity. We're all smart engineers, and we love to design things. And we're smart enough to overdesign them with all these cool features and bells and whistles. And that's almost always bad for performance.
Here's an example. When we started work on Grizzly, we needed a thread pool. Because we had a requirement to support Java SE 1.4, we wrote our own thread pool. That's a simple enough task; the essence of a thread pool takes only a few dozen lines of code. Then along comes Java SE 5, with its Executor classes. They are amazing pieces of engineering: the ThreadPoolExecutor uses four pieces of information in determining whether to hand off a task to a new thread or queue it for an existing thread; it supports different data structures for queueing; it has four or five different lifecycle methods; and so on. It is the Rolls-Royce of thread pools.
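To make "a few dozen lines" concrete, here is a minimal sketch of a special-purpose pool of that kind: a fixed set of worker threads draining one shared queue, coordinated with plain wait/notify, since java.util.concurrent only arrived in Java SE 5. It is not Grizzly's actual code; SimpleThreadPool and its methods are hypothetical names chosen for illustration.

```java
import java.util.LinkedList;

// Hypothetical minimal fixed-size pool: N workers take Runnables from one shared queue.
class SimpleThreadPool {
    private final LinkedList<Runnable> queue = new LinkedList<Runnable>();
    private final Thread[] workers;
    private boolean shutdown = false; // guarded by "this"

    SimpleThreadPool(int size) {
        workers = new Thread[size];
        for (int i = 0; i < size; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() { workLoop(); }
            }, "pool-worker-" + i);
            workers[i].start();
        }
    }

    // Queue a task and wake one idle worker.
    synchronized void execute(Runnable task) {
        if (shutdown) throw new IllegalStateException("pool is shut down");
        queue.addLast(task);
        notify();
    }

    // Stop accepting tasks; workers exit once the queue has been drained.
    synchronized void shutdown() {
        shutdown = true;
        notifyAll();
    }

    // Wait for every worker thread to finish.
    void awaitTermination() throws InterruptedException {
        for (Thread t : workers) t.join();
    }

    private void workLoop() {
        while (true) {
            Runnable task;
            synchronized (this) {
                while (queue.isEmpty() && !shutdown) {
                    try { wait(); } catch (InterruptedException e) { return; }
                }
                if (queue.isEmpty()) return; // shut down and nothing left to run
                task = queue.removeFirst();
            }
            try {
                task.run(); // run outside the lock so other workers can dequeue
            } catch (RuntimeException e) {
                // A failing task must not kill the worker; a real pool would log this.
            }
        }
    }
}
```

Each execute() call costs one lock acquisition, one list insertion, and one notify(); there is no growth or saturation policy and no lifecycle beyond a single shutdown flag.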
But when we substituted a ThreadPoolExecutor for our simple thread pool implementation, we found that we lost performance. Quite simply: when there's more code to execute, it takes more time.
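For contrast, this sketch shows the knobs a ThreadPoolExecutor exposes. Per its documentation, execute() starts a new thread while fewer than corePoolSize threads are running; otherwise it tries to queue the task; if the queue refuses it and the pool is below maximumPoolSize, it starts another thread; otherwise the task goes to the rejection handler. The sizes and the choice of queue and policy below are arbitrary illustrative values, not the settings GlassFish or Grizzly used.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutorConfig {
    // Illustrative sizes only; not GlassFish or Grizzly settings.
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                4,                                        // corePoolSize: threads started before queueing
                8,                                        // maximumPoolSize: cap used once the queue is full
                30, TimeUnit.SECONDS,                     // keep-alive for threads above the core size
                new ArrayBlockingQueue<Runnable>(100),    // bounded work queue
                new ThreadPoolExecutor.CallerRunsPolicy() // when saturated, run the task in the caller
        );
    }
}
```

Whether that extra decision logic costs measurable time depends on the workload, so the substitution is worth measuring rather than assuming.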
I get a lot of pushback when I say that in some circles. "It's only a few microseconds; the compiler will optimize away method calls in overly complex object hierarchies; general-purpose is better than special-purpose" and so on. And certainly there are tradeoffs to consider. But a simple design is easier to create, easier to implement, easier to maintain -- and faster to execute.
Tomas Hurka : Typically, what is the biggest challenge when trying to discover the source of performance degradations?
You always hope that you'll load an application into a profiler, it will show that there's a hotspot in the code accounting for 60% of the time, and you can just go fix that and call it a day. In fact, even though I've been working on the appserver codebase for a few years now, every few months someone suggests exactly that to me: there must be an obvious bottleneck that we can go fix ("go fix the low-hanging fruit" is the most common suggestion I get).
If you're working on a new project, then yes: often you'll get lucky and it will work just that way. But in reality, bottlenecks are subtle. Leaf methods in a profile will all show very little CPU usage; root methods will all be too far removed from the source of the bottleneck. So you're left to walk up and down the stack and figure out where to look.
I'm often asked how to do this well, and it's not really something I can explain. I think with practice you just develop an intuition for it.
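A small illustration of why leaf and root methods both mislead: the sketch below aggregates sampled call stacks (root first, executing method last) into self counts, charged only to the top frame, and inclusive counts, charged once to every method on the stack. It assumes a sampling profiler that records whole stacks; the class and the method names in the example are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Aggregates sampled call stacks into self and inclusive sample counts.
class StackSampleProfile {
    private final Map<String, Integer> selfCounts = new HashMap<String, Integer>();
    private final Map<String, Integer> inclusiveCounts = new HashMap<String, Integer>();

    // stack[0] is the thread root; the last element is the method executing when sampled.
    void addSample(String... stack) {
        increment(selfCounts, stack[stack.length - 1]);
        // Count each method once per sample, even if it appears twice (recursion).
        for (String method : new HashSet<String>(Arrays.asList(stack))) {
            increment(inclusiveCounts, method);
        }
    }

    int selfSamples(String method) { return get(selfCounts, method); }

    int inclusiveSamples(String method) { return get(inclusiveCounts, method); }

    private static void increment(Map<String, Integer> counts, String key) {
        counts.put(key, get(counts, key) + 1);
    }

    private static int get(Map<String, Integer> counts, String key) {
        Integer v = counts.get(key);
        return v == null ? 0 : v;
    }
}
```

For example, feeding it four samples of a hypothetical request path (run > service > parse > readByte, run > service > parse > lookupHeader, run > service > encode > writeByte, and run > service > encode > flush) gives each leaf a self count of 1 out of 4 and the root an inclusive count of 4 out of 4; only the middle frames, parse and encode at 2 each, point at where the time is going.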
Tomas Hurka : From your viewpoint, what are the special circumstances of profiling an enterprise application deployed on an application server? Are there any significant differences compared to profiling a standalone application?
Well, the biggest special circumstance is getting to the point where profiling makes sense. Appserver performance depends on so many external factors; I'd say that I spend far more time getting the database to run optimally, making sure that the appserver has the correct number of threads, or checking that the OS TCP parameters aren't my problem than I do actually profiling. If your database is the bottleneck, then all the profiling in the world isn't going to help.
But once you get to the point where the platform is well tuned, then there's the sheer size of the appserver. Plus, of course, the JDK and the Java Virtual Machine (JVM). So you have all this code, most of which you probably aren't even responsible for. And for Java EE developers, it's worse: they have no control over the appserver code (though GlassFish is open source now). So having correct filters on the profiled code is crucial.
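Profiler filters of this kind are usually expressed as package prefixes to include or exclude. A minimal sketch, assuming the profiler asks one yes/no question per class name before instrumenting it; ProfilingFilter is a hypothetical class, not the NetBeans Profiler API, and the prefixes in the example are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class filter: a class is profiled only if it matches no exclude prefix
// and either matches an include prefix or the include list is empty.
class ProfilingFilter {
    private final List<String> includes;
    private final List<String> excludes;

    ProfilingFilter(List<String> includes, List<String> excludes) {
        this.includes = new ArrayList<String>(includes);
        this.excludes = new ArrayList<String>(excludes);
    }

    boolean shouldProfile(String className) {
        for (String prefix : excludes) {
            if (className.startsWith(prefix)) return false; // excludes always win
        }
        if (includes.isEmpty()) return true; // no include list: profile everything else
        for (String prefix : includes) {
            if (className.startsWith(prefix)) return true;
        }
        return false;
    }
}
```

For example, a filter built with the include list ["com.example.myapp."] and the exclude list ["java.", "javax.", "sun.", "com.sun.enterprise."] profiles com.example.myapp.OrderServiceBean but skips java.util.HashMap and anything under com.sun.enterprise. Letting excludes win makes it easy to carve a noisy subpackage out of an included tree.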
Jaroslav Bachorik : On the big scale, where can you see opportunities for improvements in the field of profiling Java applications?
At a basic level, there are three big issues I have with Java profilers: first, they tend to be very intrusive. If you can reduce the number of classes you want profiled, then they are much better. But when you're first looking at a problem and you haven't narrowed down any information, limiting the classes can be self-defeating.
Second, Java profilers typically offer no visibility into the JVM. In particular, it's occasionally useful to know how much time the garbage-collection and compiler threads are using. Not always, but I've hit compiler bugs in the past where the compiler thread would get into an infinite loop. That wouldn't show up in a profiler.
Third, blocking methods -- writes to network sockets, poll calls, waiting for locks -- aren't traditionally handled very well by profilers. Especially waits for locks; lock-contention analysis is typically tricky with development tools right now.
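Part of what is missing here can be read from inside the JVM through java.lang.management (Java SE 5 and later), without a profiler. A minimal sketch, assuming a JVM that implements the optional parts of that API: GarbageCollectorMXBean.getCollectionTime() and CompilationMXBean.getTotalCompilationTime() give accumulated elapsed (not CPU) milliseconds for GC and JIT compilation, and ThreadMXBean contention monitoring records how often and how long each thread blocked entering or re-entering a synchronized monitor. Locks from java.util.concurrent are not reported as blocking, because threads waiting on them are parked rather than BLOCKED. JvmVisibility is a hypothetical class name.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative JVM-level statistics read through java.lang.management.
class JvmVisibility {

    // Accumulated elapsed GC time in ms, summed over all collectors
    // (negative values, where -1 means "undefined", are skipped).
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t >= 0) total += t;
        }
        return total;
    }

    // Approximate accumulated JIT compilation time in ms, or -1 if the JVM doesn't report it.
    static long compilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) return -1;
        return jit.getTotalCompilationTime();
    }

    // Turns on per-thread blocked-time accounting (off by default); false if unsupported.
    static boolean enableContentionMonitoring() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadContentionMonitoringSupported()) return false;
        threads.setThreadContentionMonitoringEnabled(true);
        return true;
    }

    // One line per live thread, most time blocked on monitors first. Blocked time is -1
    // unless contention monitoring is enabled, and covers only the period since then.
    static List<String> contentionReport() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<ThreadInfo> infos = new ArrayList<ThreadInfo>();
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) infos.add(info); // null: the thread exited in the meantime
        }
        Collections.sort(infos, new Comparator<ThreadInfo>() {
            public int compare(ThreadInfo a, ThreadInfo b) {
                return Long.compare(b.getBlockedTime(), a.getBlockedTime());
            }
        });
        List<String> lines = new ArrayList<String>();
        for (ThreadInfo info : infos) {
            lines.add(info.getThreadName() + ": blocked " + info.getBlockedCount()
                    + " times (" + info.getBlockedTime() + " ms)");
        }
        return lines;
    }
}
```

In practice one would call enableContentionMonitoring() once at startup and print contentionReport() after the load of interest, since blocked times only accumulate while monitoring is on.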
Tomas Hurka : What kind of tools do you usually use to track down performance problems?
Everything I can get my hands on. Whenever I test something for performance, I try to gather system statistics: iostat for disk usage, mpstat for CPU usage, jstat/visualgc for garbage collection (GC), database usage statistics, and so on. As I mentioned, we often find that performance problems for Java EE applications come from outside the application server, or at least from something like GC that can be tuned pretty easily.
We also gather statistics from the appserver itself: how it's managing its thread pools, the passivation and creation rates of EJBs, the size of the MDB pools, and so on. We analyze all the monitoring data first to make sure that we haven't missed something.
But then we profile. On the Solaris Operating System, we profile with either the NetBeans Profiler or the Sun Java Studio collector/analyzer (which is C based, so it offers some visibility into the JVM). Another interesting thing about profilers is that they will sometimes show different results, which is a common artifact of sampling. So sometimes we have to try many different profilers before we get a good answer.
Jaroslav Bachorik : What was the primary reason for trying out the NetBeans Profiler?
Ah, a case in point. We were tracking down a particular regression in GlassFish. It occurred only on Windows (in the end, because of a difference in the native libraries the JVM uses on Windows versus other platforms), and for a long time, the only tool we had at our disposal was OptimizeIt. We spent months trying to delve into the OptimizeIt profiles looking for the issue, and we never found it. On Windows, that was our only choice.
We'd tried the NetBeans Profiler in beta, but it had some bugs when used with the appserver. But then along came the RC builds, and we figured: why not? So we downloaded it, and within 15 minutes had the solution to a problem that had evaded us for months!
Since then, we've used it to track down a variety of performance bottlenecks.
Jaroslav Bachorik : Which features in the NetBeans Profiler do you view as the most valuable?
I love that you can switch profiling modes from just monitoring to profiling. That's really important for Java EE applications; there's all the startup time of the appserver, and usually some initial load time for the application, during which the server warms up its EJB caches, JDBC connections, and everything else. Profiling all that isn't interesting; it's great to just monitor and then switch to profiling mode when you're at the point you need to be.
The displays where you walk up or down the call stack are particularly well done, I think. I find them a lot more intuitive than those in other profilers. I particularly like that you can filter for a method that is normally in the middle of the call stack and browse up to the thread root from there without worrying about that method's children.
Tomas Hurka : Compared to other profilers, is there something that makes the NetBeans Profiler the tool of choice?
It's all about the NetBeans Profiler's accuracy: it has made us more productive because it has found bottlenecks that other tools missed.
I also really like that I can reset the profiling results and take multiple snapshots. So I can, in one single session, profile one set of operations, take a snapshot of those, reset the results, perform another set of operations, save that as a snapshot, and so on. That would take a lot more effort with other tools.
Tomas Hurka : And on the other hand, what are the areas where we should be doing better?
The memory leak detection capability is missing some features. It's great for detecting where too many objects are getting allocated, and that's often the most important problem anyway. But detecting a leak by tracking things through the survivor spaces is harder than it should be.
For some operations, the overhead of NetBeans profiling is high. So far we've not been very successful in profiling the appserver remotely. Switching modes in the profiler is a feature I love, but right now it can sometimes be slow.
Tomas Hurka : What are the features from other profiling tools that you miss in the NetBeans Profiler?
Of course, OptimizeIt has a feature where you can compare two snapshots of memory and look for leaks that way. Also, switching modes in NetBeans is good, but sometimes I just want to turn the application-level profiling on or off.
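The snapshot-comparison approach is easy to sketch: take per-class live instance counts at two points during a steady workload (for example, from a heap histogram such as the one jmap -histo prints), subtract, and look at the classes that keep growing. SnapshotDiff and the class names and counts in the example are hypothetical; a real tool would also compare sizes and follow references back to whatever is holding the objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SnapshotDiff {
    // Classes whose live instance count grew by at least minGrowth between
    // the two snapshots, largest growth first.
    static List<String> growing(Map<String, Long> before, Map<String, Long> after, long minGrowth) {
        final Map<String, Long> growth = new HashMap<String, Long>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            long prev = (old == null) ? 0L : old.longValue(); // class absent before: count from 0
            long delta = e.getValue() - prev;
            if (delta >= minGrowth) growth.put(e.getKey(), delta);
        }
        List<String> names = new ArrayList<String>(growth.keySet());
        Collections.sort(names, new Comparator<String>() {
            public int compare(String a, String b) {
                return Long.compare(growth.get(b), growth.get(a));
            }
        });
        return names;
    }
}
```

With before = {java.lang.String: 1000, com.example.CartItem: 200, com.example.SessionData: 10} and after = {java.lang.String: 1010, com.example.CartItem: 2200, com.example.SessionData: 60, com.example.Order: 5}, growing(before, after, 50) returns [com.example.CartItem, com.example.SessionData]: CartItem grew by 2000 and SessionData by 50, while String (+10) and Order (+5) fall below the threshold.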
Tomas, Jaroslav : Scott, thank you for the interview and for the feedback -- we will make sure it gets addressed in the next releases!