Three leading Java developers, Ron Goldman, Tony Printezis, and Kirk Pepperdine, share their insights about garbage collection. Part of the Developer Insight Series.
Over the years, I've heard developers talk about their favorite code, funniest code, most beautiful code, how to write code, how not to write code, the obstacles to writing good code, what they love and hate about writing code, and so on. In the process, I've encountered many insights worth sharing.
Parts One and Two of this series provided advice on how to write good code. In Part Three, developers reflected on the actual process of writing code, how it happens, what it feels like, and how they do it. In Part Four developers shared their favorite and funniest code and code stories. In Part Five, three Java Champions, individuals who have received special recognition from Java developers across industry, academia, Java User Groups (JUGs), and the larger community, discuss cloud computing and Java technology.
Here, in Part Six, we look at the challenging issue of garbage collection (GC) from three different, but complementary, perspectives.
Ron Goldman is a Senior Researcher at Oracle Labs who works on new software architectures for distributed systems. He is currently a member of the Sun SPOT project (http://www.sunspotworld.com/), which investigates the use of Java on small, embedded wireless devices. He was instrumental in defining the vision and details for the java.net website. He has advised various open source projects, including NetBeans, OpenOffice, and Jini. He is the co-author of the book Innovation Happens Elsewhere: Open Source as Business Strategy, published in April 2005 by Morgan Kaufmann. Goldman received his Ph.D. in computer science from Stanford University, where he was a member of the robotics group, in 1983.
A lot of studies show that even well-tested code written by experts will have a significant number of bugs. It's foolish to write a program or system assuming that nothing will go wrong. Errors are unavoidable due to both bugs in the implementation and unexpected inputs from the environment. It is impossible to thoroughly test for unplanned interactions in which changes in one piece of code affect quite distant, seemingly unrelated pieces of code. Errors might manifest only when a particular combination of logic paths is executed.
Rather than putting all the programming effort into trying to prevent errors from occurring, we feel it is better to devote runtime resources to detecting that an error has occurred and recovering from it. Let me give a historical example: GC, or automatic memory management.
It's helpful to look at where GC came from. When John McCarthy was designing the LISP language, one of the programs he wrote was an elegant algorithm to do symbolic differentiation. He recognized that the code would be using up memory and if it wasn't released, memory would eventually run out. And he deliberately decided that he didn't want to mess up his elegant algorithm for differentiation with a lot of record keeping and bookkeeping for memory, which had nothing to do with the problem he really cared about. So he did something that we're considering doing in a number of other places. He accepted the idea that all programs have bugs and created a system that can repair and clean up unused memory and that, in a sense, can recycle it and make it available.
Making memory management a separate process independent of the application gives you a much cleaner, simpler-to-write program, and it is also easier to focus on good memory management. Acknowledging that the problem exists, and repairing it at runtime, makes the system more robust. If we're serious about wanting robustness, reliability, and security in our programs, we need to devote resources to them, resources above and beyond what the application uses.
Although automatic memory management has existed for more than 50 years, a lot of people still don't want it in their systems because it seems inelegant. It just strikes people as wrong -- "It's my memory. I should be releasing it when I know it's no longer being used." Theoretically, that might be true, but in practice, programmers continue to forget to free up memory when they are done with it, or, even worse, try to free it up while it's still in use. The results? Buggy code that is apt to crash unexpectedly.
Why force programmers to perform tasks for which they are not suited? Why not instead have the computer take over as many of those tedious chores as we can? Our software should be an active participant in maintaining its integrity. Our programming languages can deal with low-level concerns such as checking array bounds and preventing buffer-overflow attacks. For higher-level matters, such as maintaining system-wide constraints and providing a specified quality of service, we need new mechanisms.
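For instance, the Java runtime bounds-checks every array access, so an out-of-range index raises an exception instead of silently reading or writing adjacent memory (the root of classic buffer-overflow attacks). A minimal illustration:

```java
// Every Java array access is checked by the runtime: an out-of-range
// index throws ArrayIndexOutOfBoundsException rather than corrupting
// neighboring memory, as an unchecked write could in C.
public class BoundsCheck {
    public static void main(String[] args) {
        int[] data = new int[4];
        try {
            data[4] = 42; // one past the end of the array
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```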
Read the full interview with Ron Goldman.
Tony Printezis is a Principal Member of Technical Staff at Oracle, based in Burlington, MA. He has been contributing to the Java HotSpot Virtual Machine since 2006. He spends most of his time working on dynamic memory management for the Java platform, concentrating on performance, scalability, responsiveness, parallelism, and visualization of garbage collectors. He obtained a Ph.D. in 2000 and a BSc (Hons) in 1995, both from the University of Glasgow in Scotland. In addition, he is a JavaOne Rock Star, a title awarded for his highly rated JavaOne session on GC.
GC is headed towards bigger heaps, better latencies, and much more garbage! Here are eight myths about GC that I have spent a fair amount of time busting:
1. Reference counting GC will solve all my latency problems.
2. malloc and free will always perform better than GC.
3. Finalizers should be called promptly, as soon as objects become unreachable.
4. GC will eliminate all memory leaks.
5. Life would be so much better if I could explicitly deallocate some important objects, since I know when they're unreachable.
6. I can get GC that delivers both very high throughput and very low latency.
7. I need to disable GC in critical sections of my code.
8. Anything I can write in a system with GC, I can write with malloc and free.
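Myth 4 lends itself to a concrete sketch. A garbage collector can only reclaim objects that are unreachable; if the program keeps a strong reference around, the memory stays pinned. The `CACHE` field and `handleRequest` method below are hypothetical, but the pattern (a long-lived collection that is never pruned) is a common accidental leak in garbage-collected code:

```java
import java.util.ArrayList;
import java.util.List;

// A "leak" in a garbage-collected language: the collector reclaims only
// unreachable objects. A long-lived collection that keeps growing pins
// every element it holds, so memory use climbs even though GC runs.
public class ReachableLeak {
    // Hypothetical cache that is never pruned.
    private static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest(int i) {
        // Each request parks a buffer in the cache and never removes it.
        CACHE.add(new byte[1024]);
    }

    static int cacheSize() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            handleRequest(i);
        }
        // All 100 buffers are still strongly reachable; GC cannot free them.
        System.out.println("Cached buffers: " + cacheSize());
    }
}
```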
A Java Champion since September 2005, Kirk Pepperdine is a primary contributor and consultant to javaperformancetuning.com, which is widely regarded as the premier site for information about Java performance tuning, and the co-author of Ant Developer's Handbook. He has been heavily involved in application performance since the beginning of his programming career and has tuned applications in a variety of languages: Cray Assembler, C, Smalltalk, and, since 1996, the Java programming language. He is currently an independent consultant and an advisor to theserverside.com.
A while ago, I added a slide to my presentations that says, "Everything I'm about to tell you will be wrong." I say this because, as time marches on, tips grow stale and things need to be reassessed. What’s even scarier, some tips are just plain wrong to begin with.
Think of it this way: The prescription (drug) that your friend is taking might result in ill health if you were to take it. The same goes for performance tips.
Here's an example: I was asked to review a paper on what you could do in your code to help the garbage collector. The big tip in the article was that you should null out references as soon as you stop needing them. Littering your code with myObject = null statements just seemed wrong. I would contend that if you can null out a reference with no ill effects, the variable is scoped too broadly. A better solution would be to narrow its scope so that the object becomes unreachable as soon as the value is no longer needed. This is a case where clever code to help the garbage collector is really a code smell.
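The contrast can be sketched as follows; `loadData`, `consume`, and `doUnrelatedWork` are hypothetical stand-ins for real work:

```java
public class ScopeDemo {
    static byte[] loadData() { return new byte[1024]; } // stand-in for real work
    static void consume(byte[] b) { /* use the data */ }
    static void doUnrelatedWork() { /* work that never needs the buffer */ }

    // Broadly scoped: the reference would live to the end of the method,
    // so the author nulls it out by hand. This is the code smell.
    static void processBroad() {
        byte[] buffer = loadData();
        consume(buffer);
        buffer = null; // manual help for the garbage collector
        doUnrelatedWork();
    }

    // Narrowly scoped: the reference goes out of scope on its own, so the
    // buffer is unreachable as soon as it is no longer needed.
    static void processNarrow() {
        {
            byte[] buffer = loadData();
            consume(buffer);
        } // buffer's scope ends here; no null assignment required
        doUnrelatedWork();
    }

    public static void main(String[] args) {
        processBroad();
        processNarrow();
        System.out.println("both variants completed");
    }
}
```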
...Even though collection of very short-lived objects is almost free, high rates of object churn can still result in very inefficient GC numbers. Sometimes, the problem is simply that the Java Virtual Machine (JVM) doesn't have enough heap space. Monitoring GC activity will give you the hints you need for a successful heap-sizing exercise. Other times, you'll have to perform some object creation profiling to identify the source of churn and deal with it in application code.
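One simple way to watch GC activity from inside a running JVM is the standard management API; verbose GC logs (enabled with flags such as -verbose:gc) give the same information, and more, for offline analysis. A minimal sketch, where the allocation loop is just a hypothetical workload to give the collectors something to do:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal in-process GC monitoring via the java.lang.management API.
public class GcStats {
    public static void main(String[] args) {
        // Churn some short-lived objects so the collectors have work to do.
        for (int i = 0; i < 100_000; i++) {
            byte[] garbage = new byte[1024];
        }

        // One MXBean per collector (e.g. young and old generation).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used: %d of %d bytes%n", heap.getUsed(), heap.getMax());
    }
}
```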
HPJMeter is a free tool that will read verbose GC logs and provide you with that measurement. Tagtraum also has a tool that will read the logs and calculate GC efficiency. I've often found that just playing with heap-space sizings can make a huge difference in GC efficiency, application response times, and throughput.
I guess I should plug another GC log viewer, gchisto. Tony Printezis, the creator, has allowed me to lend him a hand in adding functionality. It has been integrated into VisualVM. I've also been experimenting with my own tool, which I've called Censum. It is not quite ready for release, but it will be showing up soon. I've been asking people to send me their GC logs so that I might further test with real data. In return, I've been offering a reading of the GC logs (or tea leaves, if you prefer). So far I've gotten a few takers, some from some pretty famous sites, which was quite exciting. I was just at NetBeans days in Munich and the night before my presentation I was goaded into integrating Censum into VisualVM, which I managed to throw together and pull off in a demo. The integration is somewhat limited in that it works only with JVMs running on the same machine, but it doesn't look like it would take much to turn it from demoware into something more robust. It will be an interesting summer project.