By Janice J. Heiss, April 2009
Over the years, I've heard noted developers talk about their favorite code, funniest code, most beautiful code, how to write code, how not to write code, the obstacles to writing good code, what they love and hate about writing code, and so on. In Part One of the Developer Insight Series, three Java Champions responded favorably to Brian Goetz's advice to write "dumb" code.
In Part Two, we hear code advice from five distinguished developers: Joshua Bloch and Masood Mortazavi echo Goetz's advice to keep code simple. Jaron Lanier and Victoria Livschitz want to radically change the way code is created. And renowned bug fixer Brian Harry provides tips on bug fixing while emphasizing what the process can teach us.
- Joshua Bloch: Beware of Unwarranted Optimism
- Masood Mortazavi: The Best Code Makes the Least Assumptions
- Jaron Lanier: The Way We Think About Software Is Wrong
- Victoria Livschitz: A Programming Language That Is a Notch Richer Than OO
- Brian Harry: In Fixing Bugs, Look at Earlier Versions of the Platform
Joshua Bloch, chief Java architect at Google, is well known as the author of Effective Java (now in its second edition) and Java Puzzlers (with Neal Gafter). In a former life as a senior staff engineer at Sun Microsystems and an architect in the core Java platform group, he designed and implemented the award-winning Java Collections Framework, worked on the java.math package, and contributed to many other parts of the platform.
Bloch holds a Ph.D. in computer science from Carnegie Mellon University, and his book Effective Java is a much-loved classic among Java developers.
You have said that a common fault among Java developers is the natural tendency to optimize code, resulting in slower code that is needlessly complicated. Why do developers mistakenly optimize code?
To be a software developer, you have to be an optimist -- otherwise, it would feel like a losing battle. Generally, this is a good thing, but it has a downside: Optimism can lead to overconfidence. It's easy to feel like the general warnings about premature optimization don't apply to you, because you just know which code is time-critical and how to make it fast. But no one can determine this without measuring before and after each attempted optimization.
Another fault you refer to involves developers writing their own code when perfectly good libraries exist. Why do developers do this?
"In order to stay sane, most developers have a can-do attitude, and some take it too far. They say to themselves, 'Yes, there's a library, but I can do better.' Maybe you can, but that doesn't mean you should. Use the standard library unless it's profoundly unsuited to your needs."
Chief Java Architect, Google
Two reasons. By far the most common reason is that the developer doesn't know the library exists. I feel for that developer, because there are so many libraries out there that it's impossible to keep track of them all. That said, it's so hard to get some of these facilities right that it's worth making the effort to find out if a library exists. This is particularly true where concurrency is involved. It's not uncommon for experts to spend literally months writing apparently modest concurrency utilities.
When faced with a need for this sort of functionality, the wise developer will do whatever it takes to find an appropriate library. It's so easy to get nontrivial concurrent code wrong, and the resulting bugs can be nearly impossible to detect.
The second reason that developers tend to reinvent the wheel is the same reason they tend to optimize prematurely: In order to stay sane, most developers have a can-do attitude, and some take it too far. They say to themselves, "Yes, there's a library, but I can do better." Maybe you can, but that doesn't mean you should. Use the standard library unless it's profoundly unsuited to your needs.
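Bloch's point about concurrency utilities is easy to illustrate. The sketch below (my example, not code from the interview) replaces the hand-rolled wait/notify logic a "can-do" developer might write with the JDK's java.util.concurrent.CountDownLatch, which was written and reviewed by exactly the kind of experts Bloch describes:

```java
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    // Instead of hand-rolling wait/notify signaling (easy to get
    // subtly wrong), delegate completion tracking to CountDownLatch.
    public static int runWorkers(int workers) {
        CountDownLatch done = new CountDownLatch(workers);
        int[] results = new int[workers];
        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                results[id] = id * id;  // simulate some work
                done.countDown();       // signal this worker is finished
            }).start();
        }
        try {
            done.await();  // blocks until every worker has counted down;
                           // also establishes the happens-before edge that
                           // makes the results[] writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int sum = 0;
        for (int r : results) sum += r;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4));  // 0 + 1 + 4 + 9 = 14
    }
}
```

The visibility guarantee in the comment is the sort of detail a homegrown latch routinely gets wrong, and the resulting race is exactly the kind of bug that is "nearly impossible to detect."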
Is there a style of thinking that gets developers into trouble?
I guess the style of thinking that most often gets developers into trouble is unwarranted optimism. There's a natural tendency to expect your program to just do the right thing, but the machine can't read your mind. Most of the facilities provided by the system have certain limitations you should be aware of. For example, an int isn't the same thing as an integer -- there are infinitely many integers, but only 2^32 int values. That means you have to worry about overflow. For example, you might think that this loop iterates over each int value exactly once:

    for (int i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++)
        doSomething(i);

It doesn't. It's an infinite loop, because every int value is less than or equal to Integer.MAX_VALUE. Once the loop gets to Integer.MAX_VALUE, it wraps around to Integer.MIN_VALUE and starts over again. That's just one simple example of this style of thinking, but it can be applied to every aspect of the language and its libraries with equally unpleasant results.
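One common fix (a sketch of my own, not quoted from Bloch) is to do the loop arithmetic in a wider type: with a long index, the counter really can exceed Integer.MAX_VALUE, so the termination test behaves as intended:

```java
public class IntLoop {
    // Counts how many int values the loop visits. Because the index is
    // a long, incrementing past Integer.MAX_VALUE does not wrap, and
    // the i <= hi comparison eventually becomes false.
    public static long countInts(int lo, int hi) {
        long count = 0;
        for (long i = lo; i <= hi; i++) {
            count++;  // doSomething((int) i) would go here
        }
        return count;
    }

    public static void main(String[] args) {
        // Small range for demonstration; the full range
        // Integer.MIN_VALUE..Integer.MAX_VALUE visits 2^32 values.
        System.out.println(countInts(-3, 3));  // 7
    }
}
```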
Masood Mortazavi, a software engineering manager at Sun, started his work with the company on the Java EE development team and as an expert in continuous availability issues and telecom collaborations with the Java software group. In recent years, he's managed teams of engineers who contribute to open-source databases such as Apache Derby, PostgreSQL, and MySQL.
"It often feels like I'm writing some kind of legal argument as we create and encode system behavior. Here, the best code makes the least and only the absolutely necessary assumptions, if any."
Software Engineering Manager, Sun Microsystems
He has a B.S. (U.C. San Diego) and an M.S. (U.C. Davis) in applied and chemical engineering, and a Ph.D. in computational fluid dynamics (U.C. Davis). In addition, he has a master's degree in journalism (U.C. Berkeley) and an M.B.A. (U.C. Berkeley), and he spent several years pursuing a second Ph.D. in the Graduate Group in Logic and Methodology of Science (math, philosophy, and computer science) at U.C. Berkeley, with a focus on foundations of math, theories of computation, and philosophy.
I've often had to focus on logical structures, specific algorithms, and program dynamics when writing code. It often feels like I'm writing some kind of legal argument as we create and encode system behavior. Here, the best code makes the least and only the absolutely necessary assumptions, if any.
For the logician, thinking of programs as a structure of predicates with bound and free variables can be very helpful. What's called a good "method" in Java, or "function" in other, traditional programming languages, is one that avoids local variables, which only serve for internal bookkeeping. The more internal bookkeeping we do, the more verbose our program becomes, which means the code will begin demanding that we break it up into more classes and methods.
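Mortazavi's point about internal bookkeeping can be illustrated (this example is mine, not his) by two versions of the same method: one that threads its state through an index variable that exists purely for bookkeeping, and one that states the intent directly:

```java
import java.util.List;

public class Bookkeeping {
    // Bookkeeping-heavy version: the index i is internal state that
    // says nothing about what the method means -- it exists only to
    // drive the traversal.
    public static int sumOfEvensVerbose(List<Integer> xs) {
        int total = 0;
        for (int i = 0; i < xs.size(); i++) {
            int x = xs.get(i);
            if (x % 2 == 0) {
                total += x;
            }
        }
        return total;
    }

    // Same computation with the bookkeeping variable eliminated:
    // the enhanced for loop expresses "for each element" directly.
    public static int sumOfEvens(List<Integer> xs) {
        int total = 0;
        for (int x : xs) {
            if (x % 2 == 0) {
                total += x;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> xs = java.util.Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvens(xs));  // 2 + 4 + 6 = 12
    }
}
```

The difference is small here, but it scales: every extra bookkeeping variable is one more free variable the reader must bind mentally, which is what eventually forces code to be split into more classes and methods.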
Jaron Lanier is well known for his work on virtual reality, a term that he coined in the 1980s. Renowned as a composer, musician, and artist, he has taught at many university computer science departments around the United States, including Yale, Dartmouth, Columbia, and Penn. He served as the lead scientist for the National Tele-Immersion Initiative, which is devoted, among other things, to using computers to enable people in different cities to experience the illusion that they are physically together. From 2001 to 2004, he was visiting scientist at Silicon Graphics Inc., where he developed solutions to core problems in telepresence and tele-immersion. Lanier received an honorary doctorate from New Jersey Institute of Technology in 2006, was the recipient of Carnegie Mellon University's Watson award in 2001, and was a finalist for the first Edge of Computation Award in 2005.
He has most recently been working on what he calls "phenotropic computing," in which the current model of software as protocol adherence is replaced by pattern recognition as a way of connecting components of software systems. He is currently the interdisciplinary scholar-in-residence at the Center for Entrepreneurship & Technology, U.C. Berkeley.
"The whole way we write and think about software is wrong."
Virtual Reality Pioneer
I think the whole way we write and think about software is wrong. If you look at how things work right now, it's strange: Nobody -- and I mean nobody -- can really create big programs in a reliable way. If we don't find a different way of thinking about and creating software, we will not be writing programs bigger than about 20 to 30 million lines of code, no matter how fast our processors become.
This current lack of scalability is a universal burden. There are monopolies in our industry because it's so difficult for anyone to even enter the competition; it's so hard to write large software applications. And that's strange to me. If you look at other things that people build, like oil refineries or commercial aircraft, we can deal with complexity much more effectively than we can with software.
The problem with software is that we've never learned how to control the side effects of choices, which we call bugs. We shouldn't be complacent about that. I still believe that there are ideas waiting to be created and that someday we will have new ways of writing software that will overcome these problems. And that's my principal professional interest. I want to make a contribution to making bugs go away.
Aren't bugs just a limitation of human minds?
No, no, they're not. What's the difference between a bug and a variation or an imperfection? If you think about it, if you make a small change to a program, it can result in an enormous change in what the program does. If nature worked that way, the universe would crash all the time. Certainly there wouldn't be any evolution or life. There's something about the way complexity builds up in nature so that a small change produces correspondingly small effects, which is what makes incremental evolution possible.
Right now, we have a little bit -- not total -- but a little bit of linearity in the connection between genotype and phenotype, if you want to speak in those terms. But in software, there's a chaotic relationship between the source code (the "genotype") and the observed effects of programs -- what you might call the "phenotype" of a program.
And that chaos is really what gets us. I don't know if I'll ever have a good idea about how to fix that. I'm working on some things, but you know, what most concerns me is what amounts to a lack of faith among programmers that the problem can even be addressed. There's been a sort of slumping into complacency over the last couple of decades. More and more, as new generations of programmers come up, there's an acceptance that this is the way things are and will always be. Perhaps that's true. Perhaps there's no avoiding it, but that's not a given. To me, this complacency about bugs is a dark cloud over programming.
Victoria Livschitz founded Grid Dynamics in 2006. Before that, she served for 10 years as a principal technologist at Sun Microsystems, where she held a number of senior positions, including chief architect for major accounts. During her tenure at Sun, she also conducted original research into next-generation computer languages and designed a new programming language, Metaphors. She started her career at Ford in 1994 as a high-performance computing engineer.
Livschitz is a winner of several prestigious awards for engineering excellence, including Ford's Chairman Award and Sun's System Engineer of the Year Award. She has multiple patents in grid technology.
She holds a B.S. degree in computer science from Case Western Reserve University. She also attended Vilnius University, Kharkov State University, Purdue University, and Stanford University, where she studied applied mathematics, electrical engineering, and computer science.
Jaron Lanier has argued that we cannot write big programs with a lot of code without creating many bugs, which he concludes is a sign that something is fundamentally wrong.
I agree with Jaron's thesis completely. The correlation of the size of the software with its quality is overwhelming and very suggestive. I think his observations raise numerous questions: Why are big programs so buggy? And not just buggy, but buggy to a point beyond salvation. Is there an inherent complexity factor that makes bugs grow exponentially, in number, severity, and in how difficult they are to diagnose? If so, how do we define complexity and deal with it?
Jaron's emphasis on pattern recognition as a substitute for the rigid, error-prone, binary "match/no match" constructs that are dominant in today's programs is intriguing to me, especially because I've always thought that the principles of fuzzy logic should be exploited far more widely in software engineering. Still, my quest for the answer to Jaron's question seems to yield ideas orthogonal to his own.
I can see two reasonable ways to create complex programs that are less susceptible to bugs. As in medicine, there is prevention and there is recovery. Both the objectives and the means involved in prevention and recovery are so different that they should be considered separately.
The preventive measures attempt to ensure that bugs are not possible in the first place. A lot of progress has been made in the last 20 years along these lines. Programming practices such as strong typing, which allows compile-time checking of assignment safety; garbage collectors, which automatically manage memory; and exception mechanisms, which trap and propagate errors in a traceable and recoverable manner, do make programming safer.
The Java language, of course, personifies the modern general-purpose programming language with first-class systemic safety qualities. It's a huge improvement over its predecessor, C++. Much can also be said about the visual development tools that simplify and automate more mundane and error-prone aspects of programming.
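The safeguards Livschitz lists are everyday Java. As a minimal sketch (the class and method names here are illustrative, not from the interview), a checked exception forces callers to acknowledge a failure mode at compile time, and wrapping a low-level exception keeps the original cause traceable:

```java
public class SafetyDemo {
    // A checked exception: the compiler refuses code that ignores it,
    // which is one of the "preventive" safeguards described above.
    static class ParseFailure extends Exception {
        ParseFailure(String msg, Throwable cause) { super(msg, cause); }
    }

    public static int parsePositive(String s) throws ParseFailure {
        try {
            int n = Integer.parseInt(s);  // may throw NumberFormatException
            if (n <= 0) throw new ParseFailure("not positive: " + s, null);
            return n;
        } catch (NumberFormatException e) {
            // Wrap and rethrow: the original cause stays attached,
            // so the failure remains traceable in the stack trace.
            throw new ParseFailure("not a number: " + s, e);
        }
    }

    // Small helper showing recovery: the error is trapped, not fatal.
    public static boolean failsToParse(String s) {
        try {
            parsePositive(s);
            return false;
        } catch (ParseFailure e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToParse("oops"));  // true
        System.out.println(failsToParse("42"));    // false
    }
}
```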
Having said that, these technological advances are still inadequate in dealing with many categories of bugs. You see, a "bug" is often just a sign of recognition that a program is behaving undesirably. Such "undesirability" may indeed be caused by mechanical problems in which code does something different from what it was intended to do. But all too often, the code is doing exactly what the programmer wanted at the time, which (in the end) turned out to be a really bad idea.
The former is a programming bug, and the latter a design bug, or in some exceptionally lethal cases, an architectural bug. The constant security-related problems associated with Microsoft's products are due to its fundamental platform architecture. Java technology, in contrast, enjoys exceptional immunity to viruses because of its sandbox architecture.
I don't believe that future advances in software engineering will prevent developers from making mistakes that lead to design bugs. Over time, any successful software evolves to address new requirements. A piece of code that behaved appropriately in previous versions suddenly turns out to have deficiencies -- or bugs. That's OK! The reality of the program domain has changed, so the program must change too. A bug is simply a manifestation of the newly discovered misalignment. It must be expected to happen, really! From that vantage point, it's not the prevention of bugs but the recovery -- the ability to gracefully exterminate them -- that counts.
In regard to recovery, I can't think of a recent technological breakthrough. Polymorphism and inheritance help developers write new classes without affecting the rest of the program. However, most bug fixes require some degree of refactoring, which is always dangerous and unpredictable.
What about the notion of complexity as the primary reason for software bugs? Do you have any concrete ideas on how to reduce complexity?
Well, I see two principal weapons. One is the intuitiveness of the programming experience from the developer's point of view. Another is the ability to decompose the whole into smaller units and aggregate individual units into a whole. Let me start with the programming experience first.
Things appear simple to us when we can operate intuitively, at the level of consciousness well below fully focused, concentrated, strenuous thinking. Thus, the opposite of complexity -- and the best weapon against it -- is intuitiveness. Software engineering should flow from the intuitiveness of the programming experience. A programmer who works with complex programs comfortably does not see them as complex, thanks to the way our perception and cognition work. A forest is a complex ecosystem, but for the average hiker, the woods do not appear complex.
How well do you think modern programming languages, particularly the Java language, have been able to help developers hide complexity?
Unfortunately, I believe modern computer science and software engineering have failed to make significant advances there. The syntax of all mainstream programming languages is rather esoteric. Mathematicians, who feel comfortable with purely abstract syntax, spend years of intense study mastering the necessary skills. But unlike mathematicians, programmers are taught to think not in terms of absolute proof but in terms of working metaphors. To understand how a system works, a programmer doesn't build a system of mathematical equations but comes up with a real-life metaphor whose correctness she or he can "feel" as a human being. Programmers are "average" folks -- they have to be, since programming is a profession of millions of people, many without college degrees. Esoteric software doesn't scale to millions, not in people and not in lines of code.
Now, back to your question. For a long time, programmers have been manipulating subroutines, functions, data structures, loops, and other totally abstract constructs that neglect -- no, numb -- human intuition. Then object-oriented (OO) programming took off. Developers could, for the first time, create programming constructs that resembled elements of the real world -- in name, characteristics, and relationships to other objects. Even a non-programmer understands, at a basic level, the concept of a "Bank Account" object. The power of intuitively understanding the meaning and relationship between things is the proverbial silver bullet, if there is one, in the war against complexity.
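The "Bank Account" example maps directly onto OO code. A minimal sketch (illustrative, not from the interview) shows why even a non-programmer can follow it: the object's name, state, and operations mirror the real-world concept:

```java
public class BankAccount {
    // State and operations named after the real-world concept --
    // the intuitive mapping Livschitz describes.
    private final String owner;
    private long balanceCents;

    public BankAccount(String owner, long openingCents) {
        this.owner = owner;
        this.balanceCents = openingCents;
    }

    public void deposit(long cents) {
        if (cents <= 0) throw new IllegalArgumentException("deposit must be positive");
        balanceCents += cents;
    }

    public void withdraw(long cents) {
        if (cents > balanceCents) throw new IllegalStateException("insufficient funds");
        balanceCents -= cents;
    }

    public long balanceCents() { return balanceCents; }
    public String owner() { return owner; }

    public static void main(String[] args) {
        BankAccount acct = new BankAccount("Alice", 10000);
        acct.deposit(2500);
        acct.withdraw(500);
        System.out.println(acct.balanceCents());  // 12000
    }
}
```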
Object-oriented programming allowed developers to create industrial software that is far more complex than what procedural programming allowed. However, we seem to have reached the point where OO is no longer effective. No one can comfortably negotiate a system with thousands of classes. So, unfortunately, object-oriented programming has a fundamental flaw, ironically related to its main strength.
In object-oriented systems, "object" is the one and only basic abstraction. The universe always gets reduced to a set of predefined object classes, some of which are structural supersets of others. The simplicity of this model is both its blessing and its curse. Einstein once noted that an explanation should be as simple as possible, but no simpler. This is a remarkably subtle point that is often overlooked. Explaining the world through a collection of objects is just too simple! The world is richer than what can be expressed with object-oriented syntax.
Consider a few common concepts that people universally use to understand and describe all systems -- concepts that do not fit the object mold. The "before/after" paradigm, as well as that of "cause/effect," and the notion of the "state of the system" are among the most vivid examples. Indeed, the process of "brewing coffee," or "assembling a vehicle," or "landing a Rover on Mars" cannot be decomposed into simple objects. Yes, they are being treated that way in OO languages, but that's contrived and counterintuitive. The sequence of the routine itself -- what comes before what under what conditions based on what causality -- simply has no meaningful representation in OO, because OO has no concept of sequencing, or state, or cause.
"I envision a programming language that is a notch richer than OO."
Founder and CEO of Grid Dynamics
Processes are extremely common in the real world and in programming. Elaborate mechanisms have been devised over the years to handle transactions, workflow, orchestration, threads, protocols, and other inherently "procedural" concepts. Those mechanisms breed complexity as they try to compensate for the inherent time-invariant deficiency in OO programming. Instead, the problem should be addressed at the root by allowing process-specific constructs, such as "before/after," "cause/effect," and perhaps "system state" to be a core part of the language.
I envision a programming language that is a notch richer than OO. It would be based on a small number of primitive concepts, intuitively obvious to any mature human being, and tied to well-understood metaphors, such as objects, conditions, and processes. I hope to preserve many features of the object-oriented systems that made them so safe and convenient, such as abstract typing, polymorphism, encapsulation, and so on. The work so far has been promising.
Iowa developer Brian Harry, also known as "leouser" on java.net, is renowned for the bug fixes he contributed to Java SE 6, which number well into the hundreds and won him a Duke's Choice award for outstanding platform contributions in 2006. His method was simple: He scanned Sun's openly available bug database for intriguing bugs, primarily in the Swing user interface (UI) code, printed them out, and put the bug reports on a stack beside his computer. Then, he fixed them one by one, submitting them through the standard JDK community contribution process. He works as a Sun Certified Java Developer (SCJD) and independent consultant.
Bug fixing is a great way to understand how the code works. Reading the code is fine, but working with it is better. Even if you don't find a solution, you'll learn and become a more powerful developer.
"Bug fixing is a great way to understand how the code works. Reading the code is fine, but working with it is better."
Java Consultant and Renowned Bug Fixer
Just think of times when you were faced with an API that made sense, but you had trouble getting it to work. If you cross the Javadoc boundary into the source code, your chances of success increase. For example, I helped someone on java.net find an acceptable solution to the "JInternalFrame can't play with Heavyweights" problem. The Sun articles and tutorials typically say that you can't do it. But since I was familiar with the code, I was able to help the developer. Of course, I wasn't 100 percent happy with the solution -- the frames flickered when resizing -- but having something to use is better than having nothing.
Are there any tips you can share?
First, always acquire the test that's attached to the bug report. The state of the bug database is a mixed affair. Sometimes the test is visible, sometimes not. If you're impatient and decide to build a fix based on the report because the test isn't visible, you may waste your time trying to recreate the problem. In several cases, after I reviewed the test, it became apparent that the report gave the wrong description of the problem.
Next, ask yourself if it's really a bug. In some cases, the reporter would have been better off getting help from a Java forum. Moreover, sometimes the test case was buggy. Once you confirm that it's a bug, you dig in. The best way to be effective is by being familiar with the source.
Also, writing down my thoughts as I seek a solution helps clarify the problem and documents the history if I need to revisit it. Including your reasoning processes in your patch helps communicate with whoever evaluates the patch.
And consider writing multiple solutions. Your solution may work, but it may not be optimal. In a similar vein, be prepared to iterate on your solution. It may seem to be a good solution, but unanticipated issues can trip you up.
As to writing unit tests, look at what the patched code interacts with. This may entail viewing a lot of code, but it's necessary to test the different paths to the patch.
What makes a good bug fixer?
Someone who's a researcher and doesn't quit when things initially go wrong. Here's an example. I was flummoxed by this byte-code sequence:
    1 goto 8
    2 set i to 100
    3 set i to 100                      --> start protected block
    4 do stuff with i
    5 return                            --> end protected block
    6 use i to print out error message  --> start exception handler
    7 return
    8 jump to 2 or 3
It seems simple, setting a local variable to the integer 100. But the verifier blew up because i may have been uninitialized. How's that likely to happen? It's possible that an out-of-memory error could be tossed in the exception handler, which led to the problem. Looking at the VM (virtual machine) specification, the data flow analyzer section says that it must "ensure that the specified local variable contains a value of the appropriate type." I thought, "Hmm, that seems to map to what's going on here."
I never truly appreciated how much the verifier does for the Java developer until I started working with byte codes. If you're new to it, you quickly learn what a strict taskmaster it is. It may have driven me mad for a while, but I was able to sit down with the VM spec and come up with a rationalization of the problem and a subsequent restructuring that worked.
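The source-level analogue of Harry's verifier problem is Java's definite-assignment rule: if a variable is assigned only inside a protected block, the compiler, like the verifier, must assume the assignment may never have happened by the time the handler runs. The sketch below reconstructs the shape of the problem, not Harry's actual code:

```java
public class DefiniteAssignment {
    // This version compiles: i is assigned before the try block,
    // so every path into the handler sees an initialized value.
    public static String safe() {
        int i = 0;  // assigned on all paths
        try {
            i = 100;
            return "did stuff with " + i;
        } catch (RuntimeException e) {
            return "error, i was " + i;  // OK: i definitely assigned
        }
    }

    // The broken shape, rejected by javac with
    // "variable i might not have been initialized":
    //
    //   int i;
    //   try {
    //       i = 100;               // an error could be thrown before this
    //       doStuffWith(i);
    //   } catch (Throwable t) {
    //       System.out.println(i); // i may be uninitialized here
    //   }

    public static void main(String[] args) {
        System.out.println(safe());  // did stuff with 100
    }
}
```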
Likewise, if you're doing a Java code fix, don't just look at the current version of the platform. I've routinely tested out bug tests on 4.0, 5.0, and 6 to investigate when a problem started and stopped. See if you can find why something started. My "uninitialized" problem has roots in the verifier spec. JDK bugs will have roots in some historical moment. Getting a bigger picture may provide clues about the design factors that led to the bug.
Read the full interview with Brian Harry.
The Developer Insight Series, Part One: Write Dumb Code -- Advice From Four Leading Java Developers
Joshua Bloch: More Effective Java and Solving Java Puzzlers Interviews
Masood Mortazavi: Interview Part One and Part Two, and On the Margins
Jaron Lanier: Interview Part One and Part Two, Home Page, and Thoughts on Phenotropic Computing
Victoria Livschitz: The Next Move in Programming and Envisioning a New Language Interviews
Brian Harry: Interview