Conscientious Software: Part One of a Conversation with Sun Microsystems Laboratories' Ron Goldman

By Janice J. Heiss, March 2006  

The world of computing is in a state of rapid transition, but what we are (or should be) transitioning to is very much open to debate. Sun Microsystems' Ron Goldman believes we are heading towards a world of interoperable computing in which stand-alone applications are diminishing in importance. He has been working with colleague Richard Gabriel to develop ways to make large systems more robust, stable, and better able to take care of themselves -- what they call "conscientious software."

Prior to coming to Sun in 1998, Goldman developed a program to generate and manipulate visual representations of complex data for use by social scientists as part of a collaboration between NYNEX Science & Technology and the Institute for Research on Learning. Goldman has a Ph.D. in computer science from Stanford University, where he was a member of the robotics group. He currently works as a senior staff engineer at Sun Microsystems Laboratories where he researches alternative software development methodologies and new software architectures, with a particular focus on what makes software more robust. Other research interests include biologically inspired computing, open source, programming language design, and user interfaces.

In Part Two of this interview, we will explore his insights into open source, which he and Gabriel have articulated in their book, Innovation Happens Elsewhere: Open Source as Business Strategy.

question Give us your assessment of the state of software today.

answer I think that we are in the middle of a very interesting transition from stand-alone applications to applications that rely on networked services. Whether it's a mashup of Google Maps with your address book, or a web page displaying information pulled from several databases, instead of one software application we now have multiple services, created by different organizations, that need to interact. In this world of interdependent applications and services, nobody's in charge anymore -- no one person can say it's time to throw the switch to do a global recompile. A new version of software often unintentionally breaks other programs that rely on it. There's a paradox: to be useful, software must depend on other software; but to be reliable, it must be independent -- able to maintain its integrity in a changing environment.

Another important trend is that software continues to become more and more complex. We now have greater expectations about how our software should behave, and meeting those expectations requires more code. The problem, though, is that our programming languages, tools, and methods are better suited to creating small software artifacts. For large-scale software they just don't help us. As a result we end up with brittle software that is not as robust as it needs to be. We also end up with a large number of software projects that fail.

question So how do we make better software?

answer Over the last few years, my colleague, Dick Gabriel, and I have asked, "What will the software of the future look like? What do we need to change to get there?" We spent a number of years looking at the social and organizational side of how software is created, focusing on open source. The result of that study is contained in our recent book, Innovation Happens Elsewhere: Open Source as Business Strategy. For the last year, as members of Sun Labs, we have shifted our focus to the technological issues, trying to identify new principles that will let us create robust, complex software.

Software today is produced according to a manufacturing model: A finished product is constructed at the factory and shipped to its destination where it's expected to act like any other machine -- reliable, but oblivious to its surroundings and its own welfare. Once deployed, it is on its own, along with the people who use it. This results in brittle, buggy software, in which the end-user is left to hope that the next release will fix their problems.

Until recently that was the best we could do -- just getting the code to implement the functional requirements was a challenge and the resulting software often used up all the available cycles on the hardware. Now there will always be some applications that need the full processing power of the CPU (and are hungry for even more), but for most programs current hardware speed is quite sufficient. We think it's time that our software start using some of those CPU cycles to actively monitor its own activity and environment, to continually perform self-testing, to catch errors and automatically recover from them, to automatically configure itself during installation, to participate in its own development and customization, to pay attention to how humans use it and become easier to use over time, and to protect itself from damage when patches and updates are installed. Such dynamic software systems will be self-contained, including within themselves their entire source code, code for testing, and anything else needed for their evolution. We have coined the term "conscientious software" to describe code that takes responsibility for itself and its future.

question How far away are we from the software you describe?

answer There's that [novelist] William Gibson quote, "The future is already here. It's just not very evenly distributed." Recent applications are already beginning to exhibit many of these properties. A lot of software is no longer static and can now be customized. Plug-in architectures and web-service-based applications can be changed after being installed by adding modules that provide services based on defined protocols and interfaces. Adobe Photoshop is a classic example. And browsers, integrated development environments, and operating systems are constructed this way. A lot of our ideas are inspired by seeing what current software is starting to do.

That's the good news. The bad news is that we think it requires a different way of thinking to create conscientious software -- a lot of the basic underlying assumptions of our current programming models need to change. It won't be easy to convince people that they need to accept that software will always have errors or that performance is of secondary importance.

Admitting Inevitable Failure: Bugs and the Misguided Focus on Performance

question You argue that developers should accept the fact that failure and bugs are inevitable and should incorporate processes into their software that seek out and repair errors.

answer A lot of studies show that even well-tested code written by experts will have a significant number of bugs. It's foolish to write a program or system assuming that nothing will go wrong. Errors are unavoidable due to both bugs in the implementation and unexpected inputs from the environment. It is impossible to thoroughly test for unplanned interactions in which changes in one piece of code affect quite distant, seemingly unrelated pieces of code. Errors may manifest only when a particular combination of logic paths is executed.

Rather than putting all of the programming effort into trying to prevent errors from occurring, we feel it is better to devote runtime resources to detecting that an error has occurred and recovering from it. Let me give a historical example: garbage collection, or automatic memory management.

It's helpful to look at where garbage collection came from. When John McCarthy was designing the LISP language, one of the programs he wrote was an elegant algorithm to do symbolic differentiation. He recognized that the code would be using up memory and that, if that memory wasn't released, it would eventually run out. And he deliberately decided that he didn't want to clutter his elegant differentiation algorithm with memory bookkeeping, which had nothing to do with the problem he really cared about. So he did something that we're considering doing in a number of other places: he accepted the idea that all programs have bugs and created a system that finds unused memory and reclaims it, in a sense recycling it and making it available again.

Making memory management a separate process, independent of the application, gives you a much cleaner, simpler-to-write program, and it also makes it easier to focus on good memory management. Acknowledging the problem and repairing it automatically makes the system more robust. If we're serious about wanting robustness, reliability, and security in our programs, we need to devote resources to them, resources above and beyond what the application uses.
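
To make the idea of a separate memory manager concrete, here is a toy mark-and-sweep collector in Java, the general family of technique usually traced back to McCarthy's LISP. It is only a sketch under simplifying assumptions: the "heap" is a list of hypothetical `Cell` objects, the roots are registered explicitly, and the names `ToyHeap`, `allocate`, `addRoot`, and `collect` are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** A hypothetical heap cell: the application only allocates cells and links them together. */
final class Cell {
    final String name;
    final List<Cell> refs = new ArrayList<>();
    boolean marked; // used only by the collector during the mark phase

    Cell(String name) {
        this.name = name;
    }
}

/** A toy heap whose collector is independent of the application's logic. */
final class ToyHeap {
    private final List<Cell> cells = new ArrayList<>();
    private final List<Cell> roots = new ArrayList<>();

    Cell allocate(String name) {
        Cell c = new Cell(name);
        cells.add(c);
        return c;
    }

    void addRoot(Cell c) { roots.add(c); }

    void removeRoot(Cell c) { roots.remove(c); }

    int size() { return cells.size(); }

    /** Marks everything reachable from the roots, sweeps the rest; returns the number of cells reclaimed. */
    int collect() {
        Deque<Cell> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Cell c = work.pop();
            if (c.marked) continue;   // already visited, which also handles cycles
            c.marked = true;
            work.addAll(c.refs);
        }
        int before = cells.size();
        cells.removeIf(c -> !c.marked);        // sweep: drop unreachable cells
        cells.forEach(c -> c.marked = false);  // reset marks for the next collection
        return before - cells.size();
    }
}
```

An application using this heap calls `allocate`, links cells through `refs`, and registers its live entry points with `addRoot`; it never frees a cell itself. Because marking starts from the roots, a cycle of cells that is no longer reachable is reclaimed as well.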

Although automatic memory management has been around for more than 40 years, a lot of people still don't want it in their systems because it seems inelegant. It just strikes people as wrong -- "It's my memory. I should be releasing it when I know it's no longer being used." Theoretically that may be true, but in practice programmers continue to forget to free up memory when they are done with it or, even worse, try to free it up while it's still in use. The results? Buggy code that is apt to crash unexpectedly.

Why force programmers to do things for which they are not suited? Why not instead have the computer take over as many of those tedious chores as we can? Our software should be an active participant in maintaining its integrity. Our programming languages can deal with low-level concerns like array bounds checking and preventing buffer-overflow attacks. For higher-level matters, such as maintaining system-wide constraints like a specified quality of service, we need new mechanisms.
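
Java is one example of a language that takes array bounds checking away from the programmer: an out-of-range write raises `ArrayIndexOutOfBoundsException` instead of silently overwriting adjacent memory, which is the mechanism behind classic buffer-overflow attacks. The hypothetical `safeCopy` helper below relies on that runtime check and recovers by truncating the input; the truncation policy is an illustrative assumption, not a recommendation.

```java
import java.util.Arrays;

final class BoundsDemo {
    /**
     * Copies input into a fixed-size buffer. The JVM checks every index, so an
     * oversized input can never spill past the end of the buffer; we catch the
     * resulting exception and recover by keeping only what fits.
     */
    static byte[] safeCopy(byte[] input, int capacity) {
        byte[] buffer = new byte[capacity];
        try {
            for (int i = 0; i < input.length; i++) {
                buffer[i] = input[i]; // throws once i >= capacity
            }
            return buffer;            // shorter inputs are zero-padded
        } catch (ArrayIndexOutOfBoundsException e) {
            // Recovery policy assumed for this sketch: truncate to the capacity.
            return Arrays.copyOf(input, capacity);
        }
    }
}
```

Production code would normally compare `input.length` with the capacity up front; the point here is only that the language runtime, not the programmer, guarantees the write can never land outside `buffer`.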

question You perceive a misguided focus on performance as currently standing in the way of self-repairing software. Tell us about this.

answer As I said earlier, until recently our software has been limited by the available hardware processing power. This has skewed our thinking, making us focus on performance and efficiency. For example, even though in current systems the time spent on garbage collection (GC) is usually not significant, some people still don't want to give up what they perceive as a performance hit. Even for real-time systems we now have various garbage collection algorithms that allow GC to happen in the background.

We need to expand our definition of performance to go beyond just the time it takes to do a task. We need to include usability, robustness, security, and all the other factors that define the behavior of our programs.

The Inspiration of Biology

question In creating complex software you argue that biological models are important.

answer Indeed. Biological systems are incredibly complex and yet they manage to be very robust. By identifying the principles biological systems use and applying them to software we will be able to create much more robust computer systems.

One idea that seems promising is to separate the software that does the work from the software that keeps the system alive. In computing, perhaps 5% of our code deals with exception handling and error correction, which seems like a lot, while 95% tries to get the basic job done. Biology appears to reverse this, with 5% doing the basic metabolism and 95% functioning to make sure that the 5% can do its job. Think about keeping your heart beating -- is that overhead? Or is that a core activity that's part and parcel of who you are? Think of your body doing the work of keeping your mind and brain functioning. That's not overhead.
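
One way to approximate "separating the software that does the work from the software that keeps the system alive" in today's languages is a supervisor: the worker contains only task logic, and a separate keeper replaces it when it fails. The Java sketch below is a minimal illustration under assumptions of our own (a hypothetical `Worker` interface, a worker factory, and a fixed restart budget); it is not a design taken from the interview.

```java
import java.util.function.Supplier;

/** Hypothetical unit of "real work"; it contains no recovery logic at all. */
interface Worker {
    String process(String item);
}

/** The keeper: knows nothing about the work, only how to keep a worker alive. */
final class Supervisor {
    private final Supplier<Worker> factory;
    private final int maxRestarts;
    private Worker current;
    private int restarts;

    Supervisor(Supplier<Worker> factory, int maxRestarts) {
        this.factory = factory;
        this.maxRestarts = maxRestarts;
        this.current = factory.get();
    }

    /** Hands the item to the current worker, replacing the worker if it fails. */
    String submit(String item) {
        while (true) {
            try {
                return current.process(item);
            } catch (RuntimeException e) {
                if (restarts >= maxRestarts) {
                    throw e;               // restart budget exhausted: escalate
                }
                restarts++;
                current = factory.get();   // replace the failed worker and retry
            }
        }
    }

    int restarts() {
        return restarts;
    }
}
```

If the first worker instance throws because its state is corrupted, `submit` obtains a fresh worker from the factory and retries the same item, while the worker code itself stays free of error handling. Retrying the same item is an assumption that only suits idempotent work.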

Likewise, maintaining your computer system's health to make sure that all of its components are functioning is not overhead. That's just what is required to have a robust system.

Allopoietic and Autopoietic Computing

question Tell us about allopoietic and autopoietic computing.

answer We came across two words from biology, allopoietic and autopoietic, that are quite useful. "Allo" means "other" and "poietic" means "to make." So a factory is an allopoietic system that makes something outside itself. "Auto" means "self," so a living cell or organism is an autopoietic system that makes itself. The cell is constantly making new proteins, replacing the cell membrane, copying the DNA, reproducing an entire new cell, maintaining the integrity of the cell, and so on.

Our current software applications, say banking systems, are allopoietic systems. They take various inputs and perform some desired functionality. Our programming languages, like C or the Java programming language, have been designed to create allopoietic programs. If we try to make the system more robust by adding exception handlers and error detection code, the program becomes hard to understand and very difficult to maintain. It goes beyond what people can really write.

A second approach that people have tried is to start with a self-organizing, autopoietic system that generates and controls its own development. But there's a problem: How do you write a banking application in such a system? In both approaches you're locked into a certain type of expression. We want to take the allopoietic components, and embed them in a larger autopoietic space with components that are working to maintain the integrity of the system.

For instance, a web server may devote a lot of resources to serving up web pages, but resources are also devoted to such things as logging and indexing the web site, and doing internal editing. If you have been slashdotted and suddenly get heavy traffic, you may want to dynamically shift, and devote more resources to serving up web pages, and put off indexing until later. So how is that done? Is it something that's built deeply into every component, or is there a controlling layer outside the sensing layer that enables the web server to say, "Boy, I'm working a lot." And then the controller picks up on that, and says, "Oh, there's a hot spot right now, let's inform these other jobs that don't need to be done right now, let's throttle back, and devote more resources to the hot spot. Let's clone some extra copies of these web page servers." The system adjusts itself.
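
The controller described here can be sketched as a small sense-decide-act step. In the hypothetical Java below, the page servers report their total request rate, and a separate controller, above a threshold, pauses background indexing and adds page-server copies; once load falls, it scales back down and resumes indexing. The class names, thresholds, and the idea of counting server copies as a plain integer are all illustrative assumptions.

```java
/** Hypothetical knobs the controller can turn; a real server would map these to thread pools or processes. */
final class ServerState {
    int pageServers = 2;
    boolean indexingPaused = false;
}

/** Sensing is separate from acting: the controller sees only a load reading and adjusts the state. */
final class HotSpotController {
    private final ServerState state;
    private final double highLoad;  // requests per second per server considered "hot"
    private final double lowLoad;   // below this, move back toward the normal configuration
    private final int baseServers;
    private final int maxServers;

    HotSpotController(ServerState state, double highLoad, double lowLoad, int maxServers) {
        this.state = state;
        this.highLoad = highLoad;
        this.lowLoad = lowLoad;
        this.maxServers = maxServers;
        this.baseServers = state.pageServers; // the "normal" configuration to return to
    }

    /** One control step, triggered when the page servers report "Boy, I'm working a lot." */
    void onLoadReport(double requestsPerSecond) {
        double perServer = requestsPerSecond / state.pageServers;
        if (perServer > highLoad) {
            state.indexingPaused = true;                  // put off indexing until later
            if (state.pageServers < maxServers) {
                state.pageServers++;                      // clone another page server
            }
        } else if (perServer < lowLoad && state.pageServers > baseServers) {
            state.pageServers--;                          // hot spot over: shed extra copies
        } else if (perServer < lowLoad) {
            state.indexingPaused = false;                 // back to normal: resume indexing
        }
    }
}
```

Using two thresholds rather than one is a common way to keep a controller like this from flapping between configurations when the load hovers near a single cutoff.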

And it may be that the language we use to describe that process is not the same as our normal programming languages -- for example, we may not be able to write a sort algorithm in it. We want something that has feedback loops in it, since biological systems seem to be robust in part because of their extensive use of nested feedback loops. Of course we can program feedback loops using low-level features of our current programming languages, but they're not a high-level construct that we have right from the get-go. We need a language that lets us easily construct complex feedback networks.
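
As noted above, current languages can express feedback loops, but only by hand. The Java sketch below suggests what packaging the loop as a reusable construct might look like: a hypothetical `FeedbackLoop` that takes a sensor, a setpoint, and an actuator and applies a proportional correction on each tick. Proportional control is our own choice for illustration; the interview does not specify any control law.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

/** A hypothetical reusable feedback construct: sense, compare with a goal, act. */
final class FeedbackLoop {
    private final DoubleSupplier sensor;
    private final DoubleConsumer actuator;
    private final double gain;
    private double setpoint;

    FeedbackLoop(DoubleSupplier sensor, double setpoint, double gain, DoubleConsumer actuator) {
        this.sensor = sensor;
        this.setpoint = setpoint;
        this.gain = gain;
        this.actuator = actuator;
    }

    /** Lets an outer loop steer this one, which is how loops could be nested. */
    void setSetpoint(double setpoint) {
        this.setpoint = setpoint;
    }

    /** One iteration of proportional control: the correction is gain * error. Returns the error. */
    double tick() {
        double error = setpoint - sensor.getAsDouble();
        actuator.accept(gain * error);
        return error;
    }
}
```

Nesting would mean letting one loop's actuator call `setSetpoint` on another, for example an outer loop on response time adjusting the target queue length that an inner loop maintains.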

question So you are calling for some new autopoietic languages to augment the allopoietic systems we have.

answer Yes, we need new languages that are not intended to replace the current languages, but to augment them, to do something different. Maybe this language would not be Turing complete -- so you're not meant to do everything in it. And in fact, it's going to be implemented, in some sense, through lower-level modules that are using Java software, or C++, or whatever, to implement and work with this. So we have this picture of augmented modules that monitor and test and repair the system. It has traditional APIs, but it also has more autopoietic interfaces that would be expressed in a new language.
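
A rough Java approximation of such an augmented module: the component keeps its ordinary API and additionally implements a hypothetical `SelfMaintaining` interface that a monitoring layer can call to test and repair it. Both interfaces and the cached-balance invariant are illustrative assumptions; the interview envisions the second interface being expressed in a new language rather than in Java.

```java
import java.util.ArrayList;
import java.util.List;

/** The traditional, allopoietic API: what the application is for. */
interface Ledger {
    void post(long cents);
    long balance();
}

/** A hypothetical maintenance interface, used only by the monitoring layer. */
interface SelfMaintaining {
    boolean selfTest(); // does the internal state look consistent?
    void repair();      // restore consistency from trusted data
}

final class SimpleLedger implements Ledger, SelfMaintaining {
    private final List<Long> entries = new ArrayList<>();
    private long cachedBalance; // fast path; could drift if something corrupts it

    public void post(long cents) {
        entries.add(cents);
        cachedBalance += cents;
    }

    public long balance() {
        return cachedBalance;
    }

    public boolean selfTest() {
        return cachedBalance == recompute();
    }

    public void repair() {
        cachedBalance = recompute();
    }

    private long recompute() {
        long sum = 0;
        for (long e : entries) {
            sum += e;
        }
        return sum;
    }

    /** Simulates a fault (for example, a bad patch) so the maintenance path can be exercised. */
    void corruptForTesting() {
        cachedBalance = -1;
    }
}
```

A monitoring layer that knows only `SelfMaintaining` could periodically call `selfTest()` and `repair()` on every component without understanding ledgers at all.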

Software that Sees

question A key part of your vision includes software with system components that can "see" into their environment and into other components and make continual adjustments.

answer When we write code we are well advised to follow the principles of encapsulation and information hiding. Otherwise our modules will become very tightly coupled to each other and hard to change. However, when we run a program it can be advantageous to be able to see into it. An obvious example is testing, where the test code may need to check the internal state of a module. We believe that it is important to have visibility into the system in order to assess its health and to make decisions about adjusting it. Visibility consists of continually updated descriptions, for example, of what's inside a system's software components, how a system is currently configured, the overall state of the system, what it's working on, which users use what software in which ways, and so on.

Here's a simple example that shows how we see visibility improving robustness: Each component in a system can describe its internal state by constantly muttering about what it is doing as it goes about its business. Muttering is like logging, except that it is not limited to reporting errors and is probably not persistent. Another thread can periodically check that each component's mutterings indicate it is healthy and making progress; if not, the thread can reset or restart the failing component.
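
The muttering example can be sketched in Java as follows. Each hypothetical `Component` keeps its latest in-memory note and a timestamp whenever it makes progress (nothing is persisted), and a separate `Watchdog` restarts any component that has been silent for too long. The class names, the time-based definition of "making progress," and the injectable clock are assumptions made for the sketch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** A component that "mutters" about its progress, not just its errors; nothing is persisted. */
final class Component {
    final String name;
    private volatile String lastMutter = "starting";
    private volatile long lastProgressMillis;
    private volatile int restarts; // written only by the watchdog thread

    Component(String name, long nowMillis) {
        this.name = name;
        this.lastProgressMillis = nowMillis;
    }

    void mutter(String note, long nowMillis) {
        lastMutter = note;
        lastProgressMillis = nowMillis;
    }

    String lastMutter() { return lastMutter; }
    long lastProgressMillis() { return lastProgressMillis; }
    int restarts() { return restarts; }

    /** Placeholder reset; a real component would reinitialize its internal state here. */
    void restart(long nowMillis) {
        restarts++;
        mutter("restarted", nowMillis);
    }
}

/** Separate from the components: listens to their mutterings and restarts the silent ones. */
final class Watchdog {
    private final List<Component> components = new CopyOnWriteArrayList<>();
    private final LongSupplier clock;
    private final long maxSilenceMillis;

    Watchdog(LongSupplier clock, long maxSilenceMillis) {
        this.clock = clock;
        this.maxSilenceMillis = maxSilenceMillis;
    }

    void watch(Component c) {
        components.add(c);
    }

    /** One health check; returns how many components were restarted. */
    int checkOnce() {
        long now = clock.getAsLong();
        int restarted = 0;
        for (Component c : components) {
            if (now - c.lastProgressMillis() > maxSilenceMillis) {
                c.restart(now);
                restarted++;
            }
        }
        return restarted;
    }

    /** Runs checkOnce() periodically on its own thread; the caller shuts the executor down. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(this::checkOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

In a running system one might construct `new Watchdog(System::currentTimeMillis, 5_000)` and call `start(1_000)`; calling `checkOnce()` directly with a fake clock keeps the behavior easy to test.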

Our inspiration about the importance of visibility comes in large part from the use of stigmergy as a powerful self-organizing mechanism in biological systems. Stigmergy is where individual parts of a system communicate with each other indirectly by modifying their local environment. For example, foraging ants deposit a trail of pheromones to indicate that they have located a source of food. Other ants sense the trail and follow it to the food source. In computing, an example of this would be JavaSpaces technology, where a process writes data into a space that other processes can then read, do computation based on what they've read, and then possibly write new data into the space.
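
Below is a small in-memory stand-in for JavaSpaces-style coordination, in which processes never call each other and communicate only by writing entries to, and taking entries from, a shared space. It is deliberately not the JavaSpaces API (`net.jini.space.JavaSpace`); the `Space` and `Squarer` classes are hypothetical, and entries are assumed to be plain strings tagged with a kind.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** A hypothetical shared space: the only channel between the processes that use it. */
final class Space {
    private final Map<String, BlockingQueue<String>> entries = new ConcurrentHashMap<>();

    private BlockingQueue<String> queue(String kind) {
        return entries.computeIfAbsent(kind, k -> new LinkedBlockingQueue<>());
    }

    /** Leave a mark in the shared environment. */
    void write(String kind, String value) {
        queue(kind).add(value);
    }

    /** Remove and return an entry of this kind, waiting until one appears. */
    String take(String kind) throws InterruptedException {
        return queue(kind).take();
    }

    /** Non-blocking variant: returns null if no entry of this kind is present. */
    String tryTake(String kind) {
        return queue(kind).poll();
    }
}

/** Senses "number" entries and leaves "square" entries behind; it never names its peers. */
final class Squarer {
    static boolean step(Space space) {
        String n = space.tryTake("number");
        if (n == null) {
            return false; // nothing to react to yet
        }
        long v = Long.parseLong(n);
        space.write("square", Long.toString(v * v));
        return true;
    }
}
```

Whoever writes the "number" entries and the `Squarer` hold no references to each other; several workers on separate threads could share one space by using the blocking `take`.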

This idea of visibility is also related to what Jaron Lanier is aiming at in what he calls phenotropic computing. Instead of a rigid, minimalistic API between two modules, he wants surfaces that can be pattern-recognized or sampled. We are considering shared blackboards with simple textual pattern-matching, extensions of something like Common Lisp's keyword/optional argument lists and calling conventions, or even passing XML documents. The key idea is that, instead of one agent reaching inside another and commanding it to do some function (e.g., a remote procedure call), it deposits a request that the second agent can then interpret and deal with as best it can.
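
A sketch of the "deposit a request" style, assuming a shared blackboard of plain-text requests and agents that each declare a regular expression describing what they can interpret. The `Blackboard` class, the request wording, and the first-match policy are hypothetical; the point is only that the requester never names or calls a specific agent, and a request nobody recognizes stays on the board instead of failing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** A hypothetical shared blackboard of plain-text requests. */
final class Blackboard {
    private final List<String> pending = new ArrayList<>();
    // Registration order decides which agent gets the first chance at a request.
    private final Map<Pattern, Function<Matcher, String>> agents = new LinkedHashMap<>();

    /** An agent advertises the shape of the requests it can interpret. */
    void register(String regex, Function<Matcher, String> agent) {
        agents.put(Pattern.compile(regex), agent);
    }

    /** A requester deposits text; it does not know which agent, if any, will handle it. */
    void deposit(String request) {
        pending.add(request);
    }

    /** Offers each pending request to the agents; returns their answers, leaving unrecognized requests pending. */
    List<String> cycle() {
        List<String> answers = new ArrayList<>();
        Iterator<String> it = pending.iterator();
        while (it.hasNext()) {
            String request = it.next();
            for (Map.Entry<Pattern, Function<Matcher, String>> e : agents.entrySet()) {
                Matcher m = e.getKey().matcher(request);
                if (m.find()) {            // pattern recognition, not a method call on the agent
                    answers.add(e.getValue().apply(m));
                    it.remove();
                    break;
                }
            }
        }
        return answers;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

For example, an agent registered for `convert (\d+) celsius` would answer "please convert 100 celsius" with "212 fahrenheit", while an unrelated request stays on the board until some agent that understands it appears.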

Software Like Children

question Any closing thoughts?

answer We're still in the infancy of computing. In a hundred years we will have many new and wondrous ways of specifying complex tasks for our computer systems to perform, and we can be sure they will not look like our current programming languages. (Though we may still be running legacy COBOL programs....)

Our software is like a child. At first, parents must provide everything and be ready to intervene to prevent the next disaster. But, after a while, the child grows up and is able to take care of itself, to cope, and hopefully to contribute something new. The child becomes responsible. With luck, persistence, or good upbringing, the child may become conscientious.

We hope the same for our software.
