By Janice J. Heiss
Published March 2012
This series of interviews spotlights Java Champions, individuals who have received special recognition from Java developers across industry, academia, Java User Groups (JUGs), and the larger community.
Jonas Bonér is a husband, dad, programmer, teacher, speaker, author and Java Champion. He is the CTO of Typesafe, which was founded in 2011 by the creators of the Scala programming language and Akka middleware, and is an active contributor to the Open Source community. Most notably, he created the Akka Project and the AspectWerkz Aspect-Oriented Programming (AOP) framework, is a committer to the Terracotta JVM clustering technology, and has been part of the Eclipse AspectJ team. Akka middleware was a finalist for the JAX Innovation “Most Innovative Java Technology Award” for 2011.
Oracle Technology Network: You have argued that writing correct concurrent, fault-tolerant, scalable applications is too difficult, because developers frequently use the wrong tools and the wrong level of abstraction. Could you give us examples of situations in which this applies?
Bonér: Let’s start by talking about concurrency. I think the default way to do concurrency in Java, using so-called shared-state concurrency, is not the right default. Shared-state concurrency means concurrent access to shared mutable state guarded by locks. Locks are known by everyone, experts included, to be incredibly hard to understand and get right.
First, I think a system based on mutable state can be very hard to reason about. This is why many developers are used to writing massive test suites—they are so uncertain of their own code. They change something located in one part of the system, and it can break in another seemingly unrelated part of the code base. The main culprit is the use of shared mutable tangled state.
Second, locks are very hard to get right. Locks do not compose, which makes designing applications and libraries hard. Also, the order in which you take locks matters, it’s usually hard to get the granularity right and to know which lock you should lock on, and error recovery is complicated. From a scalability standpoint, locks usually don’t scale very well, because they block the thread of execution and force a context switch. Now if you put these two problems together, you get a system that is hard to write, understand, maintain, and get good performance and scalability out of.
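To make the point about lock ordering concrete, here is a minimal Java sketch. The `Account` and `LockOrderingDemo` classes are hypothetical and exist only for illustration. The sketch assumes every account has a unique `id` and always takes the lock of the lower `id` first; if two threads took the same two locks in opposite orders, each could end up holding one lock while waiting forever for the other.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account type, used only to illustrate lock ordering.
final class Account {
    final int id;                                   // assumed unique; defines the lock order
    private long balance;                           // guarded by lock
    final ReentrantLock lock = new ReentrantLock();

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Always locks the account with the lower id first, whichever way the money flows.
    static void transfer(Account from, Account to, long amount) {
        if (from == to) return;
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}

final class LockOrderingDemo {
    // Runs opposing transfers on two threads and returns the combined balance.
    static long run() {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) Account.transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) Account.transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance() + b.balance();
    }
}
```

Even this small example needs a global ordering convention that every caller must know and follow, which illustrates why locks do not compose well across libraries.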
One general problem with locks is that they are too low-level, so the operating system’s way of handling threading and memory fencing leaks all the way up into the programming model. Sometimes we do need to program at that level, but I think we should not have to do so all the time. To solve hard problems such as concurrency, we need to work at a higher level of abstraction. Regarding scalability and fault tolerance, many clustering tools and platforms try to emulate shared mutable state across the network. I think that’s the wrong approach. First, as we have discussed, shared-state concurrency (shared mutable state and locks) creates a lot of problems on its own. If you now try to emulate that on top of a system that is as inherently unreliable as the network, it will simply break down. It creates a leaky abstraction that, in many cases, leaks a bit too much to be really useful.
Oracle Technology Network: Tell us how Akka addresses these problems.
Bonér: Akka offers multiple solutions to the concurrency problem. It provides a toolkit for addressing concurrency, scalability, and high-availability concerns. It provides one thing to learn and one thing to use. Akka has multiple tools that will help you as a developer. Actors, futures, agents, and software transactional memory all raise the abstraction level and make it easier to write, understand, and maintain concurrent, scalable, fault-tolerant code. Instead of messing around with very low-level constructs, you think in terms of higher-level concepts such as message flows and transactions. What is usually solved by use of low-level plumbing in standard enterprise applications becomes workflow in Akka. So you start to think about how the data flows in the systems rather than how to get the concurrency and scalability exactly right.
Actors in Akka implement a share-nothing architecture. They can be seen as the ultimate object orientation, encapsulating state and behavior. But in contrast to what happens with normal Java classes, every state change is local and cannot in any way affect any other part of the system, which is a big relief. Actors are also much more decoupled than standard Java objects: if one actor crashes (fails with an exception), it will not affect any other part of the system. So if an actor fails, you know where to look, because the failure is completely isolated. When you add the actor style of fault tolerance, in which actors that you create are automatically linked to each other in a “supervisor hierarchy” tree, you get a system that can monitor and heal itself—a very robust system.
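The share-nothing idea can be sketched in plain Java without any Akka dependency. The following is not Akka's API: `CounterActor`, its `Increment` and `Get` messages, and the `tell` method are hypothetical names chosen for illustration. The sketch assumes a single-threaded executor standing in for the actor's mailbox, so the private state is only ever touched by one thread and needs no locks.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A hand-rolled "actor": private state, changed only by processing messages one at a time.
final class CounterActor {
    interface Message {}

    static final class Increment implements Message {
        final int by;
        Increment(int by) { this.by = by; }
    }

    // Carries a future that the actor completes with its current count.
    static final class Get implements Message {
        final CompletableFuture<Integer> replyTo = new CompletableFuture<>();
    }

    // The executor's FIFO queue plays the role of the mailbox.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int count; // touched only by the mailbox thread

    // Fire-and-forget send; safe to call from any thread.
    void tell(Message msg) {
        mailbox.execute(() -> receive(msg));
    }

    private void receive(Message msg) {
        if (msg instanceof Increment) {
            count += ((Increment) msg).by;
        } else if (msg instanceof Get) {
            ((Get) msg).replyTo.complete(count);
        }
    }

    void stop() {
        mailbox.shutdown();
    }
}

final class ActorDemo {
    // Four threads send 1000 increments each; the count is then requested by message.
    static int run() {
        CounterActor counter = new CounterActor();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) counter.tell(new CounterActor.Increment(1));
            });
            senders[i].start();
        }
        try {
            for (Thread t : senders) t.join();
            CounterActor.Get get = new CounterActor.Get();
            counter.tell(get);
            return get.replyTo.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            counter.stop();
        }
    }
}
```

Callers never see or lock `count`; they can only send messages, which is the property the answer above describes as every state change being local.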
Another technique we encourage people to use together with actors is functional programming (FP) with immutable data. In Akka you also get the benefit that all the low-level, high-performance plumbing is done once and for all, in one runtime, and tested and hardened over many years by many users in multiple different contexts, so everyone doesn’t have to do it by themselves in their own applications over and over again.
Also, when creating Akka, we wanted to embrace distributed computing at its core. The essence of distributed programming is asynchronous message passing or message passing concurrency, which is very different from shared-state concurrency, which can work reliably only in-process. Here’s what we decided to do with Akka: instead of trying to shield the programmer from the network, we made message passing a first-class construct using actors. Doing this makes it easier to understand what’s going on and easier to reason about things. You get a system that doesn’t lie and isn’t trying to pretend to be something it’s not. You might have heard people talk about actors as transparent distributed computing, but it’s actually the other way around: they implement very explicit distributed computing, and then—if it’s used in the local context—it’s only an optimization.
To sum things up: with the actor model, you have the ability to scale up (scaling for multicore architectures on a single machine) and to scale out (scaling across multiple machines on one network) in a unified way, with a single programming model, a single abstraction, and a single runtime. Additionally, with the actor model, you have resilience, that is, resistance to failure, built in, so without leaving the actor model, you get high availability (HA), which is very important for writing scalable software today.
Oracle Technology Network: You’ve written, “For fault-tolerance Akka adopts the ‘Let it crash,’ also called ‘embrace failure,’ model, which has been used with great success in the telecom industry to build applications that self-heal, systems that never stop.” Could you elaborate on this?
Bonér: Sure, we’ve already touched on the subject briefly, but let me try to explain it in more detail. When you create a standard Java enterprise application, the state that is critically important to the application is usually scattered across the application. You have a little of it here and a little of it there, and this state is usually guarded by try/catch blocks. This leads to scattered and tangled state and error recovery and encourages a defensive way of programming. In Akka you architect your application in supervisor hierarchies. When you create an actor C (for child) from within an actor P (for parent), P becomes the parent and supervisor of C. This means that if actor C fails with an exception, actor P will be notified and can take action, such as restarting actor C. What the actual action should be is declaratively configured.
What this gives you is a tree of supervised actors, a supervisor hierarchy. This yields a nondefensive way of programming: when an actor fails, instead of trying to trap the error and recover, the actor simply dies and lets someone else (the supervisor) deal with the error. This is a fail-fast approach. If a supervisor can’t recover from an error, it will escalate it by sending it up the hierarchy, and hopefully an actor higher up in the hierarchy is configured to deal with the error, or else the whole application will fail.
For example, in the case of an OutOfMemoryError, there is not much to be done and the application will have to exit. However, actor supervision works across actor applications and across physical machines, so letting the supervisor hierarchy span multiple machines so that the application can recover from complete node failure is encouraged. The result is a system that can repair itself, can heal itself at runtime, and forms a basis for developing very fault-tolerant systems.
The way you normally design an actor-based system is that you have a set of top-level actors where you put your most critical state, called the Error Kernel. Then you add levels of defense that protect the Error Kernel, much like an onion where you have layer on layer. If, for example, an actor in the Error Kernel needs to perform an operation that might fail, it will not do it itself but will instead delegate the work to an actor in its defense layer (who can also delegate to his defense layer, etc.). If the operation fails and the worker actor dies, the Error Kernel is still doing fine and the application can continue functioning as if nothing had happened.
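The restart-or-escalate behavior and the delegation of risky work described above can be approximated in a few lines of plain Java. This is a deliberately simplified, synchronous sketch and not Akka's supervision API: `SupervisionSketch`, `Parser`, `Supervisor`, `Directive`, and the `maxRestarts` limit are all hypothetical names and assumptions. The child contains no defensive code and simply fails; the supervisor keeps the important state (the results) and decides what to do with the failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class SupervisionSketch {
    enum Directive { RESTART, ESCALATE }

    // The child: no try/catch, it simply fails on bad input.
    static final class Parser {
        int handle(String message) {
            return Integer.parseInt(message); // throws NumberFormatException on bad input
        }
    }

    // The supervisor: holds the results and delegates the risky work to its child.
    static final class Supervisor {
        private final Supplier<Parser> childFactory;
        private final int maxRestarts;
        private Parser child;
        int restarts;

        Supervisor(Supplier<Parser> childFactory, int maxRestarts) {
            this.childFactory = childFactory;
            this.maxRestarts = maxRestarts;
            this.child = childFactory.get();
        }

        // Stands in for a declaratively configured strategy.
        private Directive decide() {
            return restarts < maxRestarts ? Directive.RESTART : Directive.ESCALATE;
        }

        List<Integer> deliverAll(List<String> messages) {
            List<Integer> results = new ArrayList<>();
            for (String message : messages) {
                try {
                    results.add(child.handle(message));
                } catch (RuntimeException failure) {
                    if (decide() == Directive.ESCALATE) {
                        throw failure; // hand the failure to whoever supervises this supervisor
                    }
                    child = childFactory.get(); // restart: replace the failed child with a fresh one
                    restarts++;
                }
            }
            return results;
        }
    }
}
```

With a limit of two restarts, delivering "1", "oops", "2" yields the results 1 and 2 after one restart. With a limit of zero, the first failure is escalated to the caller.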
Oracle Technology Network: Oracle developer Brian Goetz, coauthor of Java Concurrency in Practice, said “Often, the way to write fast code in Java applications is to write dumb code, code that is straightforward and clean and follows the most obvious object-oriented principles… Because compilers are written by humans who have schedules and time budgets, the compiler developers focus their efforts on the most common code patterns, because that’s where they get the most leverage. So if you write code using straightforward object-oriented principles, you’ll get better compiler optimization than if you write gnarly, hacked-up, bit-banging code that looks really clever but that the compiler can’t optimize effectively. So clean, dumb code often runs faster than really clever code.”
Bonér: That is definitely true, at least in a general sense. Having worked on JRockit, I know that clever as well as bad code can make it really hard for the optimizing compiler to do its work. But in my experience, writing extremely high-performant Java code sometimes requires abandoning the standard and simple path. For example, in Akka we had to resort to using “sun.misc.Unsafe,” because using Atomic*FieldUpdater yielded too much overhead for compare-and-swap (CAS) operations and because AtomicReference and friends introduced too much indirection and false sharing. Another example is the use of ByteBuffer for storing large volumes of data off-heap to avoid garbage collection. But these are special problems we solved under the hood: Akka users need know nothing about these internals, and this presumably is not what Brian had in mind. The bulk of actual actor implementations will use very straightforward code; that is our goal.
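As a rough illustration of the off-heap technique mentioned above, the following sketch stores `long` values in a direct `ByteBuffer`. It is not taken from Akka's internals; `OffHeapLongArray` is a hypothetical class. Memory for a direct buffer is allocated outside the garbage-collected heap, so only the small buffer object itself lives on the heap.

```java
import java.nio.ByteBuffer;

// A fixed-size array of longs backed by off-heap (direct) memory.
final class OffHeapLongArray {
    private final ByteBuffer buffer;

    OffHeapLongArray(int capacity) {
        // allocateDirect reserves native memory; its contents are initialized to zero
        this.buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES);
    }

    void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value); // absolute put at a byte offset
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES); // absolute get at a byte offset
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }
}
```

The trade-off is that the data is no longer a graph of ordinary Java objects: values must be encoded into bytes at explicit offsets, which is the kind of less straightforward code the answer refers to.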
Oracle Technology Network: Java Champion Heinz Kabutz said “In my experience, good object-oriented design tends to produce faster and more maintainable Java code. But what is good code? I find it easier to classify ‘goodness’ by using good object-oriented design patterns. I usually encourage software development companies to train all of their programmers, from the most junior to the wise architect, in design patterns. Teams that employ good design patterns find it much easier to tune their code, which will be less brittle and require less copying and pasting.”
Do you agree?
Bonér: I’m not sure. First, I think that “goodness” is not equal to object-oriented code. In my experience functional (FP) code is sometimes a better alternative and can yield both simpler and more maintainable code that composes better. Second, I don’t agree that “goodness” is equal to using design patterns, at least not in the classic Gang of Four [GoF] sense. Using design patterns is in many cases a way to work around shortcomings in the programming language or frameworks. In general, I think design patterns are overrated.
Oracle Technology Network: With the purchase of Sun Microsystems, Oracle has both Oracle JRockit and HotSpot, which have differing styles, assumptions, and strengths. HotSpot works well for both client and server applications, whereas Oracle JRockit is highly tuned for enterprise application stacks. Oracle is working to converge the two, using the HotSpot code base, because Oracle has more engineers who understand the HotSpot code than engineers who understand JRockit code. Performance and serviceability features from Oracle JRockit will be ported over to HotSpot over time. You worked with JRockit at BEA. Do you have any reactions to this move?
Bonér: I think it’s mostly for the good. Oracle JRockit has some really neat features, such as its Mission Control tool set, which I think is outstanding for debugging Java applications. Making that available to a wider audience will be great. Both teams are amazing, and bringing them together will definitely create a better product. On the potential downside, I think competition is good, and over the years, the HotSpot and Oracle JRockit teams have motivated each other to push the boundaries of what anyone thought was possible with Java.
Oracle Technology Network: On your Website, you say that if you can’t solve a problem, it’s because you’re playing by the rules. Can you give us some examples of how developers get stuck because they are playing by the rules?
Bonér: We are too often constrained by habits, current knowledge, “best practices,” and also hype. We need to constantly train ourselves to think outside the box. The only way to break out of these habits is to constantly challenge ourselves to be curious, learn new things, and question known “facts.” The brain is lazy, and you have to constantly train it by forcing it out of its comfort zone. Rich Hickey recently gave a great talk about “simple versus easy” in which he defined easy as familiar. The brain loves “easy,” because it by nature never wants to learn anything new. But you can never expect to become better at anything in life that way. A great workbook on how to train your brain is Andy Hunt’s excellent Pragmatic Thinking and Learning—it will help you along the way.
Oracle Technology Network: Tell us about the open source Java projects you are involved in.
Bonér: The open source projects I am spending most of my efforts on right now are Akka and the Typesafe suite of open source software. Around 10 years ago, I started my first open source project: the AspectWerkz AOP framework, which was a great success and later merged with AspectJ. Another fun project I was part of was the Terracotta JVM clustering technology.
Oracle Technology Network: You gave a talk at Scala Days at Stanford. How do you view the future of Scala?
Bonér: Scala has a bright future. We see it being adopted by many large organizations in a wide range of industries, from investment and merchant banking to retail and social media, simulation, gaming and betting, automobile and traffic systems, healthcare, data analytics, and much more. It has grown from an academic niche language to a viable, and in many cases better, alternative to Java and C#.
Scala runs on the JVM, which is by far the best managed runtime there is. It is a statically typed language, which for me means mainly two things: I get excellent performance, comparable to that of Java, and I get solid help from the compiler, which helps catch bugs and design errors early in the development process. This makes it easier to do refactoring and have the code evolve over time without bit rot.
Even though it is a statically typed language, Scala is as expressive and concise as dynamically typed languages such as Ruby, Python, or Groovy. It has type inference, which, in short, means that you don’t have to hold the compiler’s hand all the time, declaring types all over the place as we are used to in Java. It can draw its own conclusions from context, which enables you to focus on the essence.
Scala is the perfect systems language for building high-throughput, low-latency systems, which is the reason for using it as the basis for the Akka middleware. Scala is an excellent language for building concurrent and parallel systems with features such as parallel collections for automatic parallelization of algorithms, actors for message-passing concurrency, a library for doing nonblocking future composition, and an upcoming STM library. Now with Typesafe, we can provide first-class support in Scala and a full development stack around it. The Typesafe stack features Scala, the Akka middleware, and the Play Web framework along with a management/monitoring console and development tools.
Oracle Technology Network: Where in the process of programming do you have the most fun?
Bonér: I enjoy all aspects of programming. I think it is the full journey that is the most fun, the creative process of taking an idea to become something real and making it fit into a larger context.
Oracle Technology Network: What has surprised you the most about the evolution of the Java platform?
Bonér: The JVM. It is totally mind-blowing how good the JVM is; the more you learn about it, the more amazed you will be. There is nothing even close today. This is the reason why Java, the platform, is not going away anytime soon.
Oracle Technology Network: The Java class you couldn’t live without is...?
Bonér: “AtomicReference” – it gives us the possibility of writing high-performant lockless concurrent code.
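A classic example of the lockless style that `AtomicReference` makes possible is a stack whose head pointer is updated with a compare-and-set retry loop. The sketch below is a minimal illustration and is not code from Akka; `LockFreeStack` and `LockFreeStackDemo` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// A lock-free stack: the head pointer is swapped atomically with compareAndSet.
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> current;
        Node<T> updated;
        do {
            current = head.get();
            updated = new Node<>(value, current);
        } while (!head.compareAndSet(current, updated)); // retry if another thread changed head
    }

    // Returns the top value, or null if the stack is empty.
    T pop() {
        Node<T> current;
        do {
            current = head.get();
            if (current == null) return null;
        } while (!head.compareAndSet(current, current.next)); // retry if another thread changed head
        return current.value;
    }
}

final class LockFreeStackDemo {
    // Four threads push 1000 items each; returns how many items can be popped afterwards.
    static int run() {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            producers[i] = new Thread(() -> { for (int n = 0; n < 1000; n++) stack.push(n); });
            producers[i].start();
        }
        try {
            for (Thread t : producers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int popped = 0;
        while (stack.pop() != null) popped++;
        return popped;
    }
}
```

No thread ever blocks here: a thread that loses the race simply re-reads the head and tries again.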
Oracle Technology Network: What would you like to see in Java EE 7?
Bonér: Support for tail-call optimization and continuations. I’d also love the possibility of doing high-performance CAS through the java.util.concurrent APIs without having to resort to sun.misc.Unsafe.
Oracle Technology Network: What are the biggest misconceptions about programming that you encounter in your work as a technical evangelist?
Bonér: That making software is a manufacturing process. That programming is engineering. It’s not. It is craftsmanship. It is an art form. We don’t need engineers, we need craftsmen and artists.