The Developer Insight Series, Part 5: Java Champions on Cloud Computing

By Janice J. Heiss, June 2010

Over the years, I've heard developers talk about their favorite code, funniest code, most beautiful code, how to write code, how not to write code, the obstacles to writing good code, what they love and hate about writing code, and so on. In the process, I've encountered many insights worth sharing.

Parts One and Two of this series provided advice on how to write good code. In Part Three, developers reflected on the actual process of writing code: how it happens, what it feels like, and how they do it. In Part Four, developers shared their favorite and funniest code and code stories.

Here, in Part Five, three Java Champions, individuals who have received special recognition from Java developers across industry, academia, Java User Groups (JUGs), and the larger community, discuss cloud computing and Java technology.

Contents:

Adam Bien: I See Two Unrelated Concepts Called Cloud Computing
Alan Williamson: People Are Being Sold the Idea That the Cloud Will Solve All Their Problems
Kirk Pepperdine: We Need Alternative Techniques

Adam Bien: I See Two Unrelated Concepts Called Cloud Computing

Java Champion Adam Bien is a self-employed consultant, lecturer, software architect, developer, and author in the enterprise Java sector in Germany who implements Java technology on a large scale. He is also the author of several books and articles on Java and Java EE technology, as well as distributed Java programming. His latest book, Real World Java EE Patterns -- Rethinking Best Practices, explores the challenges of cloud computing.

He was a member of the JCP Expert Group for Java EE 6.

I see two unrelated concepts called cloud computing. The first is related to grid computing, where parallelizable tasks are distributed to independent computing nodes and then aggregated into a consistent result. Frameworks such as Hadoop, which implement the map-reduce algorithm, are an example of this approach.

The other paradigm is a virtual, private or public, data-computing center with an accessible API. You can model the environment in drag-and-drop-like fashion or, even more importantly, control it directly through a programmable API. You can, of course, run the grids in public or private clouds.

These paradigms also differ in their usage models. Grid computing is intended to be used by a few power users who need a considerable amount of computing power. On the other hand, in cloud computing, significantly more users access the machines with relatively low resource utilization.

While these two models are conceptually opposite, the underlying technology could be very similar. The former is often called platform as a service, and the latter infrastructure as a service.

What are the advantages and disadvantages of Java EE for the cloud?

Java EE was designed to be deployed to a distributed environment. Cluster management and extensive monitoring and management capabilities are supported by major application servers.

The EJB 3 programming model encourages stateless, idempotent, and atomic or transactional design. It is interesting to note that the programming restrictions are very similar to those of the Google App Engine. In both cases, thread management, file access, and changes to the underlying JVM -- especially class loading, system properties, and security managers -- are not allowed.
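
A minimal sketch of a bean in that style (the GreetingService name is illustrative): it holds no state, manages no threads or files, and its business methods run in container-managed transactions by default.

    import javax.ejb.Stateless;

    // Stateless and idempotent: the same input always yields the same
    // output, so the container can pool and replicate instances freely.
    @Stateless
    public class GreetingService {

        public String greet(String name) {
            return "Hello, " + name;
        }
    }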

Furthermore, both Java EE 5 and Java EE 6 come with standardized packaging -- the Enterprise Archive (EAR), which makes the provisioning of cloud apps relatively easy. And EAR solves some cloud-interoperability issues: It's a lot easier to move an app from one cloud to another. Java EE 5 and 6 are portable, so applications can be easily moved from one application server to another, regardless of whether they run in a cloud or not. They both will run on JDK 5 or higher.

The JVM itself comes with robust remote debugging, profiling, and monitoring capabilities. This currently simplifies the development of distributed apps and should also simplify cloud-enabled apps.

Java EE, however, was not designed to be operated in highly dynamic environments. Clouds tend to be elastic, and most current computing centers are not. There is some work to be done with respect to dynamic discovery, self-healing, and dynamic load balancing. This isn't difficult to accomplish with current application servers: We did it in projects with GlassFish v2 and, several years ago, with JBoss.

Ironically, Jini and JXTA were always capable of behaving this way. You can simply borrow concepts such as leasing or dynamic discovery from these frameworks and apply them to a rather static Java EE environment.

Java developer Geertjan Wielenga asks in a DZone discussion: "What will applications/software look like on the cloud? Are they all web applications? Web services? Or what?" How would you answer him?

Actually, some cloud offerings are not that different from on-premise software. The application simply runs elsewhere.

The difference begins with the implementation of persistence services. Some cloud providers, such as Google or Amazon, don't offer support for relational databases. The only options are to live with the map-like structure or to try to install a relational database in the cloud. This necessitates some rethinking about how persistence should be designed and can even affect the design of the user interface.
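
To make that trade-off concrete, here is a vendor-neutral sketch of such a map-like store -- not any provider's actual API: an entity lives whole under a single key, and anything beyond lookup by key is left to the application.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // An entity is stored as a map of properties under one key,
    // instead of as rows spread across joined tables.
    public class KeyValueStore {

        private final Map<String, Map<String, String>> entities =
                new ConcurrentHashMap<String, Map<String, String>>();

        public void put(String key, Map<String, String> properties) {
            entities.put(key, properties);
        }

        public Map<String, String> get(String key) {
            return entities.get(key);
        }

        // No ad-hoc queries: anything other than lookup by key means
        // maintaining your own secondary indexes -- the redesign
        // alluded to above.
        public static void main(String[] args) {
            KeyValueStore store = new KeyValueStore();
            Map<String, String> customer = new HashMap<String, String>();
            customer.put("name", "Duke");
            customer.put("city", "Santa Clara");
            store.put("customer:42", customer);
            System.out.println(store.get("customer:42"));
        }
    }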

The obvious use case for clouds is load generation. It is very hard to simulate thousands of users on premise. Clouds are perfect for that: You can leverage the power of the cloud for your load tests and pay only for the spent CPU cycles. Without a cloud, many underutilized machines must be maintained to cover a few peaks a year.
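
Such a load generator can be as simple as a thread pool firing HTTP requests at the application under test. A minimal sketch -- the target URL, thread count, and request count are placeholders:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class LoadGenerator {
        public static void main(String[] args) throws Exception {
            final URL target = new URL("http://example.com/app"); // placeholder
            ExecutorService pool = Executors.newFixedThreadPool(100);
            for (int i = 0; i < 10000; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        try {
                            HttpURLConnection con =
                                    (HttpURLConnection) target.openConnection();
                            con.getResponseCode(); // fire the request, ignore the body
                            con.disconnect();
                        } catch (Exception e) {
                            // a real test would record failures here
                        }
                    }
                });
            }
            pool.shutdown();
        }
    }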

Batch processing is another use case -- perfectly suitable for grids and clouds. The workload can be distributed across the working nodes or images -- it can scale very well.

The map-reduce algorithm works similarly: The workload is sprinkled across the cluster or cloud, and the results are aggregated afterward.
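
As a toy, single-node illustration of the idea: conceptually, the map step emits a (word, 1) pair per word and the reduce step sums the pairs per key; in a framework like Hadoop, the map calls would run on many nodes and the reduce step would aggregate their partial results. Both steps are fused into one loop here for brevity.

    import java.util.HashMap;
    import java.util.Map;

    public class WordCount {
        public static Map<String, Integer> count(String[] lines) {
            Map<String, Integer> counts = new HashMap<String, Integer>();
            for (String line : lines) {
                for (String word : line.split("\\s+")) {   // "map": one pair per word
                    if (word.length() == 0) continue;
                    Integer n = counts.get(word);          // "reduce": sum per key
                    counts.put(word, n == null ? 1 : n + 1);
                }
            }
            return counts;
        }

        public static void main(String[] args) {
            System.out.println(count(new String[] {"to be or not to be"}));
        }
    }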

Typical web applications are also perfectly suitable for the cloud. The request-response communication style fits very well into the cloud model. If your app is successful, the load can be handled easily; if not, you aren't left paying for idle cycles. Twitter already runs on Amazon S3.

Google Apps, MobileMe, Salesforce, and many others are excellent cloud-computing examples. They would, however, also run perfectly behind a corporate firewall. Cloud computing makes it easier to implement new ideas without having to invest in a data center up front. I expect new applications to appear in the near future precisely because of this fact.

What role will rich Internet applications (RIAs) play in cloud computing in general?

We should agree on a definition of RIA in the cloud context first. In my opinion, RIAs are interactive applications with a rich user experience. The richer the experience, the less likely we are to find pure web applications -- we are more likely to see native clients.

Native clients either run in a browser plug-in or have to be installed on the client machine. They can operate offline for a certain amount of time and do not require fine-grained server communication. In extreme cases, only the processing result might be stored in the cloud. A perfect example of this interaction model would be an email client or even an IDE.

Such applications can run easily on premise or in the cloud. The more processing that can be done on the server, the more interesting cloud computing becomes. A video-processing or -authoring application could send the rough content to the cloud for processing and receive the results later. The processing could be parallelized and performed faster. The cloud would start new instances on demand and kill idle ones after the peak demand is processed.

JavaFX comes with a built-in pull XML and JSON parser. It is really easy to access HTTP-based (REST, XML, JSON) services from the cloud with built-in JavaFX capabilities. If this isn't sufficient, existing Java frameworks and libraries are easily accessible.
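
For comparison, the same kind of REST call in plain Java looks like the sketch below; the endpoint URL is a placeholder, and JavaFX's built-in HTTP and pull-parsing support removes most of this boilerplate.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RestClient {
        // Fetches the raw response body of a GET request; the result
        // would then be handed to an XML or JSON parser.
        public static String get(String endpoint) throws Exception {
            HttpURLConnection con =
                    (HttpURLConnection) new URL(endpoint).openConnection();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), "UTF-8"));
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            in.close();
            return body.toString();
        }
    }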

The "I escaped the browser" trend is already visible in Twitter. The majority of Twitter users access the stream with native clients like TweetDeck, TwitterFx, TwitterFon, Tweetie, and so on over the API and not the web interface.

Some argue that Java Management Extensions (JMX) is not up to the high requirements for management and monitoring in cloud computing.

With JMX, you can build a perfect cloud-monitoring system. The out-of-the-box capabilities, however, are not convenient enough.

JMX essentially exposes JavaBeans to an agent, making them remotely accessible. The JavaBean properties become visible to monitoring tools, which can also invoke the management methods remotely. In addition, the JDK and Java EE come with some predefined, visible JMX beans.
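
A minimal sketch of that JavaBean-style exposure -- the bean, its attribute, and the object name are illustrative, and the three types would normally live in separate files:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // The standard MBean naming convention: implementation class name
    // plus the "MBean" suffix.
    public interface RequestStatsMBean {
        long getRequestCount();
    }

    class RequestStats implements RequestStatsMBean {
        private volatile long requestCount;
        public long getRequestCount() { return requestCount; }
        public void increment()       { requestCount++; }
    }

    class Agent {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(new RequestStats(),
                    new ObjectName("example:type=RequestStats"));
            Thread.sleep(Long.MAX_VALUE); // keep the JVM alive for JConsole
        }
    }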

In either case, the data is too fine-grained for the cloud. A virtual data center requires an aggregated view, not fine-grained JVM or application-server statistics. The statistical data has to be organized in a hierarchical, tree-like structure. Such a structure allows an overview first and then a drill-down to the details for troubleshooting or fine-tuning when needed.

Researchers at the University of California, Berkeley, state: "Provided certain obstacles are overcome, we believe Cloud Computing has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Developers with innovative ideas for new interactive Internet services no longer require the large capital outlays in hardware to deploy their service or the human expense to operate it." Your response?

I absolutely agree. Clouds are perfect for startups. It's imprudent to spend a huge amount of money on infrastructure, only to discover that your idea won't work.

In the long term, hybrid data centers are likely. The constant load could be handled on premise and the peaks processed in the cloud. This approach is even more challenging but may pay off.

This "cloud-bursting" approach requires you to dynamically reroute requests from your data center into the cloud in a "hot" way. You will have to keep your data consistent and synchronized across the data center and the cloud.

What effect will cloud computing have on open-source development?

First, the hard work has to be done. We are almost there. Eucalyptus is a Xen-based, open-source solution that can be deployed on your own hardware. It is basically an EC2 emulator.

With Puppet, you can manage thousands of nodes in a continuous-integration manner. Hyperic and OpenNMS help you discover, monitor, and manage your network infrastructure. The Open Cloud REST-based API is really interesting as well: You will be able to manage the cloud through an open-source, standardized API.

In addition to these infrastructure services, we can expect a lot of new and interesting open-source applications. Why? Partly because it will be fun to develop cloud apps with new tools, approaches, and best practices -- and, of course, new challenges -- on top of the available open-source infrastructure.

It would be interesting to build a JavaFX app that manages the cloud via the Cloud RESTful API -- a kind of metacloud application.

Read the full interview with Adam Bien.



Alan Williamson: People Are Being Sold the Idea That the Cloud Will Solve All Their Problems

Alan Williamson, named the UK's first Java Champion in 2006, has spent more than 15 unusually productive years as a developer. He graduated with full honors in computer science from the University of Paisley in Scotland in 1994. He was, for several years, editor-in-chief of Java Developer's Journal, a major resource for the Java community. In 1998, he created the BlueDragon Java CFML runtime engine that, among other things, powers MySpace.com. He works as a consultant to many startups and more recently co-founded aw2.0 Ltd, a software company specializing in deploying software solutions within cloud networks.

On November 20, 2008, he conducted the first-ever immersive cloud computing daylong boot camp, and in January of 2009, he was named editor of the new Cloud Computing Journal.

Tell us about your first-ever immersive cloud computing boot camp.

What a day! We prepared a whole day's worth of material, complete with hands-on demos and samples, but due to the sheer volume of people and the number of questions asked, we didn't get to half of our content.

People hear all the marketing hype, but they're desperate to know the finer details. Just what does the cloud mean to you, the developer? What does it mean when I spin up an image? How does it look? What can't I do? Those are the kinds of things we tried to answer.

One of the things that surprised many was just how "familiar" it was when it was demonstrated for real. We've got another coming up in May '09 in Prague, with updated content, but by and large, it's another day of demystifying the cloud.

The word "cloud" has done to the industry what the term Web 2.0 did for any web page with a button! For a while, everything was getting the "Web 2.0" moniker. Now it appears anything that is accessible from a browser is getting rebranded as a cloud solution.

In point of fact, cloud computing as a complete end-to-end stack hasn't really offered anything new -- it's merely a collective term for outsourcing all types of resources. We can now look upon all our computing needs, including CPU time, as buy-on-demand resources that historically have not been so readily available.

The founder of the Free Software Foundation and the creator of GNU, Richard Stallman, has attacked cloud computing. He says it's mostly marketing hype, and computer users should keep their information in their own hands. Stallman is concerned about privacy, ownership issues, and loss of control, and he thinks the cloud will lock people into proprietary non-open-source software. He says, "You're putty in the hands of whoever developed that software." Your response?

I'm never one to miss a marketing opportunity, but I think he has it wrong. The problems of privacy and data have been with us since day one. I recall a Scott McNealy JavaOne keynote many moons ago, where he talked of data and offsite storage in much the same way we put our money into banks. His point was that we are happy to trust the banks with our money, but we have yet to trust the "network," as he called it back then, with our data.

The problem is slightly different, because when we deposit a £10 note, we don't expect the exact same £10 note to be returned to us. But in data terms, we actually do want the same collection of bits to return to us. So a data center has a much higher duty of care to look after our data, and we should be holding them to a much higher accountability than we do our banks.

As to the point about non-open-source software, Stallman is ignoring the fact that the vast majority of cloud services are built upon open source. You can't go too far without bumping into Linux or OpenSolaris at some point.

What I find fascinating is that no one is forcing anyone to use these services: It's the end consumer's choice as to how much of their personal data they are willing to give up in return for a free service. I think this will change too, and we'll return to a non-advertising-funded model: An honest day's service for an honest day's price -- that's what we'll see.

Industry analyst Gordon Haff argues that moving software from the desktop and data center onto the network cloud means that source-code availability will become less important and that open-source software will lose out. Do you agree?

I disagree completely. I believe the complete opposite will be the case, as people will demand checks and assurances that they can indeed trust their infrastructure to the cloud provider.

The cloud provider has to show more than simply "uptime" statistics -- complete transparency will become the norm. Observe how the likes of Google are open-sourcing more and more of their technology in order to gain not only the trust but the help of the community.

What do you see as the biggest potential danger of cloud computing?

Over-expectations and under-delivery. People are being sold the idea that the cloud will solve all their problems. It won't -- it merely moves the problem to another domain. Instead of worrying about bare, physical metal, you now have to worry about provisioning and the process of managing a more rapidly changing data center.

What new challenges does cloud computing present for developers?

Great question. Developers are hackers. We love to find shortcuts to the system and make things run faster in a way that others haven't yet thought of. As they say, necessity is the mother of invention. Tell someone they can't do something, and they will find a way. But the world of clouds enforces a level of strict adherence to ensure success.

If a developer builds applications in a truly distributed, service-oriented manner, then the move to running a pool of on-demand servers will not be as problematic as it will be for someone who hard-codes configurations within applications or assumes a given service will always be available.

In our experience, it's the little things that can really throw your project a curveball. For example, can your application cope with MySQL/JMS/caching servers suddenly dropping out and reappearing on a completely new IP address? These are the things you need to build for.
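
A hedged sketch of that kind of defensive code, assuming a hypothetical lookup method and placeholder JDBC settings: on failure, re-resolve the current host and reconnect with backoff rather than caching an address forever.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ReconnectingDataSource {
        public Connection connect() throws Exception {
            Exception last = null;
            for (int attempt = 0; attempt < 5; attempt++) {
                try {
                    String host = lookupCurrentDatabaseHost(); // may have moved
                    return DriverManager.getConnection(
                            "jdbc:mysql://" + host + ":3306/app", "user", "pass");
                } catch (Exception e) {
                    last = e;
                    Thread.sleep(1000L * (attempt + 1)); // back off, then retry
                }
            }
            throw last;
        }

        // Placeholder: in practice this might be DNS, a registry, or a
        // JMX-published table of current endpoints.
        private String lookupCurrentDatabaseHost() {
            return "db.internal.example.com";
        }
    }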

You have said that in working with cloud infrastructures, you need to "constantly monitor the cloud and your application in it." How do you do that?

That's easy. Apart from the database, all our software is exclusively Java, and what better way to keep an eye on your system than JMX -- one of the most underused inbuilt APIs of the JVM.

There is a whole raft of JMX clients, including the standard JConsole, that you can utilize to help monitor everything. The trick is to make the system dynamic so you aren't always updating the tables of IP addresses showing where all your machines are.
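
Programmatic monitoring uses the same remote path JConsole does. A minimal sketch -- the host and port are placeholders, while HeapMemoryUsage on java.lang:type=Memory is a standard platform MBean attribute:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapPoller {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://node1.example.com:9999/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            MBeanServerConnection conn = jmxc.getMBeanServerConnection();
            Object used = conn.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            System.out.println("Heap usage on node1: " + used);
            jmxc.close();
        }
    }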

I wouldn't necessarily say that cloud computing makes it easier for developers. Cloud providers merely remove the need to worry about physical hardware, and instead of waiting days for a new server to be available, it's up and running in minutes. But that's where they stop. You still have to manage the process of loading, distributing, backup, and so on.

Some developers fear that if cloud offerings are proprietary and vendor-dependent, we could have another computing monopoly in control of the applications and data. One alternative is to have an "open cloud ecosystem" in which users could move their data back and forth between open cloud services.

Proprietary systems aren't all evil. Take, for example, the rise and popularity of the iPhone platform. Developers are more than happy to code for one specific vendor and platform with no complaints whatsoever. It's really the services a vendor offers that dictate whether the customer will choose them.

Microsoft's Azure platform and Google's App Engine are two prime examples of proprietary cloud offerings. Once you deploy code to either, you've cut off any portability avenues. For some, this is pure evil, while others don't mind.

So will a dominant player emerge? It's too early to say, but all the big players have a vested interest in gathering up as many users as possible. Ultimately, it will be the users who decide.

Read the full interview with Alan Williamson.



Kirk Pepperdine: We Need Alternative Techniques

A Java Champion since September 2005, Kirk Pepperdine is a primary contributor and consultant to javaperformancetuning.com, which is widely regarded as the premier site for information about Java performance tuning, and the co-author of Ant Developer's Handbook. He has been heavily involved in application performance since the beginning of his programming career and has tuned applications in a variety of languages: Cray Assembler, C, Smalltalk, and, since 1996, the Java programming language. He is currently an independent consultant and an advisor to theserverside.com.

Cloud computing doesn't look that different from large data centers, which have been around for years. While there are some problems that are peculiar to cloud computing today, I think these problems will be resolved. Virtualization of the network is problematic, but people are working hard to find ways to eliminate the problem.

In terms of performance tuning, the tool sets are disappointingly poor for doing diagnostic work in cloud environments or large deployment environments. We need alternative techniques for figuring out what is going on in the system and how to make things more scalable. I'm confident that we will develop those techniques.

All the variability in cloud computing makes it more difficult to define performance problems. You can't go to your manager and say that you will find the problem by Tuesday and fix it by Thursday -- which is possible with many other problems. Memory leaks, for instance, are easy to find outside of the cloud, so we can be confident about finding them relatively quickly.

Read the full interview with Kirk Pepperdine.

See Also


The Developer Insight Series, Part 1: Write Dumb Code -- Advice From Four Leading Java Developers
The Developer Insight Series, Part 2: Code Talk
The Developer Insight Series, Part 3: The Process of Writing Code
The Developer Insight Series, Part 4: Favorite and Funny Code
