As Published In

Oracle Magazine
July/August 2004
Feature

Actualizing Grid Management
By Alan Joch

Tying resources together in a computing grid poses many challenges, but new software from Oracle helps you manage these resources as one.

By tying together massive amounts of server, application, and storage components into a collective whole, enterprise computing grids can create cost-effective virtual "computers" of unmatched power and speed, where the negative effects of spikes in demand become a thing of the past.

But along with this power come unique challenges in harnessing it: Without effective management tools, grids not only squander their potential but can actually inflict new cost burdens on the organization. "Sharing resources for maximum efficiency is difficult," says Kevin Oltendorf, senior director for partner alliances at Fujitsu Software Corp., in Sunnyvale, California, which is currently investigating grid technology. "When a company has a large, distributed resource pool, it's even harder to do it correctly."

Control Necessary to Realize Potential

The potential of enterprise computing grids is clear. Large clusters of commodity-class servers running economical versions of Linux can give processor-hungry organizations almost unlimited scalability without breaking IT budgets. Technology researcher Gartner Inc. estimates that "the typical enterprise, with mainframe, UNIX, and Windows deployments, could save between 8.5 percent and 10.5 percent of the data center budget by implementing standardization, automation, and consolidation" (Gartner Research, July 2003). Cost advantages like these help explain why grid computing is top of mind for many large, multinational corporations, growing Web-based businesses, universities, and research facilities.

In addition, because computing grids are built using low-cost modular components, a company can start small and preserve its initial technology investment as its business grows.

But the challenge of effectively managing these complex environments remains a concern, grid experts say. It's one thing to have an almost limitless supply of computing resources within reach; it's another to keep all the gears turning properly. And of course, automating grid management is critical to its success. "Just the number of moving parts and the dependence that they have on one another means that grids are too complex to manage with the fairly manual tools that we've historically used," says Mary Turner, vice president of Boston-based consultancy Summit Strategies.

For grids to be successful, applications and infrastructure resources must be reconfigured on the fly according to business needs and fluctuations in demand for computing resources. Sometimes these fluctuations are predictable, sometimes they're not—like when a critical hardware component fails and everyone is left scrambling for processing power. "To consistently deliver the service levels a business needs, grid resources must be reallocated on an agile basis," Turner says. "The more you reconfigure something in storage equipment, or in a server, a database, or an application, the greater chance there is that you'll affect the end-user experience. Relying on people and manual processes to keep up with it all just doesn't scale" in a grid environment.

Another gotcha: The hoped-for grid cost savings quickly disappear if management becomes a resource burden. "The trick is to roll out this highly available, scalable, low-cost grid infrastructure and manage it at no additional cost," says Jay Rossiter, Oracle vice president of System Management Products. "The last thing you want is to have your hardware costs go down only to be replaced by increased labor costs."

The multinational pharmaceutical firm Merck & Co., Inc., headquartered in Whitehouse Station, New Jersey, is well aware of these possibilities. It's taking a careful approach to grid computing as it tries to determine if the technology makes business sense for its operations. Merck runs more than 1,300 Oracle databases worldwide, with large data centers in the U.S., Brussels, and Singapore, and one thing is clear: Before Merck commits to grid, it wants to be sure management tools are ready for prime time. "That's definitely one of our key considerations," says Ronald Ruel, director of information management. "We know firsthand what sort of a burden is introduced by managing the number of databases that we currently have in place. One of the costs we're looking at is what it will really take to manage a grid."

Grid pioneers say management tools must multitask to the max by providing automated monitoring, diagnostics, performance-tuning, and administration. "Administrators do not have the time to do this manually in a large distributed environment," says Fujitsu's Oltendorf. "Administrators need to automatically monitor response times from a customer's standpoint and be alerted when prespecified service levels aren't met."

Intelligent Grid Management

In the Oracle environment, the responsibilities for maintaining complex grids naturally fall within the realm of Oracle Enterprise Manager 10g (EM) with its new Grid Control functionality. The Oracle database and application server help manage individual instances of these products through self-managing capabilities introduced in their newly released 10g versions.

The latest version of the Oracle database includes a built-in intelligent management infrastructure that monitors and diagnoses internal performance and availability. "There's no entity better than the database itself to really know that information and be able to act on it," says Moe Fardoost, product marketing director in Oracle Technology Marketing. "[The database's internal manager] knows exactly what's going on, which SQL statements are consuming the most resources, where the bottlenecks are, and how resources such as storage and memory are being used."

Similar management capabilities reside within Oracle Application Server, and together these automation building blocks enable Grid Control to simplify the management of clustered, midtier grid environments.

Grid Control provides extensive middle-tier management and monitoring capabilities in one integrated tool that spans the entire grid environment. Its capabilities include multisystem management, provisioning and configuration management, automated administration, policy-driven standardization across sets of systems, and end-to-end diagnostics.

Grid Control views the availability and performance of the grid infrastructure as a unified whole rather than as isolated storage units, databases, and application servers. That means you can group hardware nodes, databases, and application servers into single logical entities and manage a group of targets as one unit. Grid Control provides a simplified, centralized management framework for managing enterprise resources and analyzing a grid's performance. Grid administrators can manage the complete grid environment via a Web browser throughout the entire system's software lifecycle, front to back, from any network location.
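
To make the idea of managing a group of targets as one unit concrete, here is a minimal Python sketch. It assumes a hypothetical data model (the `Target`, `Group`, and `Status` names are illustrative and are not part of Grid Control's actual interface) in which a group's availability is rolled up from the up/down status of its members.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UP = "up"
    DOWN = "down"


@dataclass
class Target:
    """One managed component: a host, a database, or an application server."""
    name: str
    kind: str
    status: Status


@dataclass
class Group:
    """A hypothetical logical group of targets managed as a single unit."""
    name: str
    members: list = field(default_factory=list)

    def add(self, target: Target) -> None:
        self.members.append(target)

    def availability(self) -> float:
        """Fraction of member targets that are currently up (0.0 if empty)."""
        if not self.members:
            return 0.0
        up = sum(1 for t in self.members if t.status is Status.UP)
        return up / len(self.members)

    def down_targets(self) -> list:
        """Names of members that need attention."""
        return [t.name for t in self.members if t.status is Status.DOWN]


# Usage: one group spanning three tiers, with one database down.
payroll = Group("payroll")
payroll.add(Target("node1", "host", Status.UP))
payroll.add(Target("payroll_db", "database", Status.DOWN))
payroll.add(Target("payroll_as", "app_server", Status.UP))
print(f"{payroll.availability():.0%}", payroll.down_targets())
```

Treating the group, rather than each component, as the unit of status is what lets an administrator see at a glance that the payroll service is degraded and which member is responsible.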

[Figure: Oracle Enterprise Manager Enables Grid in 5 Key Areas]

In terms of its monitoring capabilities, Grid Control provides administrators with proactive tools, letting them create representative transactions that give them a window into actual performance across the grid. "We can specify that the transaction should take an average of three seconds, but if it approaches five seconds, I want my administrator to receive an automatic warning so he can figure out what's going wrong," explains Martin Peña, Oracle director of Product Management. "If the transaction approaches seven seconds, then I want a critical alert, which will make sure everyone stops what they're doing and takes a look at the problem." The alerts are based on the application's actual performance, not just that of individual components such as the database, application server, HTTP server, or network routers.
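
The two-level alerting Peña describes can be sketched as a simple threshold check. This Python example is a hypothetical illustration, not how Grid Control evaluates its metrics internally; it assumes each measurement window is summarized by its average response time and that the warning and critical thresholds are inclusive.

```python
from statistics import mean

# Thresholds taken from the example quoted above: a target of about
# three seconds, a warning near five, and a critical alert near seven.
WARNING_SECONDS = 5.0
CRITICAL_SECONDS = 7.0


def classify(samples, warning=WARNING_SECONDS, critical=CRITICAL_SECONDS):
    """Classify the average response time of synthetic-transaction samples.

    Returns "ok", "warning", or "critical". Assumes thresholds are inclusive
    and are compared against the mean of one sampling window.
    """
    if not samples:
        raise ValueError("need at least one response-time sample")
    avg = mean(samples)
    if avg >= critical:
        return "critical"
    if avg >= warning:
        return "warning"
    return "ok"


for window in ([2.8, 3.1, 3.0], [5.2, 4.9, 5.1], [7.5, 6.9, 7.2]):
    print(window, classify(window))
```

Averaging over a window keeps a single slow request from paging the whole team; a real deployment might instead alert on a percentile or require several consecutive bad windows.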

A key attribute of Grid Control is that it's designed for multitier, heterogeneous environments with an ability to reach across all the tiers of resources that affect the environment, says Turner. "The out-of-the-box automation templates and workflows are important for helping customers see real benefits, particularly in the area of configuration automation, which is just an incredibly error-prone area."

Enterprise Manager in Action

MCI, in Ashburn, Virginia, hosts internet-based applications for enterprise customers. Its hosting delivery model depends on all of the factors that make grid appealing: efficient scalability, fast performance, and bullet-proof uptime. Custom scripts and proprietary monitoring tools helped MCI Enterprise Hosting, a Digex Service, manage its IT infrastructure in its early days, but three years ago the company found that it needed more. Scripts required too much manual operation, which made the process error-prone and difficult to scale. So MCI turned to an earlier version of EM and used its system diagnostic tools to identify and resolve problems before they affected the system. The result: Response times for customers improved, and each of MCI's DBAs can now manage 40 databases, twice the industry average.

Based on its early success, MCI became one of the first users of Grid Control last year, hoping for an additional boost in customer service and DBA productivity. "The key criterion for customers is performance," says Mark Cross, operations director at Digex. "Not meeting a service-level agreement costs us money—but it could cost our customers many times that amount in lost business. We need tools that work, are proactive, and let us manage our customers' systems better than they could themselves—and better than our competitors can."

MCI was attracted to Grid Control because it could manage the entire technology stack at once without degrading performance. The tool's advanced performance metrics let MCI diagnose problems faster than the previous EM version did. In addition, Grid Control's self-configuring database agents automatically monitor events using preset Oracle metrics that correspond to the ones DBAs use most frequently.

According to Chuck Wolfe, senior principal DBA of UNIX technical operations at MCI, "In previous versions, we had to define performance metrics and thresholds ourselves. With this new version, rollout was faster and had a minimal impact on our operations. It enabled us to maintain our usual service levels with no disruption to customers."

Grid Control's easy-to-use Web-based management console means DBAs can control their systems from any location using a standard browser. DBAs can drill down directly from the HTML interface to get a complete view of performance at every level, and new diagnostics automatically detect bottlenecks, thanks to Grid Control's "health advisories," which also suggest solutions. According to MCI, Grid Control lets the company resolve system performance problems in an average of 15 minutes versus an hour in the past, and reduce customer response time by 75 percent.

With its army of system-monitoring agents, Grid Control automatically inventories resources and tracks database metrics; this lets MCI DBAs compare configuration differences between two databases or hosts if a performance glitch arises. Grid Control also downloads system patches from MetaLink, an Oracle Support service. "Oracle Enterprise Manager 10g Grid Control lets us access the configuration management agent from the same interface we use to diagnose database problems," Wolfe says. This reduces the time it would otherwise take to log on to the machine and go through the standard operating system utilities.

Grid Control goes beyond keeping track of system configurations and patch sets; it can clone a system, helping you roll out a new copy of an existing application server or database. Built-in Grid Control monitoring features also make it possible for junior data-center staff to take on some DBA tasks, says Lee Harris, senior manager of UNIX technical operations at MCI Enterprise Hosting. "This will allow us to increase the number of databases managed by each DBA from 40 to 68—an increase of 70 percent—and to grow our business without increasing our resources."

Early Reviews

At Fujitsu, Kevin Oltendorf's department runs more than 50 servers used for porting, developing, and testing the Solaris Operating System on Fujitsu PRIMEPOWER servers. It also supports Web sites and FTP servers for PRIMEPOWER alliance and support activities.

Oltendorf's group has evaluated Grid Control and Oracle Database 10g not for its own use but for the potential to integrate them with Fujitsu's TRIOLE technology, an IT architecture for stitching together servers, storage, networking components, and middleware into a cohesive whole. An early example of TRIOLE, Fujitsu's SysFrame, used Oracle9i Real Application Clusters (RAC) to provide clustering support to this grid-in-a-box solution. According to Oracle's Trish McGonigle, director of System Management Products, this is a perfect example of how Grid Control is designed to manage grid-specific solutions completely and logically. "Grid Control provides a single point for configuring and managing clusters," she says.

Oltendorf also gives Grid Control high marks for its ability to monitor response times from the customer's standpoint, which can be tied to service levels. "We found the initial EM screens to be impressive, having a layered and intuitive presentation of the systems being monitored. This lets us effectively drill down into a database or an operating system, or for hardware specifics."

The ability to automate routine database administrator activities, like space allocation and backup routines, also stood out for Fujitsu. "EM does a great job monitoring the end-user experience, the database, and the upper-level Oracle application stack," says Oltendorf.

Oltendorf believes that Grid Control, paired with Oracle RAC 10g, will give Fujitsu customers an easier out-of-the-box grid management tool. "With additional integration of systems management tools, EM could be used for monitoring down to the hardware level. This would give administrators a more complete, integrated view of all of their system resources, including hardware diagnostics."

Sea Change

Summit Strategies' Mary Turner believes that the rise of sophisticated grid management tools such as Grid Control may eventually reshape IT departments. "Some early users are finding that they can scale their management capabilities much more broadly than ever and get a much better ratio of administrators to systems," she explains. "You can actually change the cost structure of your administrative resource mix and hopefully get greater use of your infrastructure resources."

Grid Glossary

Autonomic Computing
The capability of computer systems and networks to automatically configure themselves to changing conditions and heal themselves in the event of failure. Autonomy implies that less human intervention is required for operation under such conditions.

Blades
A computing system that includes processors and memory on a single board, but where other resources such as power, cooling, network access, and storage services are shared. Blades are designed to be easily installed and removed and are typically smaller than rack-optimized servers.

Capacity on Demand
Processing power that is available as needed in a timely manner without disrupting other business priorities. Capacity on demand frequently involves additional capacity installed but not available for use until needed.

Clustering
Connecting two or more computers together in such a way that they appear to be a single computing resource. Clustering is used for parallel processing, load balancing, and fault tolerance. Clustering is a popular strategy for implementing grid computing, since it is relatively easy to add new CPUs simply by adding a new server or blade to the overall cluster. Clusters are typically transparent to users and applications.

Data Center
A facility that provides a suitable environment (power, cooling, network connectivity, management services) for housing information technology equipment (servers, storage) and providing IT services and support to customers.

Data Provisioning
Making data available when and where it's needed and as it becomes available.

Distributed Computing
Multiple computing resources networked and used together to solve a computing task.

Global Grid Forum
The standards body that defines specifications for global grids. www.gridforum.org

Globus Alliance
A group that conducts research and development for academic grids. The alliance, creators of the Globus Toolkit, is based at Argonne National Laboratory, the University of Southern California Information Sciences Institute, the University of Chicago, the University of Edinburgh, and the Swedish Center for Parallel Computers. www.globus.org

Globus Toolkit
A kit designed by the Globus Alliance to provide a set of tools based on standard grid APIs. Its latest development version, GT3, is based on standards currently being drafted by the Global Grid Forum.

Grid
Computational components—servers, networks, storage, and information—acting together to create one or more large pools of computing resources. A grid can be dynamically provisioned on demand to various applications and users, allowing organizations to dynamically align their IT resources to their business needs.

Grid Computing
A computing architecture that provides computing resources using many computers acting as one virtual computing resource. On the client side, grid computing provides shared resources, allowing complete transparency as to where and how a task is performed. On the server side, grid computing enables enterprises to provision resources to respond to client requests.

Load Balancing
Tuning of a computer system to obtain an even distribution of data and processing across the computing resources.

Node
A network processing location in a grid. A node can be a computer, a set of clustered blades, or some other device, such as a printer.

Pooling
Combining separate computing resources into a single logical group.

Provisioning
Providing or allocating the requested computing resource.

SLA (Service-Level Agreement)
The agreement between a user and a computing service provider to determine the type, capacity, and quality of service. SLAs are used by vendors and customers as well as internally by IT shops and their end users. They can specify, for example, bandwidth availability, response times for routine and ad hoc queries, response time for problem resolution (network down, machine failure, and so on), and steps to be taken in the event of problems, with penalties for noncompliance.

SMP (Symmetric Multiprocessing)
A computer architecture that utilizes multiple CPUs to complete individual processes simultaneously. Any idle processor can be assigned any task, and additional CPUs can be added to improve performance and handle increased loads.

Utility Computing
A pay-as-you-go model of computing. Instead of paying for computer resources to handle the peak load at all times, you pay only for the computing you use; analogous to an electric utility.

Virtualization
A technique for interacting with a resource through an abstract mechanism, so that the underlying physical resource can be replaced with another one of similar capability without affecting the resource consumer. Virtualization balances supply and demand by providing a transparent, aggregated computing resource.

Web Service
A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.

Workload Balancing
Distributing the workload across multiple systems to optimize system usage and response time for the user.

What's in Grid Control

Most of the features of Grid Control come free to Oracle customers. The product is included in the Oracle product CD packs.

Grid Control Repository: The central repository acts as a systemwide knowledge center with information from multiple systems rolled into a single, enterprise-level view. A set of management services within the repository creates a snapshot of the entire grid infrastructure. This snapshot includes an inventory of all application software and operating systems, all the versions and individual software configurations, as well as all the hardware that's being run. By default, the resource inventory is updated daily, or users can click on a command to refresh the data at will.

With an inventory snapshot in the Grid Control Repository, administrators can search across the grid to compare individual system components with one another and see how the configurations of stable databases and applications differ from those of shakier ones.
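
Comparing a stable system with a shakier one amounts to diffing two configuration snapshots. The Python sketch below assumes each snapshot has been flattened into a dictionary of setting names and values; the setting names and values are invented for illustration and do not reflect Grid Control's repository schema.

```python
def diff_configs(stable, suspect):
    """Compare two configuration snapshots (setting name -> value).

    Returns {setting: (stable_value, suspect_value)} for every setting whose
    value differs; a setting missing from one snapshot appears as None.
    """
    diffs = {}
    for key in sorted(stable.keys() | suspect.keys()):
        a, b = stable.get(key), suspect.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Hypothetical inventory snapshots for two databases.
stable_db = {"db_version": "10.1", "patch_set": "A", "cache_size": "1G"}
shaky_db = {"db_version": "10.1", "cache_size": "512M", "debug_trace": "on"}

for setting, (good, bad) in diff_configs(stable_db, shaky_db).items():
    print(f"{setting}: stable={good} suspect={bad}")
```

Taking the union of keys matters: a patch that is present on the stable system but missing on the shaky one is often exactly the difference worth finding.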

The Grid Control Repository also stores a trove of diagnostic and performance information. DBAs can drill down into an application and look for and analyze a performance anomaly that occurred at a certain point in time by correlating performance of hardware devices, application servers, and databases.

Management Console: This console is the "central command center" for viewing and managing your entire grid environment. The console provides holistic, aggregated status views of the enterprise as well as the individual systems within the enterprise. Within the console, users can organize, monitor, and manage their systems logically to meet the individual needs of a specific environment. The console is designed to help you quickly understand the status of your entire system by presenting end-to-end information across the enterprise and showing the availability status of the entire enterprise in a pie-chart format. You can select separate availability views for databases and application servers.

Performance Agents: These lightweight software programs install themselves on every server in the grid environment and regularly send system status reports back to the central repository. The agents report scores of metrics on database and application server performance. Other metrics record information about utilization levels of CPU, memory, storage, and other resources.
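
The agent-to-repository flow, together with the point-in-time correlation described under the Grid Control Repository, can be sketched as a store that agents push reports into and that can be queried for a time window across every target. This Python sketch is a deliberately simplified toy: the `Report` format, target names, and metric names are hypothetical, and Grid Control's actual agents and repository work differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Report:
    """One status report sent by an agent (hypothetical format)."""
    target: str      # host, database, or application server name
    timestamp: int   # seconds since the epoch
    metrics: dict    # metric name -> value, e.g. {"cpu_pct": 35.0}


class Repository:
    """A toy central store that agents push reports into."""

    def __init__(self):
        self._reports = []

    def ingest(self, report):
        self._reports.append(report)

    def window(self, start, end):
        """Reports with start <= timestamp <= end, grouped by target."""
        grouped = defaultdict(list)
        for r in self._reports:
            if start <= r.timestamp <= end:
                grouped[r.target].append(r)
        return dict(grouped)


repo = Repository()
repo.ingest(Report("node1", 1000, {"cpu_pct": 35.0}))
repo.ingest(Report("orders_db", 1010, {"cpu_pct": 92.0, "active_sessions": 180}))
repo.ingest(Report("node1", 2000, {"cpu_pct": 40.0}))

# Look at everything reported around the time of an anomaly at t=1005.
for target, reports in repo.window(990, 1020).items():
    print(target, [r.metrics for r in reports])
```

Grouping the window by target is what allows host, database, and application server readings from the same moment to be read side by side.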


Alan Joch (ajoch@monad.net) is a New England-based technology writer specializing in enterprise and internet applications.
