As Published In

Oracle Magazine
July/August 2005

Feature

Try New Linux Clusters
By David Baum

Linux clusters are emerging as a popular choice for high-performance, low-cost data warehousing solutions.

As more and more companies weave business intelligence into the operational fabric of their businesses, they are looking at low-cost options for deploying their ever-growing data warehouses. Traditionally, large data warehouses were deployed on high-end symmetric multiprocessing (SMP) computers that didn't necessarily need to meet the same availability requirements as transaction processing systems. Today, as companies extend their data warehouse assets to more users—both inside and outside the organization—there is a critical need to control IT costs and deliver higher service levels. That's why many companies are choosing to deploy their data warehouses on clustered low-cost commodity servers running Linux.

Enter Linux Power

Is Linux ready for these high-end data warehouses and business intelligence systems?

IT pros at Vanderbilt University think so. Until recently, this Nashville, Tennessee-based educational institution struggled with the cost of managing its growing information systems. Vanderbilt discovered that running Oracle Database with Oracle Real Application Clusters on Linux would allow it to use low-cost Intel-based hardware. The economy of the solution was compelling: A two-node RISC-based server configuration would have cost $100,000, whereas a two-node Intel-based solution cost only $30,000. "Our tests showed that we would get three times the server power and performance for the dollar, plus greater availability, if we switched from UNIX to Linux," reports Tim Getsay, assistant vice chancellor of Vanderbilt's management information systems.

Today, Vanderbilt uses Oracle Database 10g with Oracle Real Application Clusters configured on 16 HP ProLiant DL580 servers running Red Hat Enterprise Linux. Oracle Real Application Clusters makes it easy to scale the data warehouse, because low-cost servers can be added to the cluster incrementally. According to Getsay, Vanderbilt expects to add 20 processors per year as it scales its data warehouse to several terabytes.

"It's not just the operating system but also the overall cost of commodity components that is driving down costs in these data warehouse installations," says Lou Agosta, an analyst at Cambridge, Massachusetts-based Forrester Research. "The operating system is only about five percent of the price of the overall configuration and often less than one percent. It is not Linux, per se, that is driving down costs for Oracle data warehouse customers, but all of the things Oracle is doing with Linux in grid computing environments to create low-cost systems."

The Linux Cluster Convergence

Apart from costs, Agosta believes that the adoption of open source operating systems such as Linux is being driven by a variety of factors. "For one thing, large vendors such as IBM, HP, Oracle, and Dell are getting behind Linux," he says. "Second, many people want a low-cost, nonproprietary alternative to Windows. And, finally, because Linux runs on commodity components, it allows you to avoid technology lock-in."

Oracle's commitment to Linux is part of the reason Linux is so widely adopted in the enterprise today. Oracle continues to work closely with Red Hat, Novell, and the Linux community to ensure that Oracle products and the Linux kernel are optimally configured and tuned to the underlying hardware, and Oracle provides seamless, integrated 24/7 customer support for Linux.

These business dynamics motivated Vanderbilt to deploy an enterprise grid that manages all of the university's core databases on a centralized infrastructure, consolidating the data warehouse environments for both Vanderbilt University and the Vanderbilt Medical Center. Vanderbilt's Linux-based information systems keep data secure yet highly available to all authorized personnel. If any server in a cluster fails, the remaining servers continue to operate, ensuring 24/7 availability. According to Mainstay Partners, an IT consulting firm based in Redwood City, California, this infrastructure lets Vanderbilt University avoid US$6.2 million in hardware investment and hardware maintenance costs as the university continues to scale out its grid with standardized, commodity-priced servers and storage devices.

Snapshots

VANDERBILT UNIVERSITY
www.vanderbilt.edu

Vanderbilt University is a leading educational institution based in Nashville, Tennessee, with an enrollment of more than 11,000 students. Founded in 1873, Vanderbilt University comprises 10 schools, a public policy institute, a distinguished medical center, and the Freedom Forum First Amendment Center. Vanderbilt is the largest private employer in Middle Tennessee.

Industry: Education
Employees: 18,551 (including medical center staff)
Oracle products and services: Oracle Application Server; Oracle Database, including Oracle Partitioning, Oracle Enterprise Manager Grid Control, Oracle Real Application Clusters

MLT VACATIONS
www.worryfreevacations.com

Based in the Minneapolis/St. Paul, Minnesota, metropolitan area, MLT Vacations is a wholly owned subsidiary of Northwest Airlines Corporation and one of the largest providers of vacation packages in the United States. Each year, more than one million people purchase trips from MLT Vacations' two brands: NWA WorldVacations and Worry-Free Vacations.

Industry: Travel
Employees: 300
Oracle products and services: Oracle Database, Oracle Real Application Clusters

Growing with Users

While better reliability at a lower cost is a benefit any IT manager can appreciate, there is another reason companies are turning to the performance and scalability of Linux clusters for data warehousing: the growing popularity of end-user reporting, which signals a change in how data warehouses are being used. Traditionally, these systems were used by a small group of information analysts and possibly some senior executives and line-of-business managers. Today's business intelligence solutions typically extend business information to many different types of employees throughout the enterprise, as well as to external partners and customers. These operationally focused business intelligence systems touch many of the company's core information systems. They support both strategic and tactical decisions, made not only by professional analysts and power users but also by rank-and-file employees throughout the organization.

"Our users are becoming more independent," confirms Ron Reinsma, manager of the applications group at MLT Vacations, one of the largest providers of vacation packages in the United States. "They want to dig into the data and get their own answers. It's a challenge for us to keep up with that demand, because they keep asking more-complex questions. We're seeing 10 percent annual growth just from the new report requests and data structures we're building."

Oracle Real Application Clusters enables companies such as MLT Vacations to scale their information systems to support changing business demands and to create an infrastructure with built-in high availability and business continuity. According to Chris Corona, manager of system services at MLT Vacations, these clustered environments handle unscheduled outages gracefully: they automatically recover from a failed server and continue to provide database services on the surviving servers. Data is always accessible as long as at least one server in the cluster is running. This resilient configuration enables MLT Vacations' Web site, reservation systems, and data warehouse applications to remain online during routine maintenance or when there is a problem with a server.

"This stability is becoming increasingly critical, not just for our transaction processing systems but also for our data warehouse," says Corona. "Many users depend on the warehouse to analyze revenue, inventory, and pricing data as well as to track customer issues, credit vouchers, and profitability."

System Essentials

One of the strategies behind grid computing is to maximize the use of processors and storage capacity. When purchasing computing power, companies typically overestimate the amount they will need and then pay support and maintenance on that additional capacity. Even if they eventually use all of it, paying for capacity long before it is needed is an expensive way to do business. Oracle Real Application Clusters changes this scenario, allowing companies to scale their information systems incrementally and minimize capital expenditures by adding server capacity only when needed.

"Oracle data warehouses built with Oracle Real Application Clusters technology have great flexibility, because of their shared-everything architecture," says William Hardie, senior director of Database Product Marketing at Oracle. If a server malfunctions in an Oracle clustered database environment, processing continues on the remaining servers. This ensures that data remains accessible and applications function without interruption. "Plus it's easy to scale database clusters on demand," Hardie adds, "because Oracle Real Application Clusters automatically harnesses the processing power of additional servers as they are brought into the cluster."

Others have seen the wisdom of this way of thinking. "We wanted to be able to easily scale our information systems as our business grows," admits Reinsma. "We were reaching the boundaries of scalability with our SMP server, which meant we would soon need to buy another big box. With Oracle Real Application Clusters, we can add more small servers as we need them."

By replacing its SMP servers with clustered Intel-based servers running Oracle Database and Oracle Real Application Clusters on Linux, MLT Vacations was able to improve system performance while decreasing technology costs. The travel company expects to save approximately US$1 million in software, hardware, training, and maintenance costs over the next five years as a result of its IT investments. "Moving to Oracle on Linux has exceeded our expectations in terms of performance and cost efficiencies," says Michael Kress, director of enterprise technology services at MLT Vacations. "With our SMP server, even though there were multiple processors, they were all tied together, so we couldn't take one down without taking down the others."

Growing with Linux
Tuned for Intelligence

Oracle has spent many years refining the Oracle Database to ensure optimal performance for data warehouse installations. Its key technical capabilities include the following:

Intelligent query optimization. Query optimization improves the performance of a relational database, especially for executing complex SQL statements. A query optimizer determines the best strategy for performing each query. It chooses, for example, whether or not to use indexes for a given query and which join techniques to use when joining multiple tables. These decisions have a tremendous effect on SQL performance. The query optimizer is entirely transparent to the application and end users.
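For readers who want to see the idea in miniature, the following Python sketch makes a cost-based choice between a full table scan and an index scan. The statistics, the MULTIBLOCK_READ constant, and both cost formulas are simplified assumptions for illustration; they are not Oracle's actual cost model, which weighs far more factors.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    num_rows: int      # rows in the table
    num_blocks: int    # data blocks the table occupies
    index_height: int  # levels to descend in the B-tree index


# Hypothetical constant: blocks fetched per I/O during a full scan.
MULTIBLOCK_READ = 8


def full_scan_cost(stats: TableStats) -> float:
    # A full scan reads every block, several blocks per I/O.
    return stats.num_blocks / MULTIBLOCK_READ


def index_scan_cost(stats: TableStats, selectivity: float) -> float:
    # Descend the index, then (pessimistically) one block read per matching row.
    return stats.index_height + selectivity * stats.num_rows


def choose_access_path(stats: TableStats, selectivity: float) -> str:
    """Return the cheaper access path under this toy cost model."""
    if index_scan_cost(stats, selectivity) < full_scan_cost(stats):
        return "index scan"
    return "full table scan"


if __name__ == "__main__":
    sales = TableStats(num_rows=1_000_000, num_blocks=20_000, index_height=3)
    print(choose_access_path(sales, selectivity=0.0001))  # very selective predicate
    print(choose_access_path(sales, selectivity=0.2))     # broad predicate
```

With a very selective predicate the index wins; with a broad one, reading every block in large multiblock I/Os is cheaper than fetching matching rows one at a time.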

Materialized views. Materialized views were first introduced in Oracle8i. A materialized view can be thought of as a special kind of view whose results physically exist inside the database. It can contain joins and aggregates, and it improves query execution time because those expensive joins and aggregations are calculated ahead of time rather than each time a query runs.
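SQLite, used here only because it ships with Python, has no materialized views, so this sketch emulates one by storing the result of an aggregate query as a physical table (CREATE TABLE ... AS SELECT). The bookings table and its data are hypothetical. Unlike an Oracle materialized view, this copy is not refreshed automatically and is not substituted into queries by the optimizer; it only illustrates the precomputation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bookings (region TEXT, month TEXT, revenue REAL);
    INSERT INTO bookings VALUES
        ('Caribbean', '2005-01', 1200.0),
        ('Caribbean', '2005-01', 800.0),
        ('Mexico',    '2005-01', 950.0),
        ('Mexico',    '2005-02', 400.0);
    """
)

# Emulated materialized view: run the expensive aggregate once and
# store its result as a physical table.
conn.execute(
    """
    CREATE TABLE mv_revenue_by_region AS
    SELECT region, month, SUM(revenue) AS total_revenue, COUNT(*) AS num_bookings
    FROM bookings
    GROUP BY region, month
    """
)

# Reports now read the small precomputed table instead of re-aggregating.
rows = conn.execute(
    "SELECT region, month, total_revenue FROM mv_revenue_by_region "
    "ORDER BY region, month"
).fetchall()
for row in rows:
    print(row)
```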

Bitmapped indexing. For columns with relatively few distinct values, bitmap indexes provide significant savings in index creation time and in the space required to store indexes. Oracle's patented bitmap indexes are widely used, particularly in data warehouse applications. Whereas other database vendors provide "dynamic" bitmap indexes, Oracle supports real bitmap indexes (in addition to dynamic bitmap indexes). Real bitmap indexes are index structures in which the compressed bitmap representation of the index is stored in the database, whereas dynamic bitmap indexes convert B-tree index structures in the database into bitmap structures during query processing. Real bitmap indexes can take far less space than regular B-tree indexes, and these space savings also translate into performance benefits in the form of fewer disk I/Os.
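A minimal sketch of how a bitmap index answers a multi-column query, assuming a tiny hypothetical bookings table: each distinct column value gets one bitmap with a bit per row, and an AND across two columns becomes a single bitwise AND. Python integers stand in for the bitmaps here; they are uncompressed, so this does not model the compressed storage described above.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to a bitmap (a Python int) of the rows holding it."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(values):
        bitmaps[value] |= 1 << row_id  # set bit row_id for this value
    return dict(bitmaps)


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    row_ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return row_ids


# Two low-cardinality columns of a hypothetical table (one entry per row).
region = ["Mexico", "Hawaii", "Mexico", "Caribbean", "Mexico", "Hawaii"]
status = ["paid", "paid", "cancelled", "paid", "paid", "cancelled"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'Mexico' AND status = 'paid' becomes one bitwise AND.
match = region_idx["Mexico"] & status_idx["paid"]
print(rows_from_bitmap(match))
```

An OR across values (for example, Mexico or Caribbean) works the same way with a bitwise OR, which is why bitmap indexes combine well for the ad hoc filters common in warehouse queries.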

Data partitioning. Partitioning is an important feature for improving complex query performance and easing the management of large volumes of data. Oracle Partitioning offers a choice of range, hash, and list partitioning mechanisms to meet different requirements. For example, range partitioning is especially useful where "rolling windows" of data are common. Range partitioning by date can greatly benefit the load/drop cycle in a data warehouse, in terms of both efficiency and manageability: each new period's data is loaded into its own partition, and the oldest partition is dropped as a unit rather than deleted row by row.
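The rolling-window cycle can be sketched with a toy in-memory table that keeps one partition per month. The class and its methods are hypothetical and assume months are loaded in chronological order; in Oracle itself the equivalent steps are DDL operations on a partitioned table, such as ALTER TABLE ... DROP PARTITION.

```python
from collections import OrderedDict


class RollingWindowTable:
    """Toy range-partitioned table: one partition per month, oldest dropped first."""

    def __init__(self, window_months):
        self.window_months = window_months
        # "YYYY-MM" -> list of rows; insertion order is oldest first,
        # which assumes months are loaded chronologically.
        self.partitions = OrderedDict()

    def load_month(self, month, rows):
        # Load a whole period into its own new partition...
        self.partitions[month] = list(rows)
        # ...then drop whole partitions that fall outside the window,
        # instead of deleting old rows one at a time.
        while len(self.partitions) > self.window_months:
            dropped, _ = self.partitions.popitem(last=False)
            print(f"dropped partition {dropped}")

    def query(self, first_month, last_month):
        # Partition pruning: only partitions inside the requested range are read.
        return [
            row
            for month, rows in self.partitions.items()
            if first_month <= month <= last_month
            for row in rows
        ]


sales = RollingWindowTable(window_months=3)
for month in ["2005-01", "2005-02", "2005-03", "2005-04"]:
    sales.load_month(month, [f"{month} order {n}" for n in range(2)])
print(list(sales.partitions))
print(sales.query("2005-03", "2005-04"))
```

Loading the fourth month pushes the window past three partitions, so the January partition is dropped in one step.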

MLT Vacations' data warehouse has played a key role in helping the company determine how to cut costs during slow times and is also essential for forecasting and tracking trends in inventory, pricing, and profitability. "We had a couple of years where we were focused on controlling expenses and rightsizing our business," Kress notes. "Now we're definitely looking ahead to how we can grow the business. Our data warehouse has become a key part of our success."

"Our daily flash report gives managers a snapshot of what yesterday, last week, and last month looked like compared to our budget or compared to last year," Kress continues. "We have inventory reports that show how things are booking this period, compared to last period. I just don't see how we could deliver that information in a timely manner if we didn't have the data warehouse."

To improve query performance and support larger systems, Oracle works closely with OS vendors to recommend changes to the Linux kernel, such as asynchronous I/O (which lets the database issue disk requests without waiting for each one to complete), support for very large memory configurations, and the ability to run Linux applications on 64-bit architectures. These capabilities will become progressively more important as Linux customers build larger data warehouses for greater numbers of users running queries.

"A data warehouse occupying a terabyte of storage—a rarity a few years ago—is now quite common," says Richard Winter, president of the Waltham, Massachusetts-based Winter Corporation, a consulting firm that specializes in the database industry.

Many Oracle customers now have multiterabyte warehouses deployed on clustered Linux platforms. In one instance, a retail customer runs a 16-node data warehouse built with Oracle Real Application Clusters on affordable Intel Xeon machines, low-cost storage arrays, and the Red Hat Enterprise Linux operating system. The data warehouse is more than 23 terabytes in size and continues to double each year in both size and query volume.

Simpler Administration

As data warehouses grow to multiterabyte levels, database management can become increasingly complex. Winter is quick to add that if your architecture cannot support growth, the data warehouse can become unresponsive, unstable, and fundamentally unusable. Oracle's solution to this quandary is to add self-managing and automation features to the server engine. The complete Oracle environment, including databases, is managed through Oracle Enterprise Manager Grid Control. According to Oracle's Hardie, this strategy simplifies the complete application lifecycle, from development and deployment to change management, configuration, day-to-day administration, and performance diagnostics.

"Grid Control manages and monitors your entire deployment infrastructure, including database, middleware, and storage resources. With automated storage management, automated memory management, and automated backup and recovery and with complexity eliminated from many administrative tasks, customers can easily meet their service-level objectives—even when managing very large volumes of data. In addition, administrators can treat clustered resources as basic management units, automating their start, stop, and monitoring as well as failover, relocation, and restart," he says.

Vanderbilt University uses Oracle Enterprise Manager Grid Control to provision users, clone databases, and install software patches. These capabilities are becoming increasingly important as its data warehouse grows to serve all of the university's and hospital's decision support needs. "Our DBAs perform configuration, high-availability operations, recovery, and monitoring functions just once," Getsay explains. "Oracle Real Application Clusters then automatically distributes the updates to the appropriate nodes. Automated failover functions within Oracle Real Application Clusters eliminate many time-consuming manual processes," Getsay continues. "With Oracle Enterprise Manager Grid Control, managing a 16-node database cluster is a lot easier."

Dollars and Sense

Companies initially looked toward clustered Linux solutions to reduce their IT costs and provide higher availability of their core transactional systems. Now many companies are realizing that the benefits of low-cost Linux clusters are equally applicable to their data warehouses.

A data warehouse built on a Linux cluster provides enterprise-level performance, scalability, and 24/7 availability at a very low cost. This is one of the reasons Linux is making impressive inroads into the server market. A December 2004 forecast by Gartner Dataquest estimated that Linux would ship on 21.8 percent of servers worldwide by 2008, up from 12.6 percent of worldwide server shipments in 2003.

Oracle's William Hardie concludes, "Today's business intelligence systems are characterized by large and growing data warehouses supporting increasing user populations running ever-more-complex queries. Deploying a clustered Linux solution ensures that customers can easily meet their future growth requirements and guarantee service levels while keeping their cost of computing down. It makes sound economic sense."


David Baum (david@dbaumcomm.com) is a freelance business writer based in Santa Barbara, California.
