|
Feature
Put Some Muscle in Your Business
By David A. Kelly
Demands for more data storage, Improved processing, and higher customer satisfaction drive the rapid growth of very large databases.
For scientific and technical publishing giant Elsevier, building the second-largest UNIX-based online transaction processing (OLTP) database wasn't about the technology achievement or awards it might win. It was all about serving the customer better.
|
Surveying the Scalability Landscape
A quantitative measure of the growth in large database deployments comes from Winter Corporation, a leading center of expertise in database scalability. Winter periodically surveys the frontier of database scalability; the most recent survey was completed in August 2005.
"Between this survey and the last one two years ago, the size of the largest database [that we] measured approximately tripled, and the same thing happened with the survey before. It's a very striking growth rate," says Richard Winter, president, Winter Corporation.
The survey noted three significant changes. First, databases are getting largersignificantly larger than customers projected in previous surveys. "Not only are databases getting larger on traditional platforms such as mainframes and UNIX platforms, but in this survey we really saw them getting larger on Windows and Linux as well," notes Robert Dorin, senior research director, Winter.
The second significant change: Databases have broader application usage. "The survey also showed that large databases are being used over a broader and more strategic range of problems than ever before," says Winter.
Finally, there's been considerable growth in Linux deployments. "While I would characterize what we saw in terms of very large databases on Linux as early adopter usage, it foretells the increasing presence of Linux in years to come," says Dorin. "We expect that in the next survey we will see even significantly more Linux presence than we saw in this survey."
|
"Thanks to Oracle's technology and consultancy, we have increased the efficiency in our production process enormously," says Jan Visser, director of Electronic Production, Elsevier. Elsevier's Oracle Real Application Clusters-based OLTP "Electronic Warehouse" houses all the articles for the company's more than 1,800 journals and associated online publications. Already large at about 11TB of information, the Electronic Warehouse continues to grow at about 1,000 additional articles per day and .5TB to 1TB per year. "With the Electronic Warehouse, we've not only improved the quality and efficiency of our products and the production process, but our clients receive their products more quickly," Visser says. According to experts, Elsevier's example is just the tip of the iceberg when it comes to very large databases (VLDBs).
"We estimate that more than 2,000 sites worldwide support a multiterabyte database in production, and we estimate that the number is likely to double over the next 18 months," says Noel Yuhanna, senior analyst, Forrester Research. "There's certainly a tremendous growth where customers want to store data longer than ever before."
Business Need
For many companies the question is not really whether they want a VLDBsuch databases are quickly becoming a fact of business life. "Companies have to store increasing amounts of data not only to maintain a competitive advantage but also to comply with new financial regulations," says Bill Nee, senior director of Product Marketing, Oracle. "We've already seen that the amount of general information being produced is dramatically increasing. For businesses, this problem is compounded because of the need to organize and link together this information in a logical fashion. It also has to be made continually available and easily accessible. Only a database management system can do this effectively."
Of course, having a successful VLDB today takes a different focus than the "consolidate-your-data-at-any-cost" approach that some companies took in the past. "In general, there's a set of requirements that we see many companies considering as they build out larger databases and move toward very large databases," says Nee. "Low cost, scalability, performance, and availability are increasingly critical design considerations."
Oracle provides a variety of answers to those design challenges, from Oracle Real Application Clusters (for increasing scalability and availability) to parallel queries and automatic storage management (for increasing performance) to partitioning (for performance and data management). "One of the key features for Oracle in supporting large databases is the partitioning feature, where you can break down large tables into smaller, virtual tables and make them easier to manage," says Forrester's Yuhanna. "Data partitioning has really helped database administrators to manage very large databases more efficiently."
Oracle's support for large database requirements has not only helped customers such as Elsevier capture the business benefits of deploying large databases, but it has also helped to make Oracle an important player in top database deployments.
|
Managing VLDB Projects
Succeeding in a VLDB project means knowing how to design an efficient solution as well as how to keep it up-to-date. Just ask Dave Ahnell, technical architect, EDS.
EDS helped a client replace its more-than-20-year-old mainframe claim review system with a new Oracle-based data warehouse and data mart solution. The system makes six years of claims data available in real time to users across the state to support the identification of providers and recipients who may be abusing the program.
The data warehouse and data mart are each about 1.9TB in total database size and growing, with 1.3 billion rows in each of the largest tables. The data warehouse has approximately 70 base tables, while the primary data mart has about 22 base tables in a star schema design. The load volume is collectively about 20 million additional claim records per week into the data warehouse and data mart, amounting to about a 20GB increase in storage volume per month.
EDS's Recommendations for a Successful VLDB
Ahnell offers the following suggestions for companies implementing a VLDB:
- Consider splitting logical tables to reduce size and improve load times and performance. "Partitioning, rather than manually splitting the schema, could be a better option to achieve this goal, since it obviates the need of any application modification resulting from such schema changes," says Oracle's Sushil Kumar, director, Database Availability, Manageability, and Performance Product Management.
- Consider tablespace and table distribution across multiple disk subsystems.
- Control large "runaway" ad hoc queries by implementing user profiles or Oracle Database 10g Resource Manager.
- Allow a large enough maintenance window for data load processes and partition maintenance.
- Test with an adequate volume of data to allow for proper testing and query tuning. For example, EDS was able to load one full year of volume for its testing.
- Realize that database maintenance time frames will grow as data volumes increase with VLDBs.
- Query performance can be much slower in VLDBs than in a typical OLTP system due to the ad hoc nature of the queries, so it's important to control and monitor ad hoc database queries closely.
|
"Oracle has become a significant presence on the very large database scene," says Richard Winter, president, Winter Corporation, a research firm that recently completed a survey of VLDB deployments. "We have 42 top ten lists ranking different aspects on our Web site, and in 14 of those lists Oracle displays a dominant presence. Oracle itself has grown from a modest position on the very large database scene to a robust position," notes Winter.
In addition, organizations are now looking for more platform choices than just mainframes and UNIX. "An interesting finding about Oracle's results is that not only did it have a strong presence in our survey, but it had a strong presence across platforms, including at least one No. 1 position in a top ten list on each of the UNIX, Linux, and Windows platforms," says Robert Dorin, senior research director, Winter. "It's also important to mention Oracle's strong showing in both OLTP and data warehousing applications."
Performance and availability were key driving factors in the construction of Elsevier's Electronic Warehouse, which was first deployed in 2003 and designed around Oracle Database with partitioning and the Oracle Workflow Management system. The core of Elsevier's business is to publish scientific journals and books, both electronically and through traditional paper methods. The process starts with authors submitting articles and ends with the publication of the reviewed articles either in paper publications or in one of Elsevier's online services, of which ScienceDirect is the largest. In between, Elsevier has to manage the enrichment of the articleseverything from the corrections to peer reviews, artwork, and production. The Electronic Warehouse includes the core articles in XML format, PDF files of typeset articles, and all the artwork in high-quality formats.
Elsevier runs the Electronic Warehouse using Oracle Real Application Clusters on three Sun servers with nine other application, drop-zone, and bucket servers serving supporting functions. "We went with Oracle Real Application Clusters because we needed to have 24/7 availability for our production process and still be able to upgrade, update, or handle failures without the production services going down," says Warry Spykstra, senior project manager, Elsevier. "It's been very beneficial for us because we've been able to handle system outages. The design has also saved us money because we can have several smaller servers instead of one or two larger ones."
Performance was also critical to Elsevier, since the solution is their core production system. "Fifty percent of the success of any new system is in its performance. Early on, Oracle did a good job in keeping the issue of performance high on our agenda for the project, so that we not only concentrated on the functional side but also on the performance side, which is a real issue when the database is very large," says Visser. "I can remember long discussions about the partitioning that we had very early on in the project. I would recommend that other organizations deploying large databases make sure to keep performance requirements high on the agenda during design and development."
The impact of the Oracle-based Electronic Warehouse on Elsevier has been direct. "By having this system, we were able to make our IT services and resources more efficient," concludes Visser.
Billions and Billions Served
Another company that has leveraged Oracle's partitioning feature to speed the performance of a VLDB is SBC, with its Billing Products Initiative data marts that are designed to provide customers with alternatives to conventional, one-size-fits-all telephone bills. "We put all the detailed billing information into data marts, and then the customers use specialized applications to interface with them to access their billing information, do analysis, or organize their accounts," says Mike Ogden, senior technical architect, SBC. "This provides us the most effective solution and it enables us to allow customers to gain access to all of their detailed billing records online, which will provide big savings for us over time. We're quite happy with this application."
As with any telephone billing application, the volume of records that SBC handles is significant, with one fact table containing two billion records, and another, eight billion. The 1.5TB data mart keeps a rolling 18 months of detailed billing records and is growing by a million transactions (or 10 percent) per month.
|
Snapshots
Elsevier Publishing
www.elsevier.com
Industry: Publishing
Location: Amsterdam
Oracle products: Oracle Database Enterprise Edition, Oracle Real Application Clusters, Oracle Partitioning, Oracle Text
Database size: More than 11TB
SBC
www.sbc.com
Industry: Telecommunications
Location: St. Louis, Missouri
Oracle products: Oracle Database 10g Enterprise Edition
Database size: 1.5TB
UPS
www.ups.com
Industry: Package delivery
Location: Mahwah, New Jersey
Oracle products: Oracle Database Enterprise Edition
Database size: 4TB
EDS
www.eds.com
Industry: Technology
Location: Plano, Texas
Oracle products: Oracle Database10g Enterprise Edition, Oracle Partitioning
Database size: Approximately 4TB
|
To cope with this size, SBC has deployed Oracle Database 10g's partitioning features. "Since these applications are really data marts for individual customers, the queries are very partitionable," says Ogden. "With Oracle partitioning, we were able to leverage that and do a lot of partition elimination for those queries." Partitions are typically based on an account or customer ID and date, so a customer that wants to look at a specific bill can query a single partition instead of the hundreds or thousands of partitions that exist in the table.
Ogden recommends that organizations that work with VLDBs consider the importance of developing a partitioning strategy. The right design can increase performance and ease administration. "Since we're using partitioning, it allows us to easily purge data by simply removing partitions from the table as data ages, as opposed to trying to do massive data deletes or creating new tables," says Ogden.
"Performance is definitely something that requires a lot of attention. We leveraged the Oracle SQL Tuning Advisor and Oracle's Enterprise Manager Grid Control to tune queries and add some indexes before we went into production," says Ogden. "That was a big benefit for us."
Oracle's Nee says that he has started to see a trend. "SBC's situation, where the company has a large-scale data mart with an OLTP workload, is a good example of the convergence we're seeing," says Nee. "For many companies, the demand for near real-time analysis of information, or what's sometimes called 'active' data warehousing, is beginning to blur the line between OLTP and data warehousing solutions."
SBC is extending its solution by deploying additional, substantially bigger data marts in 2006, as well as implementing Oracle Real Application Clusters to provide higher availability and the ability to dynamically scale out server capacity on demand.
UPS Speeds Delivery
For a package delivery company such as UPS, the only business is ensuring that packages get to where they're going on time. That's why more than 16,000 employees access UPS's specialized 4TB Oracle-based package reporting repository to manage delivery issues.
"Our Service Exception Analysis System allows employees to look at any of the exceptions where a package didn't make service," says John Maldonato, project leader of Oracle DBAs at UPS. The data warehouse is loaded from a mainframe once a day through a batch load from the previous day's information.
"One of our biggest challenges with the solution has been getting the performance to an acceptable level for our users," says Maldonato. "One thing that makes this different from other decision support systems is that it has a Web-based front end, with some very tight SLAs [service-level agreements]."
The solution, which came in fourth in the Winter survey for transactions for peak workload in the UNIX decision-support system (DSS) category, has a peak workload of 2.9 million SQL statements per hour. "[We survey highest] for a peak workload data warehouse on UNIX for a centralized SMP [symmetric multiprocessing] system," says Maldonato. "Right now we average 10.3 seconds per user query, which is well under the limit of 15 seconds for a query submitted through a Web site."
One way that UPS delivers that fast response time is by leveraging Oracle features. "We rely very heavily on materialized views and the query rewrite functionality in Oracle Database to be able to support the level of query performance that we have today," says Maldonato. "We end up creating a materialized view based on the query history. This results in cutting down the amount of data the users need to go through to get their result set."
The demand for timely and accurate information"actionable intelligence"is more important than ever before, and IT departments are managing data volumes that were once the stuff of science fiction. The goal for many large organizations whose mission-critical decisions are based on database performance, reliability, and scalability is to have accurate data they can use now. The future of their business depends on it.
David A. Kelly (dkelly@upsideresearch.com) is a business, technology, and travel writer who lives in West Newton, Massachusetts.
|