By Janice J. Heiss, January 2007
- A Bird's-Eye View
- Under the Hood: Learn About FareCompare's Java Architecture
- For More Information
FareCompare.com continually capitalizes on the latest developments in the Java platform to provide consumers with what the company touts as the most updated airfare information available anywhere. This article provides an overview of the company's services, followed by an under-the-hood look at its Java architecture.
In December 2000, developer Rick Seaney, now CEO of FareCompare, a company that prides itself on posting updated airfare information hours ahead of other sites, got a call from his attorney. The lawyer informed him that another client, who was in the travel business, was about to have his web site turned off as part of a dispute over ownership, and that Seaney's skills as a developer might be of value.
Improvements in Java Platform, Standard Edition 6
The founders of FareCompare, who have tried out virtually every new feature of the Java platform since Java 2 Platform, Standard Edition 1.2 (J2SE 1.2), report several improvements as a result of their early access to Java Platform, Standard Edition 6 (Java SE 6):
At the time, Seaney and his business partner, fellow developer Graeme Wallace, now chief technical officer (CTO) of FareCompare, were consulting with a variety of corporate clients through their company, XXI Technologies, and relying on ongoing developments in the Java programming language to solve problems during the still-accelerating Internet boom. Their experience working with Cray and other supercomputers, in conjunction with oil exploration software, had provided apt preparation for the Internet age.
Seaney, whose background was in C++, had to be convinced to use the Java programming language by Wallace, a former C++ developer who had been fascinated by the Java language from its inception. Seaney eventually concluded that the Java language was easier to teach programmers, the garbage collection memory model saved time and hassle, and the back-end enterprise Java applications that were beginning to take hold were more agreeable than CORBA, which was tediously complicated.
Seaney and Wallace began researching the travel industry's software challenges and found themselves consulting with Hotels.com in 2002. At the time, Hotels.com and Expedia were the only companies that owned a local inventory hotel property quoting engine. Hotels.com was in a race with several companies to create a dynamic packaging engine that would allow the company to bundle together air, car rental, and hotel prices and offer real-time quotes to consumers. The winning company could, in effect, offer deeper discounts on airfare as long as the source of the discount was hidden.
"A $200 airfare and $300 hotel room would ordinarily cost $500," explains Seaney, "but by engaging in what is called opaque pricing, they could offer the airfare at $120 by presenting a package price of $420, which provided the airlines a mechanism to discount air fare without eroding a la carte pricing."
Hotels.com wanted to develop a system by which consumers could price a complete package covering airfare, car rental, and hotel room within five seconds -- no mean feat at the time. Unfortunately, a system operating through parallel queries is only as fast as its slowest link. Although the hotel information could be quoted in one second and the car systems in five, the fastest airfare-quoting technology required a minimum of 15 seconds to deliver results.
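The "slowest link" point can be illustrated with a minimal Java sketch. It is not the Hotels.com or FareCompare code: the class name, the method names, the car price, and the scaled-down delays are assumptions made for the example. Three simulated supplier lookups run in parallel, and the package price is available only when the slowest one has answered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: a package quote that waits on parallel supplier lookups. */
final class PackageQuoteSketch {

    /** A simulated supplier lookup: waits delayMillis, then returns a price in dollars. */
    static Callable<Integer> lookup(int priceDollars, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return priceDollars;
        };
    }

    /** Runs each lookup on its own thread and sums the prices once all have answered. */
    static int packagePrice(List<Callable<Integer>> lookups) {
        ExecutorService pool = Executors.newFixedThreadPool(lookups.size());
        try {
            int total = 0;
            for (Future<Integer> answer : pool.invokeAll(lookups)) {
                total += answer.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** With a parallel fan-out, the wait equals the slowest branch, not the sum. */
    static long parallelLatencyMillis(long... branchMillis) {
        long slowest = 0;
        for (long millis : branchMillis) {
            slowest = Math.max(slowest, millis);
        }
        return slowest;
    }

    public static void main(String[] args) {
        // Delays are scaled down 100x from the article's 1, 5, and 15 seconds.
        List<Callable<Integer>> lookups = new ArrayList<>();
        lookups.add(lookup(300, 10));   // hotel
        lookups.add(lookup(50, 50));    // car (the price is made up)
        lookups.add(lookup(120, 150));  // air
        long start = System.nanoTime();
        int total = packagePrice(lookups);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Package price $" + total + " after about " + elapsedMillis + " ms");
        System.out.println("Slowest branch at full scale: "
                + parallelLatencyMillis(1_000, 5_000, 15_000) + " ms");
    }
}
```

Under these assumptions, speeding up the hotel or car lookup changes nothing; only a faster airfare quote shortens the wait.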
The primary source of comprehensive fare information was, and is, the Airline Tariff Publishing Company (ATPCO). ATPCO, which had been a government agency prior to deregulation in 1978, is now owned by 19 major airlines. This company continuously receives fare information from more than 500 airlines and redistributes it to subscribers. Among the subscribers are the airlines themselves; the major Global Distribution Systems (GDS), namely, Sabre, which owns Travelocity; Amadeus; Worldspan; and Travelport/Galileo, which owns Orbitz; as well as other smaller reservation systems, including FareCompare.
Focusing only on cities that Hotels.com served, Seaney, Wallace, and a small team of Java developers spent 14 months creating and deploying a Java technology-based system that met Hotels.com's prescribed five-second limit and performed well online for three months. Then, suddenly, Expedia bought Hotels.com and, to Seaney's dismay, scrapped his system in favor of the larger company's own.
"Graeme and I mulled it over a bit," explains Seaney, "and we said, 'You know what? Only a handful of companies can do this, and we are one of them. Let's create a new product from scratch and build a company around it.' "
Their initial plan was to sell business-to-business products, and Seaney and Wallace successfully implemented it in 2004, when chief operating officer (COO) Neil Bainton joined FareCompare. They licensed their new software technology to ATPCO, which was then a haven for IBM mainframe developers using the COBOL language and databases, an environment far from optimal for systemic functioning.
FareCompare sold and licensed products that included competitive intelligence software that enabled airlines to keep track of each other's pricing, auditing software to assure that consumers properly implemented airfare transactions, core pricing-engine functions, RSS airfare-pricing feeds, and mining deals in real time for web-based companies that sold directly to consumers. In addition, FareCompare consulted widely in the travel industry.
However, over time, Seaney and Wallace grew increasingly frustrated. From their perspective, the web-based companies that licensed their data feeds had technical difficulties in integrating the information and presenting it to end users. So in early 2006, FareCompare decided to create its own consumer web site to focus on helping consumers find the cheapest fares. Two and a half sleep-deprived months later, in March 2006, the company launched FareCompare.com, whose traffic has been growing steadily by 30 to 50 percent a month.
Shopping for an airline ticket has changed radically since 2001 as airlines reduced the viability of travel agents by cutting their commissions. Online airfare tools, first created in 2001, now enable power shopping: At sites such as Expedia, Orbitz, and Travelocity, consumers can enter travel dates and cities of departure and return to compare prices. But since the advent of power shopping, consumers have been on their own, acting as their own travel agents and using minimal outdated technology, with no one to give them the requisite background or resources they need to make the best buying decisions.
The main service of FareCompare, which does not itself sell tickets, centers around a question consumers may never think to ask: "When is the best time to buy a ticket to travel on a given date?" In fact, knowing the optimal purchase date determines the likelihood of getting the cheapest fare. Combining this with real-time instant alerts, FareCompare provides consumers an advantage in locating discounted tickets.
"FareCompare.com uses the latest in Java technology, working with our advanced software, to deliver discount airline ticket information to consumers first, two to six hours before anyone else -- including the airlines themselves!" exclaims Seaney. "This insider information makes it possible for a FareCompare.com user to be the first in line for these limited-in-quantity bargain price air tickets. We address key questions: Is the price cheaper days before or after? Should I buy now or later? Is the price going to go up or down if I wait to buy? We make it easier for you to be your own travel agent."
Most consumers visit three to six sites before purchasing a ticket, mainly because their queries retrieve different prices from different systems, and consumers are trying to get the best deal. A query from Expedia brings a different price than one from Travelocity, which in turn differs from Orbitz. Most consumers are unaware that, because airfares are continuously sent to ATPCO throughout the day and later distributed eight times a day, a consumer can query the same site within minutes and find the fare has gone up or down $100.
The airlines have spent millions of dollars creating yield-management software, whose single purpose is to maximize the price of each seat in an effort to increase revenues. Thus, a user who queries airlines five times in the course of a day may find that they offer five different prices for the same flight. The airlines' software analyzes historical trends, number of seats sold, competitors' prices, market share, estimated number of last-minute business travelers, flight-schedule convenience, and dozens of other variables to generate the current price for a particular seat. Airlines walk a fine line: If the airfare price is too high, travelers stay home; if the price is too low, airlines don't make enough money.
"Many super-low prices are only available part of the day," explains Seaney. "Airlines use quick-hit short-term prices to test their pricing strategy, to indicate displeasure with another airline that's 'tweaking' them, or simply to correct mistakes." Although the major sites offer a system by which consumers can enter a price they are willing to pay for a flight, leave their email address, and wait for an alert, many consumers don't know what price to enter and don't realize that the sites send alerts only once a day, so they miss out on many of the cheapest airfares. Most consumers know that, as a rule of thumb, the further ahead they book, the cheaper the flight is likely to be. To get the best deals, they need to know a lot more.
"The first thing consumers need to know in shopping for flights is what a good price is," remarks Seaney. "We boil down hundreds of variables and multiple years of historical information into a simple graphically displayed star-rating system that allows consumers to quickly see how current prices stack up. We display the history so you know what a good price is. And there are literally tens of thousands of price changes every day, so we monitor those changes in real time. The other sites take anywhere from two to four hours to process this data -- for international flights, it can take as long as eight hours. Our system processes it in under five minutes. We send out alerts several hours before they show up on other sites."
FareCompare has two basic styles of distributing information to consumers. "We'll tell you when the price goes down; or if you have a preferred departure airport or nearby airports, we'll inform you of all the worldwide-departure price decreases from your designated airports," says Seaney.
FareCompare charges no fees for its services. Instead, it generates revenues from advertising, consulting, business-to-business licensing of software, and fee sharing when consumers book with one of its partners. FareCompare primarily uses email alerts and RSS subscriptions to distribute real-time updates to customers.
So when is the best time to purchase your ticket? "In reality," says Seaney, "the best time to buy depends on two things: the cities you are traveling to and from, and your tolerance for risk. Purchasing travel is very similar to the stock market, where there are normally only a few perfect times a year to buy."
Seaney is adamant about the uniqueness and superiority of his site: "Absolutely no other site offers this sort of real-time alerting service." And how is FareCompare able to process and distribute such massive amounts of information so quickly? "Because of our state-of-the-art Java technology-based system," says Seaney.
FareCompare, a company of only 15 employees, is not to be confused with metasearchers that scrape and aggregate information from other sites. All of FareCompare's information comes directly from the airlines by way of ATPCO.
FareCompare faces major technical challenges: The company must process 300,000 to 400,000 city pairs, each with anywhere from 30 to 100 airfares, every day of the year, eight times a day, while maintaining historical databases. To complicate matters, consumers can buy tickets up to 330 days before the departure date. The keys to FareCompare's competitive advantage are speed, performance, and reliability -- downtime is not acceptable. The company's system must rapidly extract airfare information, apply it to its data store, and inform users through automated email before the major travel sites post the information. The challenge is to detect and communicate deals as fast as possible. A new low-price alert for a flight from San Francisco to New York kicks in more than a dozen different processes that take the raw fare, merge rule information in real time, pass it through pricing functionality, and tailor a specific email that fits the consumer's profile.
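To get a feel for the scale, the figures above can be multiplied out. The short Java sketch below does only that arithmetic. It assumes, purely for illustration, that every one of the eight daily distributions touches every fare for every city pair, so the results are rough upper bounds rather than measured workloads; the class and method names are invented for the example.

```java
/** Hypothetical back-of-the-envelope estimate built from the article's figures. */
final class FareVolumeEstimate {

    /** Fares touched per day if each distribution covered every fare for every city pair. */
    static long faresPerDay(long cityPairs, long faresPerPair, long distributionsPerDay) {
        return cityPairs * faresPerPair * distributionsPerDay;
    }

    public static void main(String[] args) {
        System.out.println(faresPerDay(300_000, 30, 8));   // low end:  72000000
        System.out.println(faresPerDay(400_000, 100, 8));  // high end: 320000000
    }
}
```

On that assumption, the daily load falls between 72 million and 320 million fare evaluations.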
FareCompare, which uses Linux on AMD's Opteron processors, has 15 to 20 boxes constantly running its systems, with an equal number running tests in its labs. The company's multitiered architecture includes data, processing, and pricing tiers, each of which consumes large quantities of memory, with complex caching between tiers. Its systems range from 4 to 8 to 16 gigabytes in size, with a total of 20 gigabytes a day manipulated for international fares and 7 to 10 gigabytes a day manipulated for domestic fares. Information is processed into a MySQL 5.0 database of more than 100 gigabytes that documents the history of all transactions.
Apache web servers process incoming HTTP requests through a Java servlet engine cluster using Resin from Caucho Software and JavaServer Pages technology. FareCompare relies on a JBoss application server for both domestic and international airfare pricing clusters. Hundreds of customized modules in the company's multitiered caching architecture provide information, ranging from the core data storage of fare and rule information to the presentation of this information as prices on a web page. FareCompare's fare-processing system alone has 50 to 60 modules, each of which ranges in implementation from 10 to more than 1000 classes in the case of the data specification.
FareCompare functions 24/7, which presents a challenge in rolling out new software. The company's clustering capacities enable it to take a machine offline, upgrade it, and bring it back online. The company uses network clustering of application servers and servlet engines to replace software and hardware.
FareCompare has always scoured successive releases of the Java platform for enhancements that will facilitate performance. "When the early builds of Java SE 6 came out," recalls Wallace, "we jumped in to see if we could get any performance improvements, which are crucial to our business. To make a long story short, the difference between J2SE 5.0 and Java SE 6 was startling. In terms of crunching through airfares, we are talking an increase in speed of 25 to 30 percent." These gains came from major changes to the Java HotSpot virtual machine (VM) and garbage collection systems in the Java SE 6 platform.
Although FareCompare's production system, web site, and application servers currently run on J2SE 5.0, the company set up a version of Java SE 6 to process airfares as soon as its engineers were able to stabilize it. In early 2007, FareCompare will be moving to Java SE 6 for its web sites, servers, and back-end support.
FareCompare's developers are confronted with particularly demanding challenges. The airfare data they receive from ATPCO uses archaic formats that were used to write magnetic tape from the mid-to-late 1980s. To process the data, developers must grasp the meaning of particular bits and bytes within these data formats, a feat that entails comprehending some 150 different record types simply to ascertain the cheapest price of an airfare on a particular day. Some developers are stymied by the task.
"There's an enormous learning curve involved in both processing the data and understanding the meaning of the bits and bytes within the data formats," says Wallace. "Some developers can grasp it; some can't."
To make matters worse, the ATPCO data format descriptions were incomplete and the processing algorithms were left as an exercise to the implementer. The company's first two ATPCO-processing software versions provided less than stellar performance.
Wallace explains: "We engaged in a kind of reverse thinking, and thought, 'OK, they used to distribute their airfare data on magnetic tape. How would somebody process it if they were reading a magnetic tape?' We re-architected everything to apply our algorithms as if they were reading the data from a magnetic tape. After doing this, a lot of the complexity dropped out, and we witnessed a 5- to 10-fold increase in processing speed. Initially, then, to speed things up, we scrapped converting and storing the information in binary, instead placing the onus on downstream processing to optimally handle conversions."
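The "read it like a tape" idea amounts to a single forward pass over the records, with no seeking back and no intermediate binary store. The Java sketch below illustrates that style only. The 13-character record layout (origin, destination, fare in cents) is invented for the example and is not the ATPCO format, and the class and method names are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: one sequential pass over fixed-width text records. */
final class TapeStylePass {

    /**
     * Invented layout: chars 0-2 origin, 3-5 destination, 6-12 fare in cents.
     * Each record is read once, in order, and only a running minimum is kept.
     */
    static Map<String, Integer> cheapestByCityPair(Reader tape) {
        Map<String, Integer> cheapest = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(tape)) {
            String record;
            while ((record = in.readLine()) != null) {
                if (record.length() < 13) {
                    continue; // skip blank or truncated records
                }
                String cityPair = record.substring(0, 6);
                int cents = Integer.parseInt(record.substring(6, 13));
                cheapest.merge(cityPair, cents, Integer::min);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cheapest;
    }

    public static void main(String[] args) {
        String tape = "SFOJFK0019900\nSFOJFK0012000\nDFWLAX0008900\n";
        System.out.println(cheapestByCityPair(new StringReader(tape)));
    }
}
```

Because the pass keeps only one running value per city pair, memory use depends on the number of city pairs rather than on the number of records read.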
FareCompare's development team built a code framework that's designed to be as agnostic as possible in relation to its deployment environment. This code is then used as the basis for developing modules that may be customer- or job-specific.
"We essentially create modules of code and glue them together, whether with Enterprise JavaBeans, web services, RMI, or multicast data transmission," says Wallace. "The specification for a particular module is published by way of an interface. Before generics, we had no way to specify the type that a collection of objects used as a return type or a parameter type. The documentation had to specify this, and the developer had to pay close attention when using a particular module."
Then came generics. "When generics arrived, we completely retooled every module API, which solves a lot of problems by picking up disparities at compile time."
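The difference Wallace describes can be seen in a small before-and-after sketch. The module and type names below are hypothetical, not FareCompare's API. The raw interface leaves the element type to the documentation, while the generic one puts it in the published signature, so a mismatch is caught by the compiler instead of at run time.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a module API before and after generics. */
final class GenericModuleApi {

    /** Illustrative value type. */
    static final class Fare {
        final String cityPair;
        final int cents;

        Fare(String cityPair, int cents) {
            this.cityPair = cityPair;
            this.cents = cents;
        }
    }

    /** Pre-generics style: only the documentation says the list holds Fare objects. */
    interface RawFareModule {
        @SuppressWarnings("rawtypes")
        List findFares(String cityPair);
    }

    /** Generic style: the element type is part of the published interface. */
    interface FareModule {
        List<Fare> findFares(String cityPair);
    }

    /** The cast compiles whatever the list holds; a mismatch fails only at run time. */
    static int firstFareCentsRaw(RawFareModule module, String cityPair) {
        return ((Fare) module.findFares(cityPair).get(0)).cents;
    }

    /** No cast is needed, and a module returning the wrong type would not compile. */
    static int cheapestCents(FareModule module, String cityPair) {
        int best = Integer.MAX_VALUE;
        for (Fare fare : module.findFares(cityPair)) {
            best = Math.min(best, fare.cents);
        }
        return best;
    }

    public static void main(String[] args) {
        FareModule module = pair -> Arrays.asList(new Fare(pair, 19900), new Fare(pair, 12000));
        System.out.println(cheapestCents(module, "SFOJFK"));
    }
}
```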
"With generics and annotations, we could not only specify the individual data type -- basically the parameters and return values -- but also specify the caching behavior."
Annotations, introduced in J2SE 5.0, enabled FareCompare to gain better control of caching behavior. Because of its enormous data volume, FareCompare must cache information near where it is used and reuse information as often as possible -- roughly 80 percent of the queries focus on 20 percent of all destinations. Use of annotations has enabled the company's developers to build a caching framework that marks up the APIs of their modules to determine caching behavior.
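One way to picture caching behavior expressed in a module's API is the sketch below. It is an illustration under stated assumptions, not FareCompare's framework: the `@Cached` annotation, its `maxEntries` element, and the `PriceModule` interface are all invented. A dynamic proxy reads the annotation at run time and serves repeated calls from a small least-recently-used map.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: caching policy declared on a module interface. */
final class AnnotatedCachingSketch {

    /** Invented annotation: marks a method whose results may be cached. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Cached {
        int maxEntries() default 1000;
    }

    /** Invented module interface; the caching policy is part of its specification. */
    interface PriceModule {
        @Cached(maxEntries = 2)
        int cheapestCents(String cityPair);
    }

    /** A map that evicts its least recently used entry once it grows past maxEntries. */
    static <K, V> Map<K, V> lru(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Wraps a module so that @Cached methods are answered from the cache when possible. */
    static PriceModule withCache(final PriceModule target) {
        // Not thread-safe; a real framework would need synchronization.
        final Map<Method, Map<List<Object>, Object>> caches = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                Cached policy = method.getAnnotation(Cached.class);
                if (policy == null) {
                    return method.invoke(target, args); // not annotated: pass through
                }
                Map<List<Object>, Object> cache = caches.computeIfAbsent(method,
                        m -> AnnotatedCachingSketch.<List<Object>, Object>lru(policy.maxEntries()));
                List<Object> key = args == null
                        ? Collections.<Object>emptyList() : Arrays.asList(args);
                Object value = cache.get(key);
                if (value == null) { // miss: call the real module and remember the result
                    value = method.invoke(target, args);
                    cache.put(key, value);
                }
                return value;
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return (PriceModule) Proxy.newProxyInstance(
                PriceModule.class.getClassLoader(), new Class<?>[] {PriceModule.class}, handler);
    }
}
```

A caller would wrap any implementation once, for example `PriceModule cached = AnnotatedCachingSketch.withCache(realModule);`, and repeated queries for a popular city pair would then skip the underlying call.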
"With generics and annotations, we can not only specify the individual data type -- basically the parameters and return values -- but also specify the caching behavior," observes Wallace. "Rather than having to keep caching behavior separate from the code we were developing, we could build it into the specification of each module. It was a great step forward."
Memory management constitutes a major challenge for FareCompare. The development team spent months perfecting memory heap parameters so that applications running in 8 to 16 gigabytes could continuously process and run information at remote co-location facilities with little or no maintenance or downtime.
FareCompare does extensive analysis of the behavior of the garbage collectors in every release of Sun's Java Virtual Machine*, both major and minor. "I've spent a lot of valuable time with a Java HotSpot VM engineer learning how to tune the garbage collectors," says Wallace. "The knowledge we gained from doing this has allowed us to apply the same techniques whenever a new VM comes out. With Java SE 6, we thoroughly analyzed the various segments of the heap that the VM uses -- survivor space, tenured space, Eden space, and so on. We hand tune it to decrease the garbage collection times."
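The heap segments Wallace mentions can be inspected from inside any running VM through the standard `java.lang.management` API. The sketch below is a generic illustration, not FareCompare's tooling, and the class name is an assumption. Pool names vary with the collector in use, and the sizes of these regions are typically set with HotSpot options such as `-Xmn` and `-XX:SurvivorRatio`; the article does not say which settings FareCompare chose.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: list the heap memory pools of the current VM. */
final class HeapPoolReport {

    /** One line per heap pool: name, bytes used, and maximum (-1 when undefined). */
    static List<String> heapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(pool.getName() + " used=" + usage.getUsed() + " max=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : heapPools()) {
            System.out.println(line);
        }
    }
}
```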
Each FareCompare module has an MBean interface in order to derive statistics about the internal workings of that module. "Because of the generic code we've written, such as our framework modules, we have considerable control at runtime over how we want the code to behave, for example, how frequently we flush the cache and its optimal size," explains Wallace. "We acquire stats on the most popular destinations, departure points, and city pairs to further tune the cache preloading and eviction."
The MBean interfaces also enable FareCompare to monitor and debug its systems. "We can write a set of methods, declare them on an MBean for a particular module, fire up the application, and examine and change the behavior of the module while the system is running," states Wallace.
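A standard MBean of the kind described here can be sketched in a few lines. The `FareCache` module, its attributes, and the object names are invented for illustration; only the `javax.management` calls are the real API. Once registered, the attribute and operation become visible to any JMX client, and the two helper methods stand in for what such a client does when it changes a setting or triggers an operation.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch: a module that exposes cache settings through an MBean. */
final class CacheMBeanSketch {

    /** Management interface; JMX pairs it with FareCache through the "MBean" suffix. */
    public interface FareCacheMBean {
        int getMaxEntries();
        void setMaxEntries(int maxEntries);
        int getEntryCount();
        void flush();
    }

    /** Invented module whose cache can be resized or flushed while it runs. */
    public static final class FareCache implements FareCacheMBean {
        private int maxEntries = 1000;
        private int entryCount;

        @Override public synchronized int getMaxEntries() { return maxEntries; }
        @Override public synchronized void setMaxEntries(int maxEntries) { this.maxEntries = maxEntries; }
        @Override public synchronized int getEntryCount() { return entryCount; }
        @Override public synchronized void flush() { entryCount = 0; }

        /** Stands in for real cache activity. */
        synchronized void recordEntry() { entryCount = Math.min(entryCount + 1, maxEntries); }
    }

    /** Registers the module with the platform MBean server under the given name. */
    static ObjectName register(FareCache cache, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            ManagementFactory.getPlatformMBeanServer().registerMBean(cache, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Changes an attribute by name, as a management console would. */
    static void setAttribute(ObjectName name, String attribute, Object value) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.setAttribute(name, new Attribute(attribute, value));
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Invokes a no-argument operation by name, as a management console would. */
    static void invoke(ObjectName name, String operation) {
        try {
            ManagementFactory.getPlatformMBeanServer().invoke(name, operation, null, null);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FareCache cache = new FareCache();
        ObjectName name = register(cache, "example.farecache:type=FareCache,name=demo");
        setAttribute(name, "MaxEntries", 250);
        System.out.println("maxEntries is now " + cache.getMaxEntries());
    }
}
```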
When J2SE 5.0 was released with Java Monitoring and Management Console (JConsole), FareCompare immediately made use of the new features in its applications. This was far easier than using the JBoss application servers' web-based MBean console.
"Just having an application-based JConsole enabled us to make connections to various machines running side by side with JConsole and monitor them that way," says Wallace. "We then used JConsole's built-in MBeans to either extract statistics or make operation calls on our running applications. The JConsole in J2SE 5.0 was useful, but the JConsole in Java SE 6 is much better, with its new plug-in architecture. For example, plug-in JTop is a huge improvement and the new HotSpot diagnostic MBean, which allows you to dump memory, stack traces, and configure VM options on the fly, has proved quite handy."
FareCompare performs data analysis on the usage of each of ATPCO's 150 record types to determine which fields need to be decoded for each processing step. "It's essentially packed ASCII code. We found that it takes a considerable amount of time to decode every field of every record," says Wallace. "The results of our data analysis led us to write some custom Java serialization code so that at each stage in the processing pipeline, we only have to unpack the fields that are needed at that point -- it's basically a chaining mechanism."
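The decode-only-what-you-need idea can be sketched as follows. This is not FareCompare's custom serializer: it uses default Java serialization with `transient` fields as a stand-in, and the 13-character record layout (origin, destination, fare in cents) and all names are invented. The packed record travels as is, and each field is decoded the first time a processing step asks for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical sketch: a record that ships packed and decodes fields on demand. */
final class LazyFareRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String packed;        // raw record, serialized as received
    private transient String cityPair;  // decoded on first use, never serialized
    private transient Integer cents;    // decoded on first use, never serialized
    private transient int decodes;      // counts decode operations, for illustration

    LazyFareRecord(String packed) {
        this.packed = packed;
    }

    /** Invented layout: chars 0-5 hold the city pair. */
    String cityPair() {
        if (cityPair == null) {
            cityPair = packed.substring(0, 6);
            decodes++;
        }
        return cityPair;
    }

    /** Invented layout: chars 6-12 hold the fare in cents. */
    int cents() {
        if (cents == null) {
            cents = Integer.parseInt(packed.substring(6, 13));
            decodes++;
        }
        return cents;
    }

    int decodeCount() {
        return decodes;
    }

    /** Simulates a hop between machines: only the packed text crosses the wire. */
    static LazyFareRecord roundTrip(LazyFareRecord record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(record);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LazyFareRecord) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LazyFareRecord record = new LazyFareRecord("SFOJFK0012000");
        System.out.println(record.cents() + " cents after " + record.decodeCount() + " decode");
    }
}
```

A step that only routes records by city pair never pays for parsing the fare, and a step that needs both fields decodes each of them once.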
"As the objects are transferred through the pipeline from machine to machine, we decode fields based solely on the requirements of that processing step. We can transfer either the packed ASCII record itself or only decode a small portion of the fields that we need at a given point. When data gets closer to the web site and requires more fields, we can change the type of object as it's deserialized to unpack more of the fields, once and once only. We can choose the amount of time spent on decoding the record format depending on the application. Our custom serializers and deserializers are based on how many fields we are unpacking."
FareCompare exists in an extremely competitive industry in which speed and performance are crucial to survival. Because ATPCO's airfare database grows yearly by 10 percent, FareCompare is continually on the lookout for ways to speed up its system. As he muses about future innovations, Wallace remarks, "We look forward to the production release of Java SE 6 and its improvements in raw processing speed, I/O, and file system access."
* The terms "Java Virtual Machine" and "JVM" mean a Virtual Machine for the Java platform.