Oracle9iAS Web Cache

A Technical White Paper
June 2001.


EXECUTIVE SUMMARY
INTRODUCTION
The Dynamic Content Dilemma
OVERVIEW OF CACHING SOLUTIONS
Browser Caching
Proxy Caching
Content Delivery Network Services
Server Acceleration
INTRODUCING ORACLE9iAS WEB CACHE:
NEXT-GENERATION WEB CACHING FOR E-BUSINESS
FEATURES OF ORACLE9iAS WEB CACHE
Full-Page Static and Dynamic Content Caching
Multiple Versions of the Same URL
Session-Encoded URLs
Simple Personalization
Partial-Page Caching and Personalized Page Assembly using ESI
The ESI Markup Language
The ESI Development Model
ESI for Java (JESI)
Cache Invalidation
Expiration Policies
XML/HTTP Invalidation Messages
Post-invalidation Options
Performance Assurance and Surge Protection
Web Server Load Balancing, Failover and Binding
Automatic Content Compression
Management
DEPLOYING ORACLE9iAS WEB CACHE
Co-located versus Dedicated Deployments
Deploying Oracle9iAS Web Cache in a Distributed Network
Oracle9iAS Web Cache with Content Delivery Network Services
Oracle9iAS Web Cache with Oracle9iAS Database Cache
AVAILABILITY AND COMPATIBILITY
SUMMARY AND FURTHER READING
APPENDIX A: SAMPLE ESI CODE

 

EXECUTIVE SUMMARY
Oracle9i Application Server (Oracle9iAS) Web Cache is the industry's first caching solution designed from the ground up for e-business. With its unique combination of server acceleration and server load balancing, Oracle9iAS Web Cache tears down the performance barriers that plague today's dynamic e-business Web sites. Unlike legacy cache servers which only handle static data, Oracle9iAS Web Cache accelerates the delivery of both static and dynamically generated Web content, thereby improving response times for feature-rich pages. As the first and only caching solution on the market to support Edge Side Includes (ESI) for performing page assembly in edge servers, Oracle9iAS Web Cache leads the industry with its ability to deliver rich, personalized content from both the edge of the data center and the edge of the Internet. Deployed before a farm of application Web servers and/or globally at the network edge, Oracle9iAS Web Cache also provides load balancing, failover and patent-pending surge protection features for Web servers. Combined, these features ensure blazing site performance and rock-solid up-time while reducing the cost of doing business online. With Oracle9iAS Web Cache, e-businesses can now serve compelling content faster, to more customers, using fewer computing resources than ever before.

Oracle delivers this innovative Web caching technology as part of the Oracle9i Application Server. Now the industry's most comprehensive application server platform - including state-of-the-art services such as Oracle HTTP Server powered by Apache, Oracle Container for Java, advanced developer kits, portal and wireless technologies, collaboration, application integration, business intelligence, security, and a unified management framework - is also the fastest application server on the market.

With Oracle9iAS Web Cache as part of the e-business blueprint, the cache becomes the delivery vehicle for content, leaving the Web servers and databases free to process updates and generate new content. By separating a Web site's content generation mechanism from its content delivery mechanism, Oracle9iAS Web Cache offers e-businesses an inexpensive means of achieving scalability and performance targets, without sacrificing compelling content. Oracle9iAS Web Cache helps reduce costs by enabling e-businesses to leverage existing investments in hardware and software in order to meet increased customer demand. Most importantly, this next-generation e-business infrastructure is available as part of a single-vendor platform: the Oracle9i Application Server.

Oracle9iAS Web Cache Technical White Paper. Copyright (c) 2000-2001 Oracle Corporation, All Rights Reserved.

INTRODUCTION
The e-business model creates new performance requirements for Web sites. To carry out electronic business successfully, Web sites must protect against poor response times and system outages caused by peak loads. Internet users expect Web sites to be fast. Any delay in delivering content can cause potential customers to move on to the nearest competitor. A 1999 Zona Research report entitled The Need for Speed highlights the importance of adhering to what has become known as the 'eight second rule': if consumers cannot download a Web page within eight seconds, they may jump to a competitor's site or take their business off the Web entirely. With regard to the economic impact of slow Web sites, Zona concludes that "perhaps as much as $4.35 billion of ecommerce sales in the U.S. may be lost each year due to unacceptable download times."

While fast response times are crucial for revenue generation, e-businesses must also control costs if they want to become (or remain) profitable. The days when hardware companies gave free equipment to dotcom startups are long gone. In today's profit-driven economy, the ultimate challenge is to make Web sites perform and scale while at the same time lowering the infrastructure costs required to meet capacity targets. Reducing the "average cost per request" - in terms of hardware, software, manpower, bandwidth, and other services that are required to deliver a Web page - is something that e-businesses now strive for on a daily basis. But capacity planning is not a simple task. Even very large e-business infrastructures sometimes fail to produce the scalability, availability and response times demanded by a growing customer base, as illustrated by a number of high-profile Web site outages over the past few years.

The Dynamic Content Dilemma
Several solutions exist today to help improve performance on the Web, but many of these are inadequate, cost-prohibitive, or both.

One option is to design Web pages using only static content. In terms of computation and resource utilization, static content is easy to generate and deliver, and most static Web sites will perform adequately under heavy load. One of the problems with this approach is that without a dynamic, database-driven infrastructure, content management becomes difficult. Every time an update is made, the static Web site has to be redesigned.

While static content may have been sufficient for first-generation Web design, today's e-businesses must offer customers a more compelling user experience. E-business is anything but static. Companies must exchange data in real-time with other companies, and customer retention demands an interactive, one-to-one relationship with consumers. For these reasons and more, database-driven, dynamically generated content is at the heart of today's most successful Web-based business models, such as e-commerce catalogs, auctions, exchanges, online brokerages, Internet and intranet portals, CRM applications, and many others.

Despite its prominent role in today's state-of-the-art Web site architectures, dynamic content creation poses significant challenges for e-business managers who struggle to control costs without sacrificing performance. Creating dynamic content involves several steps which inevitably utilize a large amount of computing power and which, under load, can lead to performance bottlenecks. The typical steps are summarized in Figure 2. In order for a Web browser to request content from a Web server, the user's client machine must first connect to the Web server. Once connected, the browser's HTTP request has to be parsed by the Web server. The HTTP request may contain parameters and header information that must be passed to a "presentation" mechanism for appropriate content retrieval and formatting. If the requested content requires formatting by, say, a servlet or JSP, then the Web server must connect to a runtime environment - for example, Apache Jserv - which may be running on a separate machine. Once invoked, the servlet may query a database, requiring yet another connection, since the database is typically running on a dedicated machine of its own. The formatted content must be passed back to the Web server which finally delivers the dynamically generated page to the browser. The browser may then initiate several subsequent HTTP requests to retrieve the embedded (static) page elements.

Figure 2: Dynamic Web Page Generation

Considering the runtime, disk I/O and network operations just described, one can begin to understand how the computational overhead associated with building pages "on the fly" for thousands of concurrent users can result in increasing delays and failures in data delivery. Many medium- and high-volume Web sites try to counter this problem by adding more application Web servers to their existing architectures. In order to sustain performance under ever-mounting loads, a successful e-business may need to multiply its investment in hardware and software by a factor of ten or more on a yearly basis. Capital outlay of such inordinate magnitude is not unheard of, and many online businesses have disproportionately large datacenter infrastructures for this reason.

OVERVIEW OF CACHING SOLUTIONS
Caching is one of the key technologies that promises to alleviate the computational and economic burdens faced by today's overstrained e-business infrastructures. Nearly all applications benefit from having Web content cached on nodes between the consumers searching for content and the content source itself, known as the origin server. The demand for solutions to improve Web site performance can be measured by the sheer number of Web caching products and services on the market today. Nevertheless, while there exist numerous hardware and software caching solutions for caching static content, most are not well suited for the kind of dynamic, personalized content required by the e-business model. Some of these partial solutions are described in the following paragraphs.

Browser Caching
Caching is a feature supported by nearly all traditional Web browsers in use on PCs today. Most browsers can store static objects, such as graphics that a user has accessed on the Web, to a directory on the user's hard drive. The browser is configured to allocate a certain amount of hard drive space for this purpose. Browser caching can speed up the rendering of pages that contain cached objects.

As shown in Figure 3, when a URL is requested, the browser will first look in its cache. Depending on the browser's configuration, if it finds the object, the browser will load it from the cache rather than connecting to the Web server to get a new one. If the object is not available in the cache, then the browser retrieves it from the origin Web server and saves it to its local cache for future requests.

Figure 3: Browser Caching

This approach may help speed up the delivery of some static page elements, such as graphics, but does little to offload the computing power required to construct dynamic pages on the origin Web servers. Another reason browser caching is not very effective is that content providers tend to mark even their statically generated content with special HTTP headers - such as a "Pragma: no-cache" header or an expiration in the past - that render the content not cacheable. Content providers do this because they want to maintain control over their content, which is especially important when that content is changing frequently. It is impossible to "retrieve" content objects from Web browsers once those objects have been delivered, so content providers are careful about which objects they allow the browser to cache.

 

Proxy Caching
In contrast to browser caching, where storage occurs on client machines, proxy caching is a server-based solution. Proxy cache servers are deployed between a large number of browser clients -- such as dial-up ISP users or users on a corporate intranet -- and the public Internet. When a user tries to access a URL, the client browser sends an HTTP request to the proxy. The proxy checks its local cache for the requested object and, if available, sends the object to the client immediately. If the object is not available or if it has expired, the proxy will request it from the origin Web server on the Internet. Once retrieved from the origin server, the object will then be stored in the proxy's local cache repository, making it available for future requests. The proxy cache works as both a client and a server. As a server, it receives requests from intranet or dial-up clients, and as a client it sends requests to the origin Web servers.

Web objects may become stale as they sit in proxy caches. So how does a proxy cache know if an object in its cache repository is still the same object being served by the origin server? The basic HTTP protocol (HTTP 1.0) provides two mechanisms for cache consistency: the Expires response header and the If-Modified-Since request header. (Note that HTTP 1.1 includes a number of new Cache-Control response headers, most of which are rarely used in practice.)

  • The Expires Response Header
    The simplest mechanism that proxy caches use to determine the freshness of a cached object is to compare the object's Expires response header to the current time. (Note that HTTP requires Web servers and proxies to use Greenwich Mean Time, or GMT.) This mechanism is sometimes referred to as "time-to-live". Generally, proxy caches serve objects that have not yet expired, without disturbing the Web server where the objects originated.

    There are two main problems with this mechanism. First of all, many origin Web servers are owned by commercial outfits that are valued by the number of hits they receive. If the cache delivers the cached object without notifying the origin server, the user statistics collected on the origin server will be misleadingly low. For this reason, many content publishers either choose not to configure their Web servers to set Expires headers, or they simply elect to serve all content with expirations in the past. The second and more daunting problem is that content publishers rarely know a priori what expiration value to assign to a particular content object. Setting the expiration too soon diminishes the value of caching and requires more frequent analysis for resetting expirations. Setting the expiration too far in the future prevents content publishers from recalling cached objects when content on the origin server has changed.

    One safe application of expiration dates far in the future is the expiration on static objects which are referenced only by other objects. If the content creator or publisher wishes to update the appearance of the referenced object, he only needs to change the references to point to the newer instance of the static object with a URI distinct from the stale instance.
    In practice, very few objects have expiration times. Even some obvious candidates for caching -- such as the up and down arrows on the New York Times home page used to indicate if the stock market is higher or lower than the previous day's close -- do not have expirations. Understandably, content publishers find it troublesome to manage the expiration policies of thousands of objects comprising a robust Web site.
  • The If-Modified-Since Request Header
    The second mechanism that proxy caches use to manage cache consistency addresses both problems associated with expirations, but carries a higher performance cost. Proxy caches maintain copies of objects lacking expiration dates, but only serve them to browser clients after checking with the origin server whether a newer version is available. If a newer version of the object exists, the cache server will request the new object from the origin server and freshen its cache as it serves the new object to the client. If the object has not changed, the cache server delivers the cached object.

    In HTTP 1.0, the request the cache server issues to the origin server carries an If-Modified-Since header stating the time associated with the cached copy. The origin server compares this time with the latest modification time of the object. If the object has not changed, the origin server returns only response headers (a 304 Not Modified status) rather than the object itself, and the cache serves its stored copy; otherwise the origin server returns the newer object.
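The two consistency checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the logic of any particular proxy product: the function names `is_fresh` and `modified_since` and the sample dates in the usage note are assumptions made for the example; only the header semantics (an Expires time compared with the current GMT time, and a modification time compared with the time the object was cached) come from the text.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now):
    """Expires check: a cached copy is fresh until its Expires time passes."""
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # A missing or malformed Expires header gives no freshness guarantee.
        return False
    return now < expires


def modified_since(last_modified_header, cached_at):
    """If-Modified-Since check: was the object modified after it was cached?"""
    last_modified = parsedate_to_datetime(last_modified_header)
    return last_modified > cached_at
```

For instance, with `now = datetime(2001, 6, 20, 12, 0, tzinfo=timezone.utc)`, an object carrying `Expires: Wed, 20 Jun 2001 13:00:00 GMT` would still be served from the cache, while one with no Expires header would first be checked against the origin server.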

As depicted in Figure 4, proxy caching is a common feature of corporate firewalls, which are used to protect the secure intranet against attacks or intrusions from the insecure Internet. In recent years, a number of single-purpose proxy cache servers have appeared on the market, mainly in the form of appliances. ISPs have been the primary market for single-purpose proxy cache servers, although corporations are beginning to adopt them, too, as a way to offload the firewall gateway.

Figure 4: Proxy Caching

Although proxy caching may in some cases reduce response times for Web browsers, the primary purpose of proxy caching is to reduce bandwidth expenditures for corporations and ISPs. The more requests that can be served locally out of the proxy, the fewer requests need to be transferred over more expensive long-haul networks. Of course, proxy caching suffers from the same shortcomings as browser caching in that it fails to address the cost versus performance tradeoff faced by today's leading dynamic Web sites.

Content Delivery Network Services
The advent of Content Delivery Network (CDN) services represents an encouraging development in the use of caching technology. CDN services are designed to take advantage of the geographic locations of end users. The first generation of these services focused on static Web page graphics and streaming media delivery, using a network of distributed cache servers deployed in data centers all over the world to serve frequently accessed content from the "edge" of the Internet. Part of the thinking behind this idea is that by off-loading the static and streaming components of a Web site's traffic, the application servers and database servers have more resources available for dynamic page generation and transaction processing. Static content is not likely to change frequently and will most likely be requested by other users in the same geographic area. In theory, moving content closer to the end users also shortens response times for static content by avoiding "hot spots" or congestion at network access points on the Internet. By caching frequently accessed content closer to end users, the number of router hops is reduced and data will reach its destination more quickly.

In order to understand how first-generation CDN services work, a closer examination of the basic architecture is required. Embedded elements on a Web page, such as corporate logos and other graphics, can account for a large percentage of a page's content. These static elements are cached on servers managed by the CDN service provider. The cache servers, in turn, are hosted by ISP partners in key locations around the globe. The content publisher can control what pieces of content to "outsource" to the CDN provider by replacing existing HREF tags in the content owner's HTML with tags that point to the CDN provider's domain. The content is then sent to the CDN provider for deployment.

Figure 5: First-Generation Content Delivery Network Service

As depicted in Figure 5, a user's HTTP request will first go to the origin Web server (located in the content publisher's data center) which will return an HTML page with references for graphics and other objects pointing to the content delivery network. Redirected in this manner, the browser client now requests the "outsourced" content from the CDN provider, which chooses the closest and least-loaded cache server to respond to the requests. Additionally, some CDN services have developed sophisticated algorithms that generate maps of current Internet traffic conditions in split-second time. CDN services use these network maps to avoid sending content over congested segments of the Internet.

Next-generation CDN services, such as Akamai EdgeSuite, now offer the ability to deliver dynamically generated content from the edge. To take advantage of this new level of service, content providers must CName (create a DNS alias) their Web site to a hostname managed by the CDN service provider. Browser requests for the dynamically generated HTML, as well as the embedded static content, now flow entirely through the CDN's managed network of edge servers. New Edge Side Includes (ESI) functionality enables these next-generation services to aggregate portions of dynamic Web pages and reassemble them on the fly for individual users. The result of a joint development effort between Oracle and Akamai, but proposed as an open standard, ESI is a simple markup language that application developers use to identify content fragments for dynamic caching and page assembly in edge servers. ESI and its Java-based derivative, JESI, are discussed in greater depth later in this paper.

Efficient content management is still one of the greatest challenges that CDN services must overcome. In many cases, the content management tools used by these services cannot handle the volume and frequency of content updates demanded by today's leading dynamic Web sites. Thus, while the growth of CDN services is a promising evolution in the way that content is delivered on the Internet, these services do not obviate the need for robust application server infrastructure at the content provider's central data center and within the enterprise intranet.

 

Server Acceleration
Cache servers can also be deployed in front of a Web server or cluster of Web servers. This type of caching solution is known as server acceleration. Unlike a proxy server, which caches content from an unlimited number of sources, a server accelerator caches content for one or a handful of origin Web sites. As illustrated in Figure 6, a server accelerator intercepts all requests to the Web site and either returns the requested objects if they are present in the cache or forwards the requests to the origin Web server, which then generates the response. After a "cache miss", the server accelerator caches any cacheable objects returned by the origin Web server(s) and forwards the response back to the browser. As the server accelerator's cache becomes populated, it is able to serve more of the requested content itself. This, in turn, frees up processing resources on the Web server, application server and database tiers.

Figure 6: Server Acceleration
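The hit-or-miss behavior of a server accelerator can be illustrated with a toy in-memory model. The sketch below is not Oracle9iAS Web Cache code: the class name `ServerAccelerator`, the `origin` and `is_cacheable` callables, and the sample URLs are all hypothetical, and a real accelerator would also handle expiration, invalidation and concurrency.

```python
class ServerAccelerator:
    """Toy in-memory accelerator placed in front of a single origin server."""

    def __init__(self, origin, is_cacheable):
        self.origin = origin              # callable: url -> response body
        self.is_cacheable = is_cacheable  # callable: url -> bool
        self.cache = {}
        self.origin_requests = 0          # how often the origin did the work

    def handle(self, url):
        if url in self.cache:
            # Cache hit: answer directly, the origin is not contacted.
            return self.cache[url]
        # Cache miss: forward the request to the origin server.
        self.origin_requests += 1
        body = self.origin(url)
        if self.is_cacheable(url):
            self.cache[url] = body
        return body


accel = ServerAccelerator(
    origin=lambda url: "page for " + url,
    is_cacheable=lambda url: not url.startswith("/cart"),
)
for url in ["/catalog", "/catalog", "/cart", "/cart"]:
    accel.handle(url)
print(accel.origin_requests)  # 3: one miss for /catalog, two for /cart
```

The second request for `/catalog` never reaches the origin, which is the offloading effect the paragraph above describes; the non-cacheable `/cart` requests always pass through.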

If a server accelerator were able to cache dynamic content, it would significantly reduce the processing overhead that currently plagues origin Web servers and databases. To be effective, the cache must be optimized to deliver dynamic, personalized, frequently changing data, as well as be capable of handling connections from thousands of concurrent users at high sustained rates of throughput. In addition to improving the top line for e-businesses - in terms of faster response times to customers - the cache would also have to improve the bottom line by reducing infrastructure costs. Finally, the cache would need to support both local and global deployments, as well as complement and integrate with a number of popular CDN service offerings. Fortunately, this solution is available today - from Oracle.

INTRODUCING ORACLE9iAS WEB CACHE:
NEXT-GENERATION WEB CACHING FOR E-BUSINESS
Today's e-businesses possess a single infrastructure for generating and delivering content. Because each Web application server used for content creation is also responsible for content delivery, the capacity per server can vary anywhere from ten to a few hundred requests per second, depending on the size of the server and the application logic required to construct each page. Today's Web sites often need to sustain throughput rates on the order of several thousand requests per second. As a result, many e-businesses have been forced to purchase tens or even hundreds of servers in order to scale this infrastructure. (Sites that have between 100 and 1000 Web servers are not uncommon.)

Figure 7: Single Infrastructure for Content Delivery and Content Generation
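The scaling arithmetic behind this single-infrastructure model is simple division. The short sketch below works it through; the function name `servers_needed` and the specific figures (5000 requests per second, 50 and 200 requests per second per server) are illustrative assumptions picked from within the ranges quoted above, not measurements.

```python
import math


def servers_needed(target_rps, per_server_rps):
    """Servers required when every request is generated by an origin server."""
    return math.ceil(target_rps / per_server_rps)


# Illustrative figures only, chosen from the ranges quoted in the text.
print(servers_needed(5000, 50))   # 5000 / 50 = 100 servers
print(servers_needed(5000, 200))  # 5000 / 200 = 25 servers
```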

Enter Oracle9iAS Web Cache. Oracle9iAS Web Cache lightens the load on busy Web servers by storing frequently accessed pages in memory, eliminating the need to repeatedly process requests for those pages on mid-tier servers and databases. With Oracle9iAS Web Cache as part of an e-business's infrastructure, the cache becomes the delivery vehicle for content, leaving the Web servers and databases free to process updates and generate new content. By separating a Web site's content generation mechanism from its content delivery mechanism, Oracle9iAS Web Cache offers e-businesses an inexpensive means of achieving scalability and performance targets without sacrificing compelling content. In addition, Oracle9iAS Web Cache enables e-businesses to meet increased customer demand by leveraging existing investments in hardware and software.

Figure 8: Separating Content Delivery from Content Generation

Oracle9iAS Web Cache is a state-of-the-art server acceleration solution, offering throughput rates of several thousand requests per second on commodity hardware. As the first and only caching solution designed from the ground up for e-business, Oracle9iAS Web Cache offers intelligent caching, page assembly and compression features which distinguish it from any other Web caching solution on the market. Unlike legacy proxy servers which cache only static objects, Oracle9iAS Web Cache accelerates the delivery of both static and dynamic Web content.

Caching dynamically generated content requires a set of features specifically designed for this purpose. One such requirement is an efficient means of maintaining consistency between the content in the cache and the content in the origin data repositories. Legacy cache products force content publishers to rely on expensive and complex content delivery tools to propagate new content to their caches. Further, these content delivery tools cannot handle the volume of content updates demanded by today's leading dynamic Web sites. Oracle9iAS Web Cache's patent-pending invalidation and performance assurance mechanisms allow it to maintain consistency with origin data sources, even under heavy loads or when content is changing frequently. Using a simple combination of expiration policies and invalidation messages, a content administrator or application developer can refresh cached content as frequently as the original content changes, thereby ensuring site accuracy and rapid response times. Because of its simplicity, the cache consistency model introduced by Oracle is easier to use, more flexible and less expensive than that of any other solution on the market.

Caching dynamically generated content also requires a new level of cache intelligence. Oracle9iAS Web Cache understands the contents of HTTP headers - including cookies - and is capable of making caching and routing decisions based on administrator or application-defined cacheability rules. Often referred to as "content awareness", this advanced functionality makes it possible for administrators to cache different content for different categories of visitors, for example showing full prices to new customers and discounted prices to returning customers. And with the sophisticated page assembly and personalization features of the industry's first and fastest ESI processor, even Web sites containing personalized greetings, session-encoded URLs and non-cacheable page fragments can take advantage of Oracle9iAS Web Cache. Combined, these features enable Oracle9iAS Web Cache to provide the highest cache hit rate of any caching solution available.
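One way to picture content awareness is a cache key that combines the URL with the value of a cookie identifying the visitor category, so that new and returning customers receive different cached versions of the same URL. The sketch below is a hypothetical illustration, not the product's actual mechanism: the function `cache_key`, the cookie name `customer_type` and the default category `new` are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def cache_key(url, cookie_header, category_cookie="customer_type"):
    """Build a cache key from the URL plus one cookie that selects the version."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(category_cookie)
    # Assumption: visitors without the cookie are treated as new customers.
    category = morsel.value if morsel is not None else "new"
    return (url, category)
```

Cookies that do not affect the page, such as a session identifier, are deliberately left out of the key, so two returning customers with different sessions share one cached copy of the discounted-price page.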

In addition to broadening the horizon for cacheable content, Oracle9iAS Web Cache offers powerful surge protection and load balancing features that help defend against Web server overload and site downtime. Oracle9iAS Web Cache monitors the load on each origin Web server that it accelerates, providing a crucial buffer between client browsers and the mid-tier servers that house the application. A patent-pending surge protection algorithm ensures that site performance remains at peak levels, even during traffic spikes or when content is changing frequently. Oracle9iAS Web Cache provides load balancing and failover services for Web server farms, so that cache misses are directed to the most available, highest performing origin Web server. When required, Oracle9iAS Web Cache also guarantees the integrity of e-commerce transactions, such as shopping cart purchases, by using cookies and session IDs for persistent, or "sticky", connections to Web servers. This unique combination of caching and load balancing helps ensure blazing site performance and rock-solid up-time, making Oracle9iAS Web Cache the solution of choice for accelerating e-business Web sites.

Although it is part of a complete and integrated application server product, Oracle9iAS Web Cache works seamlessly with third-party HTTP servers, application servers, databases, content management systems, CDN services, and load balancing devices. However, by designing Web caching into the Oracle9i Application Server, Oracle makes it easier for customers to develop, generate and deliver content from an integrated single-vendor platform: Oracle9i.

FEATURES OF ORACLE9iAS WEB CACHE
Oracle9iAS Web Cache is a powerful yet low-cost solution for accelerating online catalogs, auctions, Internet and intranet portals, business-to-business exchanges, and CRM applications. Key features include:

  • Full-Page Static and Dynamic Content Caching
  • Partial-Page Caching and Personalized Page Assembly using ESI
  • Cache Invalidation
  • Performance Assurance and Surge Protection
  • Web Server Load Balancing, Failover and Binding
  • Automatic Content Compression
  • Management

Full-Page Static and Dynamic Content Caching
When configuring Oracle9iAS Web Cache, administrators use cacheability rules to specify which content to cache and which content not to cache. Oracle9iAS Web Cache supports cacheability rules for static content, such as GIF and JPEG images, as well as content created using dynamic page generation technologies, such as Java Server Pages (JSP), Active Server Pages (ASP), PL/SQL Server Pages (PSP), Java Servlets, Common Gateway Interface (CGI), and many others. (Note that dynamically generated content was, until very recently, considered non-cacheable. This is in part due to the ephemeral nature of this content; another reason is that it is difficult to map HTML-formatted content to the relational tables and materialized views used to generate the content. As discussed later in this paper, the invalidation and capacity heuristics mechanisms supported by Oracle9iAS Web Cache help circumvent this difficulty quite nicely.) Examples of pages that are dynamically generated include:

  • product catalogs, where information on pricing and inventory might vary
    from one moment to the next
  • auction views, which must be regenerated after each successful bid is
    processed
  • search results, which can change as catalog items are added and removed

Specifying cacheability rules within Oracle9iAS Web Cache is a declarative process. When an administrator assigns cacheability rules, he specifies the regular expression for either a specific URL or a collection of URLs and whether or not the content matching those URLs should be cached. (Please note that the regular expression syntax supported by Oracle9iAS Web Cache is based on POSIX 1003 extended regular expressions for URLs. Readers should refer to the Oracle9iAS Web Cache Administration and Deployment Guide for information on proper cacheability rule syntax.)

The administrator must pay special attention to specifying the cacheability rules in sequential order. Higher-ranking rules are processed first, and the first rule that matches a request decides the outcome. For cacheable regular expressions that contain a subset of content that is not cacheable, the administrator should give the non-cacheable content a higher ranking than the cacheable content. For example, if an administrator wants all URLs containing ecaction=ecpassthru to be cached except for those containing ecaction=ecpassthru2, he would enter the rules in the following order:

  • ecaction=ecpassthru2 (GET and GET with query string, Don't
    Cache)
  • ecaction=ecpassthru (GET and GET with query string, Cache)

If the order were reversed, all URLs containing ecaction=ecpassthru would be cached, including those containing ecaction=ecpassthru2.
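The effect of rule order can be illustrated with a minimal Python sketch. It is not the product's implementation: the rule list format and the function name is_cacheable are hypothetical, and Python's re module stands in for the POSIX extended regular expressions that the product uses.

```python
import re

# Ordered rule list: (regular expression, cache?). Highest-ranking rule first.
# The tuple format is an assumption made for this illustration.
RULES = [
    (r"ecaction=ecpassthru2", False),  # Don't Cache
    (r"ecaction=ecpassthru", True),    # Cache
]


def is_cacheable(url, rules):
    """Return the decision of the first rule whose expression matches the URL.

    Returns None when no rule matches.
    """
    for pattern, cache in rules:
        if re.search(pattern, url):
            return cache
    return None
```

With the rules in the order shown, a URL containing ecaction=ecpassthru2 hits the Don't Cache rule first. Passing list(reversed(RULES)) instead lets the broader expression match first, so the same URL would be cached.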

Examples of content that administrators would typically declare non-cacheable include update transactions, shopping cart views, personal account views, and so forth. One of the easiest ways to set up cacheability rules in Oracle9iAS Web Cache is either to first specify the non-cacheable content, and then use a broad "catch-all" rule for the cacheable content, or to first specify the cacheable content followed by a non-cacheable catch-all. In practice, cacheable and non-cacheable rules may be interspersed.

In addition to the URL, administrators can specify optional selectors for more fine-grained cacheability rules. These additional selectors include the HTTP request method (GET, GET with query string, or POST) and, if POST is selected, the HTTP POST body of the documents. In the following rule list, the second rule caches documents of the URL that use the GET and GET with query string methods, and the third rule caches documents of the URL that use the POST method and a POST body matching action=search:

  • ^/cec/cstage\?ecaction=ecpassthru2 (GET and GET with query
    string, Don't Cache)
  • ^/cec/cstage\?ecaction=ecpassthru.* (GET and GET with
    query string, Cache)
  • ^/cec/cstage\?ecaction=ecpassthru.* (POST,
    action=search, Cache)

If no cacheability rules are specified or if no rules match a particular request, then Oracle9iAS Web Cache behaves just as traditional proxy caches do; that is, it relies on HTTP header information to determine what is cacheable. Generally, proxy caches only cache static content.
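The method and POST body selectors, together with the fallback to HTTP headers, can be sketched as follows. This is an illustration under stated assumptions rather than the product's logic: the Rule class, the method labels "GET", "GET_QUERY" and "POST", and the string results returned by decide are all hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    url_pattern: str                          # regular expression for the URL
    methods: frozenset                        # subset of {"GET", "GET_QUERY", "POST"}
    cache: bool                               # True = Cache, False = Don't Cache
    post_body_pattern: Optional[str] = None   # only meaningful for POST rules


# The three rules from the list above, in ranking order.
RULES = [
    Rule(r"^/cec/cstage\?ecaction=ecpassthru2", frozenset({"GET", "GET_QUERY"}), False),
    Rule(r"^/cec/cstage\?ecaction=ecpassthru.*", frozenset({"GET", "GET_QUERY"}), True),
    Rule(r"^/cec/cstage\?ecaction=ecpassthru.*", frozenset({"POST"}), True, r"action=search"),
]


def method_kind(method, url):
    """Distinguish plain GET from GET with query string."""
    if method == "GET":
        return "GET_QUERY" if "?" in url else "GET"
    return method


def decide(method, url, body=""):
    """Return "cache", "no-cache", or "use-http-headers" when no rule matches."""
    kind = method_kind(method, url)
    for rule in RULES:
        if kind not in rule.methods:
            continue
        if not re.search(rule.url_pattern, url):
            continue
        if rule.post_body_pattern is not None and not re.search(rule.post_body_pattern, body):
            continue
        return "cache" if rule.cache else "no-cache"
    # No rule matched: behave like a traditional proxy cache.
    return "use-http-headers"
```

A POST to the ecpassthru URL is cached only when its body matches action=search; any request that matches no rule is left to the HTTP header information, as described above.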

It is not enough to say that a caching product can handle dynamically generated content. The cache must also be able to understand information contained in HTTP headers, including cookies. This level of "content awareness" is important because Web sites often use cookies and other HTTP headers to decide which version of a page to serve to a given user. Indeed, cookies are useful for a variety of purposes, including the ability for Web servers to return a page that displays a user's name in a "Welcome" greeting. Oracle9iAS Web Cache supports the following advanced rules for dynamically generated content:

  • Multiple-version documents for the same URL
  • Session-aware rules for pages containing session information
  • Simple personalization rules for pages containing basic welcome greetings,
    running shopping cart totals, and so forth


Multiple Versions of the Same URL
Some Web pages have multiple versions of the same URL depending on which user is accessing the page. For example, an e-commerce storefront may show discounted prices to returning customers and full prices to first-time visitors. Typically, a Web-based application determines which version of a page to return to the user based on cookie values and/or other HTTP request headers. Oracle9iAS Web Cache offers similar functionality.

Because HTTP is a stateless protocol, session information is typically passed back and forth between a Web browser and an application Web server using tokens called "cookies". Cookies are encoded strings of data generated by the Web server and stored on the client. Cookies come in two basic flavors: persistent cookies and session cookies. Persistent cookies are usually stored in a text file on the client machine and may live indefinitely. Session cookies usually reside in the browser's memory space and are typically set to expire after a period of time determined by the Web server. Each time a user visits a particular Web site, the browser will send the cookies associated with that site as part of the HTTP request. The Web server's HTTP response might contain new cookies which get "set" on the client. For example, the following persistent cookies for the domain "oracle.com" are contained in the file:
C:\Program Files\Netscape\Users\John.Doe\cookies.txt

  • oracle.com TRUE / FALSE 974299218 ORA_UCM_SRVC
  • oracle.com FALSE /cgi-bin/webiv/ FALSE 107164758

Oracle9iAS Web Cache is unique among caching products in that it is capable of using cookies as a means of disambiguation. For multi-version URLs based on cookies, cacheability rules may be created that specify the cookie name and whether to cache versions of the URL that do not use this cookie. Oracle9iAS Web Cache uses the existence and/or value of the cookie to serve the appropriate version of the URL to the appropriate users. For instance, a Web site might choose to display a page containing one set of prices to "walk-in" customers and another set of prices to "returning" customers, even though there is only one URL for this page. To achieve the same effect using Oracle9iAS Web Cache, an administrator would simply configure the cache to look for different account category cookies, such as

  • ec-400-id-acctcat=WALKIN
  • ec-400-id-acctcat=CUSTOMER

before deciding which version of the page to serve. Oracle9iAS Web Cache will store as many different versions of the page as there are different categories of customers.

Some applications use HTTP request headers to determine which version of a page to serve in response to requests for the same URL. HTTP request headers enable the Web browser to communicate additional information about the request and about itself. For purposes of disambiguation, the current version of Oracle9iAS Web Cache supports the following HTTP request headers:


Header            Description
Accept            Specifies which media types are acceptable for the response
Accept-Charset    Specifies which character sets are acceptable for the response
Accept-Encoding   Specifies which content-encoding types are acceptable in the response
Accept-Language   Specifies the set of languages that are preferred as a response
User-Agent        Contains information about the Web browser that initiated the request

Table 1: HTTP Request Headers

Thus, for a given URL, Oracle9iAS Web Cache is capable of serving the German version of the page (should there be one) to a browser whose Accept-Language request header specifies a preference for German-language pages.
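One way to picture multi-version caching is as a lookup key that combines the URL with the cookie values and request headers named in the cacheability rule. The Python sketch below is only an illustration of that idea; the function names, the key layout, and the case-sensitive header dictionary are assumptions, not the product's internal design.

```python
def parse_cookies(cookie_header):
    """Parse a Cookie request header into a name -> value dictionary."""
    cookies = {}
    for pair in cookie_header.split(";"):
        if "=" in pair:
            name, value = pair.strip().split("=", 1)
            cookies[name] = value
    return cookies


def version_key(url, headers, cookie_names=(), header_names=()):
    """Build a cache lookup key from the URL plus the configured selectors.

    cookie_names and header_names list the cookies and request headers that
    distinguish one version of the page from another.
    """
    cookies = parse_cookies(headers.get("Cookie", ""))
    parts = [url]
    for name in cookie_names:
        parts.append("cookie:%s=%s" % (name, cookies.get(name, "")))
    for name in header_names:
        parts.append("header:%s=%s" % (name, headers.get(name, "")))
    return "|".join(parts)
```

Two requests that differ only in cookies or headers outside the configured selectors map to the same key and therefore share one cached version, while a WALKIN request and a CUSTOMER request for the same URL map to different keys.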

Session-Encoded URLs
In the world of e-business applications, one particularly important use of cookies is for maintaining session state. This is especially true for shopping carts and other e-commerce transactions. Cookies are also used to track users' clickstream habits as they browse through a Web site. Nevertheless, some users disable cookies on their browsers because they are fearful of what private information the cookies might reveal about their browsing habits. So instead of turning away these potential customers, many Web sites simply embed cookies as parameters in URLs in order to service (and track) "cookie-phobic" visitors. This is typically done by inserting (or appending) a unique sequential number, called a session ID, into all the <A HREF=…> links in the site's HTML code.

Using the storefront example again, content returned to customer "Jane Doe" might be identified by the URL:

store.company.com/cec/cstage?ec_cookie=33436

whereas the same content returned to customer "John Doe" might be identified by the URL:

store.company.com/cec/cstage?ec_cookie=33437


Embedding session IDs in every URL makes Web pages unique for each user. Such one-to-one content is considered non-cacheable by today's caching solutions - with the sole exception of Oracle9iAS Web Cache.

Oracle9iAS Web Cache provides a highly efficient, easy-to-use string substitution mechanism for inserting session IDs into URLs. Using the Oracle9iAS Web Cache Manager, an administrator simply defines a session type and associates it with a cacheable set of URLs. The first time one of these session-embedded URLs is requested by a user's browser, the cache will forward the request to the origin Web server, which will generate a response containing the session-embedded URL. Oracle9iAS Web Cache will send the page back to the browser and then insert the page into the cache. At the time of insertion, Oracle9iAS Web Cache will strip out the session information and leave a placeholder.

For subsequent requests from that user, Oracle9iAS Web Cache is able to substitute values for embedded session IDs based on the information contained in the request header. Similarly, when the next user requests the URL, his initial request will be forwarded to the origin Web server to generate a session ID, but all subsequent requests from that user can be satisfied directly out of the cache. Because content publishers want to keep track of user clickstream data, Oracle9iAS Web Cache logs each user's request, including the session information.

Due to its sophisticated design and careful implementation, the substitution mechanism just described carries practically no performance overhead. Oracle is simply extending the boundaries of what is considered cacheable.
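The strip-and-substitute idea can be sketched in a few lines of Python. This is a simplified illustration, not the product's mechanism: the placeholder text and both function names are hypothetical, and a real cache would take the session value from the request rather than from a function argument.

```python
import re

# Hypothetical marker left in the cached copy where the session ID used to be.
PLACEHOLDER = "{{SESSION}}"


def strip_session(html, param):
    """At insertion time, replace the value of the session parameter in every
    link with a placeholder so that one copy can be shared by all users."""
    pattern = re.compile(r"(" + re.escape(param) + r"=)[^&\"'\s>]+")
    return pattern.sub(lambda m: m.group(1) + PLACEHOLDER, html)


def substitute_session(cached_html, session_id):
    """At serve time, fill the placeholder with the requesting user's ID."""
    return cached_html.replace(PLACEHOLDER, session_id)
```

For example, a page generated for session 33436 is stored once with the placeholder in its links and can then be served to session 33437 by a plain string replacement, which is why the per-request cost stays small.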

Simple Personalization
Many Web sites support pages with personalized attributes, such as personalized greetings in the form of "Welcome <your name>", on an otherwise generic page. Oracle9iAS Web Cache uses a substitution mechanism similar to the one just described in order to cache such seemingly unique pages.

Administrators use special-purpose SGML comments called "Web Cache tags" to identify personalized attribute information within a page.

For example, the HTML for Company.com storefront might contain:

<HTML>

Welcome to the Company.com Store,
<!-- WEBCACHETAG="person_name"-->
John Doe
<!-- WEBCACHEEND-->

</HTML>

Oracle9iAS Web Cache parses the HTML and caches one "generic" version of the page, leaving a placeholder for the personalized string of characters located between the Web Cache tags. For subsequent requests, Oracle9iAS Web Cache substitutes values for personalized attributes based on information contained within a cookie or an embedded URL parameter. For example, the URL

store.company.com/cec/cstage?person_name=Jane+Doe

could be used to display a page containing the greeting "Welcome to the Company.com Store, Jane Doe". Again, the administrator uses the Oracle9iAS Web Cache Manager tool to build an association between a personalized attribute cookie or embedded URL parameter and a cacheable set of URLs.
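The substitution step can be sketched in Python as follows. It is an illustration only: the function names are hypothetical, the sketch replaces the tagged block at serve time instead of storing a placeholder, and it reads the value from a URL parameter (a cookie would work the same way).

```python
import re
from urllib.parse import parse_qs

# Matches everything from an opening Web Cache tag to the matching end tag.
# The optional "--" tolerates an opening tag written without it.
TAG_BLOCK = re.compile(
    r'<!--\s*WEBCACHETAG="(?P<name>[^"]+)"\s*(?:--)?>.*?<!--\s*WEBCACHEEND\s*-->',
    re.DOTALL,
)


def values_from_url(url):
    """Extract personalization values from the query string of a URL."""
    query = url.partition("?")[2]
    return {name: values[0] for name, values in parse_qs(query).items()}


def personalize(cached_html, values):
    """Replace each tagged block with the value for its attribute name."""
    return TAG_BLOCK.sub(lambda m: values.get(m.group("name"), ""), cached_html)
```

Applied to the storefront page above with the URL store.company.com/cec/cstage?person_name=Jane+Doe, the block tagged person_name is replaced by "Jane Doe" while the rest of the cached page is reused unchanged.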

Other personalized data, such as shopping cart totals, can be displayed on cacheable pages using this simple mechanism. Readers should refer to the product documentation for more information on the many ways in which this feature may be employed.

This in-cache personalization feature is similar to the session-encoded URL and multi-version URL features in that it enables Oracle9iAS Web Cache to use the same page for multiple users. Because only one page needs to be cached, only one application Web server request is required to initially populate the cache with the page. All subsequent requests for the page will result in a cache hit.

Many of the advanced cacheability features discussed above - string substitution, simple personalization, etc. - are also features of the new Edge Side Includes (ESI) specification. Nevertheless, these simple yet useful features require little or no application redesign. ESI-enabled applications, on the other hand, require more extensive application changes but offer higher levels of in-cache personalization and more fine-grained content assembly features.


Partial-Page Caching and Personalized Page Assembly using ESI
Full-page caching with simple substitution is sufficient for many of today's consumer-oriented Web sites. However, a number of e-business applications, as well as intranet and Internet portals, require a more fine-grained level of caching and content assembly. Pages containing data that is unique to a particular user, such as the user's bank account summary or sales quota, tend to be poor candidates for full-page caching due to the amount of cache resources consumed for each user. Pages containing data that is not particularly unique but which is arranged according to a given user's preferences are also poor candidates for full-page caching. Examples of the latter include portal pages with personalized stock lists, news, weather and banner ads. In either case, partial-page caching and dynamic page assembly mechanisms are required in order to accelerate the delivery of such highly personalized content.

In order to take advantage of partial-page caching and personalized page assembly, applications must be made "cache-aware". In turn, Web developers need an industry-standard markup language to identify more fine-grained page elements called content fragments that can be automatically assembled into complete, personalized Web pages for faster delivery.

To address this need, Oracle and Akamai have collaborated to define Edge Side Includes (ESI). ESI is a simple markup language that developers can use to identify content fragments for dynamic assembly at the network edge. ESI also specifies a content invalidation protocol for transparent content management across ESI-compliant solutions, such as application servers and content delivery networks. The ability to assemble dynamic pages from individual page fragments means that only non-cacheable or expired fragments need to be fetched from the origin Web servers, thereby lowering the need to retrieve complete pages and decreasing the load on the Web site's content generation infrastructure.

Figure 9: ESI further separates content delivery from content generation for greater scalability and cost savings


With ESI,

  • e-businesses can now develop highly dynamic Web-based applications that are assembled at the edge of the data center (with Oracle9iAS Web Cache) and/or at the edge of the Internet (with ESI-compliant CDNs), for improved performance;
  • the aggregation and assembly of content in edge servers dramatically reduces the cost of infrastructure required to deliver fast, scalable and fault-tolerant applications.


The ESI Markup Language
ESI enables Web pages to be broken down into fragments of differing cacheability profiles. These fragments are maintained as separate elements in the application server's local cache and/or on the content delivery network. ESI page fragments are assembled into HTML pages when requested by end users. This means that much more dynamically generated content can be cached, then assembled and delivered from the edge when requested. Furthermore, page assembly can be conditional, based on information provided in HTTP request headers or end-user cookies.

The ESI markup language (summarized in Table 2) includes the following key features:

  • Inclusion - ESI provides the ability to fetch and include files to comprise a Web page, with each file subject to its own configuration and control, its own specified time-to-live in cache, revalidation instructions, and so forth. Included documents can include ESI markup for further ESI processing. Currently, ESI supports up to three levels of recursion.
  • Conditional inclusion - ESI supports conditional processing based on Boolean comparisons or environmental variables.
  • Environmental variables - ESI supports the use of a subset of standard CGI environment variables such as cookie information. These variables can be used inside ESI statements or outside of ESI blocks.
  • Exception and error handling - ESI allows developers to specify alternative pages and default behavior, such as serving a default HTML page in the event that an origin site or document is not available. Further, it provides an explicit exception-handling statement set. If a severe error is encountered while processing a document with ESI markup, the content returned to the end user can be specified in a "failure action" configuration option associated with the ESI document.

Tag             Purpose
<esi:include>   Include a separately cacheable fragment.
<esi:choose>    Conditional execution - choose among several different alternatives based on, for example, cookie value or user agent.
<esi:try>       Specify alternative processing when a request fails (e.g., the origin server is not accessible).
<esi:vars>      Permit variable substitution (for environment variables).
<esi:remove>    Specify alternative content to be stripped by ESI but displayed by the browser if ESI processing is not done.
<!--esi … -->   Specify content to be processed by ESI but hidden from the browser.
<esi:inline>    Include a separately cacheable fragment whose body is included in the template.

Table 2: Summary of ESI Tags.

 

The ESI Development Model
The basic structure a content provider uses to create dynamic content in ESI is a template page containing HTML fragments (see Figure 10). The template consists of common elements such as a logo, navigation bars, and other "look and feel" elements of the page. The HTML fragments represent dynamic subsections of the page.

Figure 10: Example ESI template page containing ESI fragments and their expiration policies

The template is the file associated with the URL the end user requests. It is marked-up with ESI language that tells Oracle9iAS Web Cache or the content delivery network to fetch and include the HTML fragments. The fragments themselves are HTML/ESI marked-up files containing discrete text or other objects. (Sample ESI code for the Company.com template example is included in Appendix A.)

Each fragment is treated as its own separate object: each has its own cache and access profile which are set in HTTP headers or in the Oracle9iAS Web Cache configuration file. For example, content providers may want to cache the template for several days, but only cache a particular fragment (such as an advertisement or stock quote) for a matter of seconds or minutes. Other fragments (such as a user's bank account total) may be declared entirely non-cacheable.


Cached templates and fragments may be shared among multiple users. This means that for a large percentage of requests, the entire page can be assembled using shared components and delivered from the edge. ESI obviates the need for full-page updates when individual page fragments change. For example, when a single user requests the stock quote for General Electric Corp. (GE), and the fragment representing the GE stock quote has expired or changed, the revalidation of that fragment applies to all users' pages that reference GE.
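The sharing behavior described above can be sketched with a small Python model of an ESI processor. It is a deliberately reduced illustration: only the <esi:include src="..."/> form is handled, time is passed in as a plain number of seconds, and the class and function names are hypothetical.

```python
import re

# Only the self-closing include form is recognized in this sketch.
INCLUDE = re.compile(r'<esi:include src="([^"]+)"\s*/>')


class FragmentCache:
    """Caches each fragment separately under its own time-to-live."""

    def __init__(self, fetch, ttl_seconds):
        self.fetch = fetch              # callable: src -> body (stands in for the origin server)
        self.ttl_seconds = ttl_seconds  # dict: src -> TTL in seconds; missing or 0 = non-cacheable
        self.entries = {}               # src -> (body, expires_at)

    def get(self, src, now):
        entry = self.entries.get(src)
        if entry is not None and now < entry[1]:
            return entry[0]             # still fresh: no origin request
        body = self.fetch(src)          # expired, missing, or non-cacheable
        ttl = self.ttl_seconds.get(src, 0)
        if ttl > 0:
            self.entries[src] = (body, now + ttl)
        return body


def assemble(template, cache, now):
    """Replace every include tag in the template with its fragment body."""
    return INCLUDE.sub(lambda m: cache.get(m.group(1), now), template)
```

Because fragments are keyed by their own URL, a refreshed stock quote fragment is picked up by every template that includes it, and only expired or non-cacheable fragments cause a request to the origin.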


ESI for Java (JESI)
To accelerate the adoption of ESI among the Java development community, Oracle and Akamai have also introduced Edge Side Includes for Java (JESI). JESI provides extensions to Java that make it easy to program JavaServer Pages (JSPs) using ESI. JSPs are server-side software modules that produce the final user interface by linking dynamic content and static HTML through tags.


Tag                  Purpose
<jesi:include>       Used in a "template" page to indicate to the ESI processor how the fragments are to be assembled (the tag generates the <esi:include> tag).
<jesi:control>       Assign an attribute (e.g., expiration) to templates and fragments.
<jesi:template>      Used to contain the entire content of a JSP container page within its body.
<jesi:fragment>      Encapsulate individual content fragments within a JSP page.
<jesi:codeblock>     Specify that a particular piece of code needs to be executed before any other fragment is executed (a database connection established, user id computed, etc.).
<jesi:invalidate>    Explicitly remove and/or expire selected objects cached in an ESI processor.
<jesi:personalize>   Insert personalized content into a page where the content is placed in cookies and inserted into the page by the ESI processor.

Table 3: Summary of JESI Tags.


JESI is a specification (summarized in Table 3) and custom JSP tag library that developers can use to automatically generate ESI code. For JSP developers, JESI represents an easy way to express the modularity of pages and the cacheability of those modules, without requiring developers to learn a new programming syntax.

The complete ESI and JESI specifications are available for review at http://www.edge-delivery.org/. Please refer to the Oracle9iAS Web Cache Administration and Deployment Guide and the OracleJSP Support for JavaServer Pages Developer's Guide and Reference for more information on building and deploying ESI- and JESI-enabled applications.


Cache Invalidation
Oracle9iAS Web Cache supports expiration and message-based invalidation mechanisms in order to keep its cache consistent with the content on the origin Web server(s) and database(s). Because of these mechanisms, Oracle9iAS Web Cache always knows which content is fresh and which content is stale. This is especially important for dynamically generated Web pages that change frequently. When a browser requests an object that has been invalidated, Oracle9iAS Web Cache retrieves a new version of the object from the origin Web server, provided that the origin Web server has the capacity to handle the request.

Administrators can invalidate content in one of two ways:

  • An expiration policy can be assigned to the cached content
  • An XML/HTTP invalidation message can be sent to the Oracle9iAS
    Web Cache host machine


Expiration Policies
When an object expires, Oracle9iAS Web Cache marks it invalid. There are three ways to set expiration rules with Oracle9iAS Web Cache:

Policy                                 Description
Expire <time> after cache entry        Expiration is based on when the object is inserted into the cache.
Expire <time> after document created   Expiration is based on when the object was created. This option relies on the Last-Modified header generated by the origin Web server.
Expires as per HTTP Expires header     This is the default option. Expiration is based on the Expires header generated by the origin Web server.

Table 4: Expiration Policies.


A Web site that displays weather forecasts and current climate conditions is an example of an application that would benefit from invalidation using the expiration policies. The Web pages relating to the climate conditions could be set to expire 30 minutes after the pages were created, thereby ensuring that customers never receive outdated information.
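The three policies differ only in the reference point from which the expiration time is computed. The Python sketch below makes that explicit; the policy labels and the function name are hypothetical, and the header values are assumed to be standard HTTP dates.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expiration_time(policy, headers, cached_at, ttl=None):
    """Compute when a cached object expires under one of the three policies.

    policy    -- "after_cache_entry", "after_document_created" or "http_expires"
    headers   -- response headers from the origin Web server
    cached_at -- time the object was inserted into the cache
    ttl       -- the <time> value of the policy, as a timedelta
    """
    if policy == "after_cache_entry":
        return cached_at + ttl
    if policy == "after_document_created":
        # Relies on the Last-Modified header generated by the origin.
        return parsedate_to_datetime(headers["Last-Modified"]) + ttl
    if policy == "http_expires":
        # Default option: trust the Expires header generated by the origin.
        return parsedate_to_datetime(headers["Expires"])
    raise ValueError("unknown policy: " + policy)
```

In the weather example, a page created at 12:00 and cached at 12:10 with a 30-minute setting expires at 12:30 under the "after document created" policy, but at 12:40 under the "after cache entry" policy.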


XML/HTTP Invalidation Messages
Expirations are used primarily in cases where content changes can be accurately predicted. For less predictable, more frequently changing content, a general message-based mechanism is needed for maintaining cache consistency. The invalidation message format and grammar must integrate readily with existing applications and content repositories, including databases, custom scripts and content management tools.

Oracle9iAS Web Cache invalidation messages are HTTP POST requests that carry an XML payload. (This invalidation message format is also part of the ESI specification, which can be obtained at http://www.edge-delivery.org/.) The contents of the XML message body tell the cache which URLs to mark as invalid. As shown in Figure 11, invalidation messages can be sent using one of the following methods:

  • Manually, using either Oracle9iAS Web Cache Manager or Telnet
  • Automatically, using database triggers, scripts or applications

Figure 11: Message-Based Cache Invalidation


Manual Invalidation Using Telnet
Manual invalidation involves generating an HTTP POST message containing the host name of the Oracle9iAS Web Cache machine, the invalidation listening port number, authentication data, and the invalidation instructions.

Via telnet, for example, an administrator would send an invalidation message using the following procedure:

  • Connect to the Oracle9iAS Web Cache host machine at the invalidation
    listening port:

telnet web_cache_host invalidation_port

  • Once connected, specify an HTTP POST message header and authenticate
    the user "invalidator" using Base64 encoding with the following syntax:

POST /x-oracle-cache-invalidate http/1.0|1
Authorization: BASIC <base64 encoding of invalidator:invalidator_password>
content-length:#bytes

  • Enter one carriage return
  • Use the following XML syntax to invalidate document(s) contained within
    an exact URL that includes the complete path and file name:
    <?xml version="1.0" ?>
    <!DOCTYPE INVALIDATION SYSTEM
    "internal:///WCSinvalidation.dtd">
    <INVALIDATION VERSION="WCS-1.0">
    <OBJECT>
    <BASICSELECTOR URI="URL"/>
    <ACTION REMOVALTTL="TTL"/>
    </OBJECT>
    </INVALIDATION>
  • Use the following syntax to invalidate document(s) based on more advanced
    invalidation selectors:

<?xml version="1.0" ?>
<!DOCTYPE INVALIDATION SYSTEM
"internal:///WCSinvalidation.dtd">
<INVALIDATION VERSION="WCS-1.0">
<OBJECT>
<ADVANCEDSELECTOR URIPREFIX="prefix"
URIEXP="URL_expression"
METHOD="HTTP_request_method">
<COOKIE NAME="cookie_name" VALUE="value"/>
<HEADER NAME="HTTP_request_header" VALUE="value"/>
</ADVANCEDSELECTOR>

<ACTION REMOVALTTL="TTL"/>
</OBJECT>
</INVALIDATION>


Please refer to the Oracle9iAS Web Cache Administration and Deployment Guide for a complete explanation of the invalidation message syntax just described.
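A script or application can build the same message that the telnet procedure types by hand. The Python sketch below only constructs the request bytes for the basic selector; the function names are hypothetical, HTTP/1.0 is chosen from the two versions the syntax allows, and the meaning and units of the REMOVALTTL value are left to the product documentation.

```python
import base64
from xml.sax.saxutils import quoteattr


def build_invalidation_body(url, removal_ttl):
    """Build the XML payload that invalidates one exact URL."""
    return (
        '<?xml version="1.0" ?>\n'
        '<!DOCTYPE INVALIDATION SYSTEM "internal:///WCSinvalidation.dtd">\n'
        '<INVALIDATION VERSION="WCS-1.0">\n'
        "<OBJECT>\n"
        "<BASICSELECTOR URI=" + quoteattr(url) + "/>\n"
        '<ACTION REMOVALTTL="%d"/>\n' % int(removal_ttl) +
        "</OBJECT>\n"
        "</INVALIDATION>\n"
    )


def build_invalidation_request(url, password, removal_ttl=0):
    """Build the complete HTTP POST message as bytes.

    The user name "invalidator" follows the procedure above; the password
    is supplied by the caller.
    """
    body = build_invalidation_body(url, removal_ttl).encode("utf-8")
    credentials = base64.b64encode(("invalidator:" + password).encode("utf-8")).decode("ascii")
    head = (
        "POST /x-oracle-cache-invalidate HTTP/1.0\r\n"
        "Authorization: BASIC " + credentials + "\r\n"
        "Content-Length: %d\r\n" % len(body) +
        "\r\n"
    )
    return head.encode("ascii") + body
```

The resulting bytes would be written to a TCP connection opened to the Oracle9iAS Web Cache host at its invalidation listening port, which is what the telnet session does interactively. Computing Content-Length from the encoded body avoids the byte-counting errors that are easy to make by hand.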

Manual Invalidation Using Oracle9iAS Web Cache Manager
Oracle9iAS Web Cache Manager provides an easy-to-use browser interface for invalidating cached objects. Under the covers, the message mechanics are much like the telnet example just described. The advantage of the browser approach is that the administrator is isolated from the intricacies of the HTTP and XML formats, and consequently, there is less chance for error. The administrator need only specify which objects to invalidate and how invalid those objects should be.


Automatic Invalidation Using Database Triggers
Database triggers are procedures that are stored in the database and activated ("fired") when an INSERT, UPDATE, or DELETE statement is issued against a table. A trigger stored in the database can include SQL and PL/SQL or Java statements to execute as a unit. A trigger can be set so that when a database table is updated, an HTTP invalidation message is sent to the Oracle9iAS Web Cache. (Note that any database which supports triggers and HTTP can be used to invalidate content stored in Oracle9iAS Web Cache.) In Oracle database environments, administrators may use the UTL_TCP package supplied with Oracle9iAS Web Cache. In the case of an online store, for example, a trigger can be set to send an invalidation message to the Oracle9iAS Web Cache whenever the price column in an item table is updated. This will ensure that cached pages containing item prices are always consistent with the origin database.

Automatic Invalidation Using Scripts
Many Web sites use scripts for uploading new content to databases and file systems. An online book retailer, for instance, might run a Perl script once per day in order to bulk load new book listings and price changes into its catalog database. The retailer would want the price changes and availability listings to be reflected in the item views and search results currently cached in Oracle9iAS Web Cache. To achieve this, the Perl script can be modified such that when the bulk loading operation has completed, the script will send an invalidation message to the cache invalidating all catalog views and search results. (Note that the invalidation message need not list every individual search page or item view that might be affected by the data change. The performance assurance feature of Oracle9iAS Web Cache enables administrators to use broad brush strokes when invalidating content, making it safe to invalidate all catalog and search content even if only a fraction of that content has changed. Performance assurance is discussed in the next section of this paper.)


Automatic Invalidation Using Applications
Invalidation messages can also originate from a Web site's underlying application logic or from the content management application used to design Web pages. With only moderate code changes, almost any application can automatically generate the XML/HTTP messages required for invalidating cached content. Today, Oracle9iAS Web Cache ships with sample C and Java code that enables developers to embed invalidation mechanisms directly into their applications. To further simplify the invalidation process for JSP developers, OracleJSP now ships with a JESI custom tag library, which supports automatic invalidation for JSPs through easy-to-use <jesi:invalidate> tags. Future releases of Oracle CRM and ERP applications, as well as a number of the Oracle development tools used to build applications, will feature transparent invalidation of content stored in Oracle9iAS Web Cache. And as the ESI and Web-based Distributed Authoring and Versioning (WebDAV) standards evolve, automatic cache invalidation will become a transparent part of the e-business content management infrastructure. (Readers may refer to http://www.webdav.org/ for further information about WebDAV.)


Post-invalidation Options
Administrators have two options for specifying how Oracle9iAS Web Cache should process objects once they have expired or been invalidated.

Remove immediately
    This option marks objects invalid and instructs Oracle9iAS Web Cache never to serve them stale.

Refresh on demand as application Web server capacity permits AND no later than <time> after expiration
    This option marks objects invalid and then refreshes them based on origin Web server capacity. The maximum time that objects can reside in the cache is also specified.

    Specifying a removal time in the future helps Oracle9iAS Web Cache decide which objects may be served stale under certain conditions. Oracle9iAS Web Cache will only serve a stale version of an object if the origin Web servers do not have capacity to generate new versions.

Table 5: Post-invalidation Options


Performance Assurance and Surge Protection
One could logically assume that widespread cache invalidation or expiration would negatively impact performance of the origin Web server(s), resulting in the generation of HTTP 500 Server Busy errors or even Web server overload. For this reason, Oracle9iAS Web Cache intelligently serves stale versions of invalid objects until the origin Web servers have the capacity to refresh them. The result is that overall Web site performance remains constant at the higher throughput levels sustainable by the cache, even with frequent content changes on the origin Web server(s) and database(s).

When faced with the choice of serving some stale content or no content at all, most Web site administrators opt for the former.

Oracle9iAS Web Cache uses a patent-pending performance assurance heuristic that determines which objects to refresh and which objects to serve stale, with minimal tradeoff between Web site performance and content consistency. Input for the heuristic algorithm is provided in part by the Oracle9iAS Web Cache administrator (at configuration time and when invalidating content), and in part by statistics gathered by Oracle9iAS Web Cache during normal operations. Specifically, the propensity of Oracle9iAS Web Cache to serve a stale object is determined by a combination of the following key factors:

Validity of the object Oracle9iAS Web Cache calculates validity by comparing the current time relative to an object's expiration/invalidation time and the object's scheduled removal time. Prior to expiration/invalidation time, the object is considered perfectly valid. Between expiration/invalidation time and removal time, the object's validity level decreases linearly. During this interim state, objects with a higher validity level have a higher propensity to be served stale. When current time reaches removal time, the object is considered totally invalid and can no longer be served stale. Scheduled removal time is something that administrators can control. When expiring/invalidating content, administrators have the option to remove objects immediately, which may be necessary for sensitive objects that should never be served stale. Likewise, where some degree of inconsistency is tolerable, administrators can specify a removal time in the (near) future.
Popularity of the object Popularity is determined by the total number of requests for the object since insertion into the cache, with more emphasis on the most recent requests for the object. These statistics are gathered automatically by Oracle9iAS Web Cache.
Load on the origin Web server(s) The current load on the origin Web server(s) is determined by the number of open connections from Oracle9iAS Web Cache to the origin Web server(s), i.e., the total number of pending requests to the origin Web server(s).
Capacity of the origin Web server(s) Administrators configure Oracle9iAS Web Cache with the capacity of each origin Web server from which new/fresh content may be requested. Capacity is determined by the number of concurrent connections that each origin Web server can safely handle without crashing.

Table 6: Factors for Performance Assurance Heuristic

These factors are used to provide Oracle9iAS Web Cache with a logical queue of objects to retrieve from the origin Web server(s).
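
The validity factor in Table 6 can be illustrated with a short Python sketch. This is not Oracle's patent-pending heuristic: the function names and the use of plain numeric timestamps are assumptions made for illustration, and popularity is not modeled. Only the linear decay between expiration/invalidation time and removal time, and the rule that stale content is served only when the origin Web servers lack capacity, are taken from the description above.

```python
def validity_level(now: float, invalidated_at: float, removal_at: float) -> float:
    """Return a validity level from 1.0 (perfectly valid) to 0.0 (totally invalid).

    Hypothetical helper: all times are plain numbers, for example seconds.
    """
    if now < invalidated_at:
        return 1.0  # before expiration/invalidation the object is perfectly valid
    if now >= removal_at:
        return 0.0  # at removal time the object can no longer be served stale
    # Between the two times, validity decreases linearly from 1.0 to 0.0.
    return (removal_at - now) / (removal_at - invalidated_at)


def may_serve_stale(validity: float, open_connections: int, capacity: int) -> bool:
    """Simplified rule: serve stale only if the object is not totally invalid
    and the origin Web servers have no spare capacity for a refresh."""
    return validity > 0.0 and open_connections >= capacity
```

Under these assumptions, an object invalidated at time 10 with a removal time of 20 has a validity level of 0.5 at time 15. An object invalidated with the "remove immediately" option (removal time equal to invalidation time) drops straight to 0.0.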

One of the indirect benefits of the heuristic just described is that it enables use of invalidation operations which are broad in scope, without negatively impacting performance. Fine-grained invalidation can be a painstaking process for an administrator - it is particularly difficult to map content stored in rows and columns in a database to content formatted in HTML on a Web page, especially when relational content is partitioned or resides in materialized views. For this reason, broad-brush invalidation is a welcome addition to an administrator's arsenal, provided that performance can be guaranteed. Only Oracle9iAS Web Cache makes this possible.

Peak loads and request patterns on Web sites are seldom predictable. It is difficult to foresee when requests for new content will generate more traffic than a site's Web servers have capacity to handle. Such traffic spikes or surges, as they are known, are even more troublesome when the requested content must be dynamically generated or when the content is changing frequently. Because Oracle9iAS Web Cache can sustain orders of magnitude greater throughput than the average Web server, it provides a layer of defense against such surges.

Surge protection and performance assurance are closely linked. To prevent an overload of requests on the origin Web server(s), Oracle9iAS Web Cache enables administrators to set a limit on the number of concurrent connections that the origin Web server(s) can handle. When the capacity limit is reached, subsequent requests are queued to wait up to a maximum amount of time. If the maximum wait time is exceeded, Oracle9iAS Web Cache rejects the request and serves an "apology" page to the Web browser that initiated the request.
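
The connection limit, bounded wait, and apology page described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the product's implementation: the class name `OriginGate`, its parameters, and the apology text are hypothetical.

```python
import threading


class OriginGate:
    """Caps concurrent origin requests; excess requests wait, then get an apology page."""

    def __init__(self, capacity: int, max_wait_seconds: float):
        # One semaphore slot per concurrent connection the origin can handle.
        self._slots = threading.BoundedSemaphore(capacity)
        self._max_wait = max_wait_seconds

    def fetch(self, request_origin, apology_page: str = "Sorry, the site is busy."):
        # Wait for a free connection slot, but no longer than the maximum wait time.
        if not self._slots.acquire(timeout=self._max_wait):
            return apology_page  # wait time exceeded: reject the request
        try:
            return request_origin()  # forward the request to the origin
        finally:
            self._slots.release()  # free the slot for the next queued request
```

A caller would wrap each cache miss in `gate.fetch(...)`, passing a callable that performs the actual origin request.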


Web Server Load Balancing, Failover and Binding
Most Web sites are powered by a farm of application Web servers running on clustered machines. Distributing the load across multiple servers provides better scalability and reliability and allows administrators to take machines offline for maintenance without impacting availability. The traffic management function is typically provided by a network load balancing device deployed at the front-end of the Web server farm.

Oracle9iAS Web Cache uniquely combines caching and load balancing in a single offering. Deployed before a Web server farm, Oracle9iAS Web Cache intercepts all HTTP requests sent to the Web site and responds with a cached page if there is a valid version in the cache. As depicted in Figure 12, all cache misses - whether cacheable or non-cacheable - are passed to the origin Web servers on the back-end. Oracle9iAS Web Cache distributes these requests according to the relative capacity of each Web server. Web server capacity is configured by the Oracle9iAS Web Cache administrator.

Figure 12: Web Server Load Balancing

Just as network load balancers do, Oracle9iAS Web Cache can determine when a Web server has failed and then automatically redistribute the load over the remaining servers. The failover feature is illustrated in Figure 13. Continuing with the load balancing example from Figure 12, an outage of www3.company.com results in 50 percent of the traffic going to www1.company.com and 50 percent going to www2.company.com. Oracle9iAS Web Cache periodically checks to see if the failed Web server has returned to a functional state. As soon as the failed server returns to operation, Oracle9iAS Web Cache will once again include it in the distribution mix.

Figure 13: Web Server Failure Detection
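
Capacity-proportional distribution with failover can be sketched in Python as shown here. The function names and the capacity values are hypothetical; only the host names and the 50/50 split after an outage of www3.company.com come from the example in the text.

```python
import random


def traffic_shares(capacities, down=frozenset()):
    """Return each live server's share of cache-miss traffic, proportional to capacity."""
    live = {host: cap for host, cap in capacities.items() if host not in down}
    total = sum(live.values())
    return {host: cap / total for host, cap in live.items()} if total else {}


def pick_server(capacities, down=frozenset(), rng=random):
    """Choose an origin server at random, weighted by its share of live capacity."""
    shares = traffic_shares(capacities, down)
    if not shares:
        raise RuntimeError("no origin Web server is available")
    hosts = list(shares)
    return rng.choices(hosts, weights=[shares[h] for h in hosts], k=1)[0]


# Hypothetical capacities (concurrent connections); not taken from Figure 12.
CAPACITIES = {
    "www1.company.com": 100,
    "www2.company.com": 100,
    "www3.company.com": 200,
}
```

With www3.company.com marked as down, `traffic_shares(CAPACITIES, down={"www3.company.com"})` assigns 0.5 to each of the two remaining servers. Removing the host from `down` restores it to the distribution mix.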


Today's traffic management solutions must offer more than just load balancing and failover; they must also provide persistent connections to Web servers that maintain session state. Many Web-based applications store session state in the database; however, there are cases when session state may be stored in the middle tier. In such cases, it becomes important to maintain affinity, or persistence, between a given Web browser and a particular Web server in the farm. Because HTTP is a stateless protocol, the Web server must include session data in the HTTP header or body it sends to the Web browser in such a way that the browser is forced to include the data with its next request. In practice, this data is transferred either with parameters embedded in URLs or with cookies. The load balancing device intercepts these requests, identifies the session parameters or cookies, and binds the request to the appropriate Web server.

Figure 14 shows how Oracle9iAS Web Cache supports Web sites that use session IDs or cookies to bind user sessions to a given application Web server in order to maintain state for a period of time.


Figure 14: Application Web Server Binding
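
The binding behavior can be illustrated with a small Python sketch. It assumes the session identifier travels in a `sessionID` URL parameter (as in the sample in Appendix A) or in a cookie of the same name. The class name `SessionBinder` and the round-robin choice for unbound requests are simplifications introduced here, not the product's actual algorithm.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


class SessionBinder:
    """Routes every request of a session to the server that first handled it."""

    def __init__(self, servers, session_key="sessionID"):
        self._servers = list(servers)
        self._key = session_key
        self._bindings = {}  # session ID -> bound server
        self._next = 0       # round-robin cursor for unbound requests

    def _session_id(self, url, cookie_header=""):
        # Look for the session ID in the URL query string first, then in cookies.
        params = parse_qs(urlsplit(url).query)
        if self._key in params:
            return params[self._key][0]
        cookies = SimpleCookie()
        cookies.load(cookie_header)
        return cookies[self._key].value if self._key in cookies else None

    def route(self, url, cookie_header=""):
        session = self._session_id(url, cookie_header)
        if session in self._bindings:
            return self._bindings[session]  # existing binding wins
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        if session is not None:
            self._bindings[session] = server  # remember the binding
        return server
```

Requests that carry no session identifier are simply distributed in turn and leave no binding behind.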


Automatic Content Compression
Oracle9iAS Web Cache may be configured to automatically compress objects upon insertion into the cache. Content compression is useful because it can significantly accelerate response times and reduce bandwidth expenditures. Most Web browsers are able to decompress and render objects that have been compressed using GZIP. Specifically, GZIP encoding is supported in the 4.x versions of Microsoft Internet Explorer and Netscape Navigator, now several years on the market. HTML, XML and other text file formats are ideal candidates for compression. (GIF and JPEG images are already compressed.) Using GZIP, a 20K HTML file compresses to about 4K.

Most Web servers on the market are capable of serving compressed files, but they generally offer no "automatic" compression features. Instead, administrators must manually compress static files using a compression utility and store the files in the file system where the Web server can access them. With Oracle9iAS Web Cache, compression is a simple "Yes/No" option that administrators select when specifying a cacheability rule. Recall that Oracle9iAS Web Cache supports regular expressions for cacheability rules, so compression is easy to apply to a range of pages, avoiding the administrative hassle of compressing pages one by one. And unlike the typical Web server, Oracle9iAS Web Cache offers compression for pages that have been dynamically generated.

For cacheable content that an administrator elects to compress, Oracle9iAS Web Cache stores both compressed and uncompressed versions in the cache. (If a cacheable object retrieved from the origin Web server already contains a Content-Encoding response header, which is typically used to denote compression, Oracle9iAS Web Cache will not compress it.) Non-cacheable responses can also be compressed on-the-fly if the administrator elects this configuration option.
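
As a rough illustration of compress-on-insert, the Python sketch below stores both versions of an object and skips compression when the origin response already carries a Content-Encoding header. The function name `cache_entry` and the dictionary layout are assumptions made for this example; the compression itself uses the standard `gzip` module.

```python
import gzip


def cache_entry(body: bytes, response_headers: dict, compress: bool) -> dict:
    """Build the versions of an object to keep in the cache (hypothetical layout)."""
    entry = {"original": body}
    # A Content-Encoding header typically means the origin already compressed the body.
    already_encoded = any(name.lower() == "content-encoding" for name in response_headers)
    if compress and not already_encoded:
        entry["gzip"] = gzip.compress(body)
    return entry
```

The compression ratio depends on the content. Repetitive text formats such as HTML and XML tend to shrink the most, while already compressed formats such as GIF and JPEG gain little.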

Browsers that send an Accept-Encoding request header containing either "gzip" or "*" will receive the compressed version of the content; browsers that do not send this header will receive the uncompressed version. For example, requests containing any of the following headers will receive compressed data:

  • Accept-Encoding: deflate, gzip
  • Accept-Encoding: *
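
The header check can be sketched in Python as follows. The function name `wants_gzip` is hypothetical, and the sketch deliberately ignores quality values such as `gzip;q=0`, which a complete HTTP implementation would have to honor.

```python
def wants_gzip(accept_encoding):
    """Return True if the Accept-Encoding header lists "gzip" or "*"."""
    if not accept_encoding:
        return False  # no header: serve the uncompressed version
    # Split the comma-separated list and drop any ";q=..." parameters.
    codings = [item.split(";")[0].strip().lower() for item in accept_encoding.split(",")]
    return "gzip" in codings or "*" in codings
```

A cache would call this once per request and then pick the compressed or uncompressed version of the stored object accordingly.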

Again, the benefits of compression are obvious: shorter response times for end users and reduced bandwidth costs for ISPs, corporate networks and content publishers. Oracle9iAS Web Cache adds value by automating the compression process and expanding the range of compressible pages to include those that have been dynamically generated.

Management
Oracle9iAS Web Cache Manager is a browser-based console for administering the cache. Managing Oracle9iAS Web Cache involves the following tasks:

  • Configuration
  • Invalidation
  • Monitoring
  • Logging

Key configuration steps include setting cacheability and load balancing rules, as described earlier in this paper.

Figure 15: Using Oracle9iAS Web Cache Manager to Set Cacheability Rules

Oracle9iAS Web Cache also provides important security features, namely:

  • Password authentication for administration and invalidation operations
  • Restriction of the subnets from which administration and invalidation operations can originate
  • Timeout settings for inactive connections

 

Cache invalidation is an administrator-defined process. Invalidation can be performed manually or it can be automated as part of a Web site's overall content management process. When defining cacheability rules using Oracle9iAS Web Cache Manager, administrators have the option of associating expiration policies with these rules. Expiration policies offer a simple, automated mechanism for managing cached content with predictable validity timeframes. For content with validity windows that are difficult to determine a priori, Oracle9iAS Web Cache also provides a general ESI-compliant invalidation mechanism based on HTTP and XML. This flexible message-based mechanism provides ample room for automation and integrates easily with a Web site's existing content management infrastructure. Finally, Oracle9iAS Web Cache Manager includes a Cache Cleanup screen for manually invalidating cached content. (See Figure 16 below.) Invalidation is covered in greater detail in a preceding section of this paper.

Figure 16: Manual Invalidation with Oracle9iAS Web Cache Manager

Oracle9iAS Web Cache may be monitored using either the Health Monitor screen or the Statistics screens. As displayed in Figure 17, the Health Monitor refreshes at a user-defined interval and displays the following statistics about Oracle9iAS Web Cache itself:


Current time The time when the Health Monitor page view was generated
Oracle9iAS Web Cache start timestamp The timestamp when Oracle9iAS Web Cache was started
Time since start The length of time that Oracle9iAS Web Cache has been operating since it was started. Time is denoted in days/hours/minutes/seconds
Number of requests served Accumulated number of requests Oracle9iAS Web Cache has served since it was started
Serving request/second Provides a graphical view of the number of Web browser requests per second resolved by objects in the cache that have expired or that have been invalidated (but have not yet been refreshed from the application Web server(s)), versus objects in the cache that are still valid
   

Table 7: Health of Oracle9iAS Web Cache

Figure 17: Oracle9iAS Web Cache Manager's Health Monitor Screen

The Health Monitor also displays information about the performance of the origin Web server(s), including:

Up/Down State of the application Web server
Since How long the application Web server has been up or down
Total Requests Served Number of requests resolved by this application Web server
Average Latency Average amount of time for the requests to be resolved

Table 8: Health of Origin Web Server(s)


The Web Cache Statistics screens display a more comprehensive set of data on both the Oracle9iAS Web Cache and the origin Web server(s). This level of detail is beyond the scope of this paper. Please reference the Oracle9iAS Web Cache Administration and Deployment Guide for further information on the statistics available in these screens.

For purposes of data mining and diagnostics, Oracle9iAS Web Cache captures important statistical information in log files. Just as Web servers do, Oracle9iAS Web Cache records information on incoming HTTP requests in access logs. Oracle9iAS Web Cache supports both Common Log Format (CLF) and Extended Log Format (XLF) for access logging. The default CLF log format contains the following fields:

c-ip Client's IP address and port
c-auth-id Username if the request contained an attempt to authenticate
date Date at which the transaction completed
request line GET URI HTTP/1.0|1.1
sc-status Oracle9iAS Web Cache-to-client HTTP status code
bytes Content-length of the transferred document

Table 9: Default Access Log Format
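
To show how such an access log entry might be consumed, the Python sketch below parses one line into the fields of Table 9. It assumes the on-disk layout follows the conventional Common Log Format (`host ident authuser [date] "request" status bytes`); the pattern, the function name `parse_clf`, and the sample line are illustrative and are not taken from the product documentation.

```python
import re
from typing import Optional

# Conventional CLF layout: host ident authuser [date] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<c_ip>\S+) \S+ (?P<c_auth_id>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request_line>[^"]*)" (?P<sc_status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line: str) -> Optional[dict]:
    """Return the Table 9 fields as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample entry, for illustration only.
SAMPLE = '192.0.2.10 - jsmith [10/Jun/2001:13:55:36 -0700] "GET /news.jsp HTTP/1.1" 200 2326'
```

Calling `parse_clf(SAMPLE)` yields a dictionary whose keys mirror the field names above, with hyphens replaced by underscores so that they are valid Python group names.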

In addition to the default fields, a number of other access log fields are supported by Oracle9iAS Web Cache. For more information on CLF and XLF logging formats, see:

http://www.w3.org/pub/WWW/TR/WD-logfile.html or http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html#common-logfile-format

Access log analysis tools, such as Oracle9iAS Clickstream Intelligence, may be used to garner business intelligence from customer access patterns. These tools can help uncover traffic patterns on a Web site and other useful trends, such as

  • top visitors
  • top pages
  • top paths taken through the site
  • max sessions, etc.
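
As a toy illustration of this kind of analysis (not a substitute for a dedicated tool), the Python sketch below ranks visitors and pages from already parsed access log entries. The entry layout, with `c_ip` and `uri` keys, and the sample data are hypothetical.

```python
from collections import Counter


def top_n(entries, field, n=3):
    """Return the n most frequent values of a field as (value, count) pairs."""
    return Counter(entry[field] for entry in entries).most_common(n)


# Hypothetical parsed access log entries.
ENTRIES = [
    {"c_ip": "192.0.2.10", "uri": "/news.jsp"},
    {"c_ip": "192.0.2.10", "uri": "/shopping.jsp"},
    {"c_ip": "192.0.2.10", "uri": "/news.jsp"},
    {"c_ip": "192.0.2.44", "uri": "/news.jsp"},
]
```

Here `top_n(ENTRIES, "c_ip", 1)` reports the busiest visitor and `top_n(ENTRIES, "uri", 1)` the most requested page.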


DEPLOYING ORACLE9iAS WEB CACHE
In the simplest of deployment scenarios, Oracle9iAS Web Cache is positioned in front of one or more Web servers to cache content generated by those servers. Oracle9iAS Web Cache then delivers that content to Web browsers. When Web browsers access the Web site, they send HTTP requests to Oracle9iAS Web Cache which acts as a virtual server for the Web site, masking the existence of the Web server farm and the database. If the requested content has changed, Oracle9iAS Web Cache retrieves the new content from the Web servers according to the relative load on each server.

Figure 18: Simple Deployment Scenario


Co-located versus Dedicated Deployments
Oracle9iAS Web Cache can be deployed on the same node as the origin Web server or on a dedicated node of its own. Figure 19 demonstrates how Oracle9iAS Web Cache may be co-located with the Web server(s) on the same machine.


Figure 19: Oracle9iAS Web Cache on Same Node as Web Server

In a cluster scenario, a network load balancer distributes requests across the cache instances running on each node in the server farm. Each Oracle9iAS Web Cache instance is typically configured with the host name of the Web server(s) running on its own node, and interprocess communication (IPC) is used to pass requests between the cache and the Web server(s). Because Oracle9iAS Web Cache consumes memory, co-location is only viable if the cache and the Web server(s) do not contend for resources.

Oracle9iAS Web Cache may also be deployed on a dedicated node, as illustrated in Figure 20.


Figure 20: Oracle9iAS Web Cache on Different Node from Web Server


A dedicated deployment of this nature is often preferable to the co-located deployment previously mentioned. In a dedicated scenario, there is no risk of resource contention with other server processes. Also note that Oracle9iAS Web Cache performs excellently on commodity hardware, so a dedicated deployment need not be a costly one in terms of hardware expenditure. For very high-volume Web sites, and to avoid a single point of failure, two or more nodes running Oracle9iAS Web Cache may be deployed behind a third-party network load balancing device.


Deploying Oracle9iAS Web Cache in a Distributed Network
For high availability, many e-businesses mirror their Web sites in strategic geographical locations. Figure 21 depicts a hypothetical mirrored topology in which Web servers are deployed in data centers in both the United Kingdom and Japan. Browsers make a request to local DNS servers to resolve www.company.com. The local DNS server is routed to the authoritative DNS server for www.company.com. The authoritative DNS server uses the IP address of the browser, the network topology model, and the current load at each mirror location to pick the ideal server farm to satisfy the request. It then returns the IP address of the appropriate mirror site to the browser.


Figure 21: Full-scale Web Site Mirroring


Caching is an excellent low-cost alternative to full-scale mirroring, especially when cache hit rates are as high as they are with Oracle9iAS Web Cache. Caching may also be used to serve local content to local markets in order to shorten response times to these markets. This deployment also reduces bandwidth and rack space costs for the content provider. In the example depicted in Figure 22, one Oracle9iAS Web Cache server is located in the U.S. office and another is located in the Japan office; the Web servers and databases for both offices are located in the U.S. office. In this way, the content management and content generation functions of the Web site remain centralized, while the content delivery function is distributed and localized.

Figure 22: Global Distribution of Oracle9iAS Web Cache


Oracle9iAS Web Cache with Content Delivery Network Services
Oracle9iAS Web Cache can be used in conjunction with content delivery network services. As described earlier in this paper, the distributed nature of these CDN services helps to mitigate Internet "hot spots" and offers a higher degree of fault tolerance and availability for a Web site's static and streaming content.


CDN services do not obviate the need for availability and scalability of a Web site's origin servers. Oracle9iAS Web Cache and CDN services are complementary. CDNs are ideal for delivering the less volatile graphics and streaming components of a Web site, while Oracle9iAS Web Cache is well suited for delivering and assembling the highly dynamic content.

With the advent of ESI, CDNs such as Akamai will begin to assume more of the processing and assembly of dynamic content. The ESI specification enables Web developers to deploy ESI-enabled applications both on ESI-compliant application servers (such as Oracle9iAS) and content delivery networks (such as Akamai) without re-writing the applications. Thanks to the open nature of ESI, Oracle9iAS Web Cache and ESI-compliant CDNs use the same markup language, cacheability syntax and semantics, and content invalidation protocols. Administrators may choose at deployment time which ESI content should be processed by the CDN and which ESI content should be processed by the local Oracle9iAS Web Cache.

Figure 23: Oracle9iAS Web Cache with ESI-compliant CDN Services

With its blazing speed, ease of use, and unique ESI-compliant feature set, Oracle9iAS Web Cache is also an ideal caching solution for deployment by CDN service providers directly within their networks. Increasingly, independent CDNs, as well as global corporations designing enterprise CDNs, are deploying Oracle9iAS Web Cache as the cornerstone of their content delivery infrastructure.


Oracle9iAS Web Cache with Oracle9iAS Database Cache
Oracle9i Application Server includes both Web and database caching capabilities as part of its Enterprise Edition. Oracle9iAS Web Cache is Oracle's front-end solution for Web site performance, scalability and availability. In contrast, Oracle9iAS Database Cache is a mid-tier data cache, or database accelerator. The database cache stores read-only relational content on middle tier nodes in order to reduce the load on the origin database. Oracle9iAS Database Cache works transparently at the OCI-layer. No modifications are necessary to applications that access an Oracle database.

Together, Oracle's Web and database caching solutions turbocharge e-business Web sites in several complementary ways. For instance, Oracle9iAS Database Cache speeds up the retrieval of data when Oracle9iAS Web Cache experiences a cache miss. The first time a Web page is requested, Oracle9iAS Web Cache must pass the request on to the Web servers to process and format the response. Only after the response comes back is the Web page cached. With Oracle9iAS Database Cache on the middle tier, the response latency of these Web Cache misses is significantly reduced.

             Oracle9iAS Web Cache              Oracle9iAS Database Cache
Benefit      Improves Web server performance   Improves RDBMS performance
Content      Static, dynamic files             Relational
Storage      Hash table                        Relational table
Serves       HTTP                              SQL
Propagation  HTTP                              Replication

Table 10: Oracle9iAS Web Cache versus Oracle9iAS Database Cache

Oracle9iAS Web Cache provides another layer of infrastructure between the browser and the database. By offloading requests from the middle tier, it frees up Oracle9iAS Database Cache and the origin database to focus on highly computational tasks like personalization or shopping cart transactions. Oracle9iAS Web Cache also enables administrators to more finely tune Oracle9iAS Database Cache for greater consistency with the origin database.

Figure 24: Oracle9i Application Server Caching Technologies


AVAILABILITY AND COMPATIBILITY
Oracle9iAS Web Cache is available as part of the Oracle9i Application Server, Enterprise Edition. For licensing information or to purchase the product, please visit the Oracle Store at http://store.oracle.com/.

A trial version of the full application server product, as well as a standalone install of Oracle9iAS Web Cache, may be downloaded from the Oracle Technology Network (OTN) at http://otn.oracle.com/. There is also a discussion forum on OTN for questions related to Oracle9iAS Web Cache.

Because Oracle9iAS Web Cache is based on HTTP 1.0 and 1.1, it is compatible with Oracle HTTP Server powered by Apache, as well as any HTTP-compliant Web server, application server, content management system or database.


SUMMARY AND FURTHER READING
Oracle9iAS Web Cache is the only Web acceleration solution to address the scalability and performance barriers faced by sites and applications that use dynamic page generation techniques. With its unique ability to cache both static and dynamically generated Web content, Oracle9iAS Web Cache takes the pressure off of busy Web sites by storing frequently accessed pages in memory, eliminating the need to repeatedly process requests for those pages on mid-tier servers and databases. As part of a complete Oracle9i Application Server offering, the Web caching component is ideal for accelerating high volume Web sites and other HTTP-based applications, including e-commerce catalogs, auctions, Internet and intranet portals, business-to-business exchanges, and CRM applications.

As the first and only application server on the market to support the Edge Side Includes (ESI) specification for performing page assembly in edge servers, Oracle9i Application Server leads the industry with its ability to deliver rich, personalized content from both the edge of the data center and the edge of the Internet.

Starting with only a single instance of Oracle9iAS Web Cache running on inexpensive hardware, a typical dynamic Web site can expect dramatic improvements in overall throughput. Better system throughput translates into shorter response times, higher scalability and significant resource savings, as measured by average cost per request. As the first product to combine caching and load balancing for Web servers, Oracle9iAS Web Cache also provides surge protection and high availability features that help mitigate the effects of flash crowds. And Oracle9iAS Web Cache delivers this functionality on commodity hardware, meaning that e-businesses can now serve rich content faster, to more customers, using fewer computing resources than ever before.

Readers who wish to obtain more details about requirements, configuration, deployment and cache management are encouraged to consult the Oracle9iAS Web Cache Administration and Deployment Guide, which is available for download at http://otn.oracle.com/.


APPENDIX A: SAMPLE ESI CODE
Note that much of the HTML formatting has been removed for the sake of clarity.

<html>
<head>

<title>
Company.com
</title>
</head>
<body>
...
<!-- The following HTML comment tag with an immediate following
'esi' is a special ESI tag that is removed if and only if this
page is processed by an ESI processor. -->

<!--esi

<esi:comment text="This is the HTML source when ESI is enabled."
/>

<esi:comment text="Start: The quick link section. You cannot use
the standard HTML comments because the end of that comment tag
would disrupt the HTML comment tag with 'esi' following the two
'-'. " />

<esi:comment text="The URI query string parameter 'sessionID' is
used to carry session identifiers. The session ID is encoded in
all links. 'type' is used to categorize this user."/>

<esi:vars>
<a href="/shopping.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})">
<img src="/img/shopping.gif">
</a>
<a href="/news.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})">
<img src="/img/news.gif">
</a>
<a href="/sports.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})">
<img src="/img/sports.gif">
</a>
<a href="/fun.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})">
<img src="/img/fun.gif">
</a>
<a href="/about.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})">
<img src="/img/about.gif">
</a>

</esi:vars>
<esi:comment text="End: The quick link section" />
...

<h3>Local Weather</h3>
<esi:include
src="/weather.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})" />
...

<h3>Stock Quotes</h3>
<esi:try>
<esi:attempt>
<esi:include
src="/CompanyStack.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})" />
</esi:attempt>
<esi:except>
The company stock quote is temporarily unavailable.
</esi:except>
</esi:try>
...

<h3>What's New at Company</h3>
<esi:comment text="This section is a static file that does not
carry session information" />
<esi:include src="/whatisnew.html" />
...

<h3>Today's News</h3>
<esi:choose>
<esi:when test="$(QUERY_STRING{type}) == 'Sport'">
<h4>Sport News</h4>
<esi:include
src="/SportNews.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})" />
</esi:when>
<esi:when test="$(QUERY_STRING{type}) == 'Career'">
<h4>Financial News</h4>
<esi:include
src="/FinancialNews.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})" />
</esi:when>
<esi:otherwise>
<h4>General News</h4>
<esi:include
src="/DefaultNews.jsp?sessionID=$(QUERY_STRING{sessionID})&type=$(QUERY_STRING{type})" />
</esi:otherwise>
</esi:choose>
...

-->

<!-- This is the HTML source when ESI is disabled. -->

<esi:remove>
Alternative HTML source that does not use ESI goes here. This
tag enables you to disable ESI on the fly without redeveloping or
re-deploying a different home page.
</esi:remove>
...

</body>
</html>

 

 
