As with any Web portal, the server and database capacity needed to deploy a
portal built using OracleAS Portal largely depends on the number of
anticipated user requests for a given page. Displaying a single page to a user
may require many separate transactions, from verifying whether the user has
permission to view the page, to loading the images that appear on the page, to
calling a style sheet that contains formatting information for the page.
The upper and lower limits of what is needed are determined by how users are
expected to use the portal. At a minimum, enough server capacity
to satisfy the average load during a work day will be required, with response times that are
acceptable to the user base. If possible, strive to satisfy the
volume of page requests anticipated during peak intervals of high user
activity. Hardware resources such as CPU, memory, I/O capacity, and network
bandwidth are key to reducing response times. Unless OracleAS Portal is
installed on a server or group of servers that can handle a large number of
transactions, users are likely to experience slow response times.
The same is true of the database. If many applications compete for
the same database resources, Web portal performance may suffer. It is possible
to install multiple instances of OracleAS Portal in the same database, for
example, a development instance for developing new pages and portlets, and a
separate instance for deploying the finished Web portal. Consider
whether the database can satisfy requests from both instances in a timely
manner.
Adding more servers and database capacity will certainly improve the Web
portal's performance, but unless there are unlimited funds available,
balancing good performance against the costs associated with each
new piece of hardware and software will become key.
The initial sections of this document offer a high-level overview of
performance and some of the important elements of sizing. The latter
section offers recommendations and further considerations.
Performance Targets
Whether designing or maintaining a system, set specific
performance goals for optimization. Altering parameters
without a specific goal in mind can waste tuning time for the system without a
significant gain.
An example of a specific performance goal is an order entry response time
under three seconds. If the application does not meet that goal, identify the
cause (for
example, I/O contention), and take corrective action. During development, test
the application to determine if it meets the designed performance goals.
Tuning usually involves a series of trade-offs. After determining the
bottlenecks, performance in some other areas may need to be modified to achieve
the desired results. For example, if I/O is a problem, purchasing more memory
or more disks may resolve it. If a purchase is not possible, limiting the
concurrency of the system may achieve the desired performance. However, if
there are clearly defined goals for performance, the decision on what to trade
for higher performance is simpler because the most important areas will have
been identified.
User Expectations
Application developers, database
administrators, and system administrators must be careful to set appropriate
performance expectations for users. When the system carries out a particularly
complicated operation, response time may be slower than when it is performing a
simple operation. Users should be made aware of which operations might take
longer.
Performance Evaluation
With clearly defined performance goals, determining when performance tuning has been successful
becomes a simple matter of comparison. Success
depends on the functional objectives established with the user
community, the ability to measure whether or not the criteria are being met,
and the ability to take corrective action to overcome any exceptions.
Ongoing performance monitoring enables maintenance
of a well-tuned system. Keeping a history of the application’s performance
over time enables useful comparisons to be made. With data about actual
resource consumption for a range of loads, objective scalability
studies can be undertaken and used to predict the resource requirements for
anticipated load volumes.
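As a minimal sketch of such a scalability study, the following Python fits a straight line to recorded (load, resource consumption) pairs and extrapolates to an anticipated load. The sample figures are invented purely for illustration, the function names are hypothetical, and the linear model is an assumption; real systems often stop scaling linearly as they approach saturation.

```python
def fit_line(loads, usage):
    """Least-squares fit of usage = slope * load + intercept."""
    n = len(loads)
    if n < 2 or n != len(usage):
        raise ValueError("need at least two matching (load, usage) samples")
    mean_x = sum(loads) / n
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("loads must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, usage))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_usage(load, slope, intercept):
    """Extrapolate resource consumption for an anticipated load."""
    return slope * load + intercept


# Invented history (illustration only): pages/sec vs. CPU utilisation in %.
history_loads = [10, 20, 30, 40]
history_cpu = [15, 25, 35, 45]

slope, intercept = fit_line(history_loads, history_cpu)
predicted = predict_usage(50, slope, intercept)
print(f"Predicted CPU utilisation at 50 pages/sec: {predicted:.1f}%")
```

A prediction that lands far outside the measured range should be treated with caution and confirmed by further measurement.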
Performance Terms
concurrency The ability to handle
multiple requests simultaneously. Threads and processes are examples of
concurrency mechanisms.
contention Competition for
resources.
cluster A group of machines that
handle workload in a distributed manner, providing redundancy and failover.
failover A method of allowing one machine or set of
machines to provide an alternative execution arena for a task, should the
original machine(s) fail.
hit The subsequent request for
a snippet of content from either the Portal Parallel Page Engine or the client
browser. This content can take the form of images, JavaScript libraries,
cascading style sheets, and so on. It is reasonable to expect ~30 hits from a
single page request.
latency The time that one system
component spends waiting for another component in order to complete the entire
task. Latency can be defined as wasted time. In networking contexts, latency is
defined as the travel time of a packet from source to destination.
page request The unique request for a page defined
inside the Portal repository. A figure specifying page requests per second is
the measurement of the load expected for the architected solution given a common
element of portal content. One page request is likely to result in one or more
(~30) hits for subordinate content.
response time The time between the
submission of a request and the receipt of the response.
scalability The ability of a
system to provide throughput in proportion to, and limited only by,
available hardware resources. A scalable system is one that can handle
increasing numbers of requests without adversely affecting response time and
throughput.
service time The time between the
receipt of a request and the completion of the response to the request.
think time The time the user is
not engaged in actual use of the processor.
stream time The time taken to transmit the
response to the requestor.
throughput The number of requests
processed per unit of time.
wait time The time between the
submission of the request and initiation of the request.
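The relationship between the terms "page request" and "hit" defined above can be sketched as a simple conversion. This is an illustrative Python sketch only: the function name is hypothetical, the 50 page requests per second is an arbitrary example rate, and the factor of 30 is the approximate figure quoted in the definitions, not a measured value.

```python
# Approximate number of hits generated by one page request (from the definitions).
HITS_PER_PAGE_REQUEST = 30


def hits_per_second(page_requests_per_second, hits_per_page=HITS_PER_PAGE_REQUEST):
    """Estimate the hit rate implied by a given page request rate."""
    return page_requests_per_second * hits_per_page


# Arbitrary example rate of 50 page requests per second.
print(hits_per_second(50), "hits/sec")
```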
Sizing the Portal System
Consider the following elements of page generation when planning system
sizing:
Peak page throughput required
Page cache hit rate
Peak login rate
Other performance factors of the final portal should also be considered.
These include the portlet cache hit rate, portlet execution speed, page
complexity, page security, available network bandwidth, load distribution,
other portal activity, available hardware resources, the amount and type of
content, and the impact of using SSL.
Peak Page Throughput
The peak number of pages/second requested by Portal users. For example,
assume a Portal serves a total population of 10,000 users of which 10% are
active (a user may be logged in but not active) at peak times and an active user
makes 3 requests per minute. The peak throughput requirement will be:
((10000 x 0.10) x 3) ÷ 60 = 50 pages/sec
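The same arithmetic can be expressed as a small Python sketch so that the inputs are easy to vary. The function and parameter names are hypothetical; the values are those of the example above.

```python
def peak_page_throughput(total_users, active_fraction, requests_per_user_per_minute):
    """Return the peak page request rate in pages per second."""
    active_users = total_users * active_fraction
    requests_per_minute = active_users * requests_per_user_per_minute
    return requests_per_minute / 60.0


# 10,000 users, 10% active at peak, 3 requests per active user per minute.
rate = peak_page_throughput(10_000, 0.10, 3)
print(f"{rate:.1f} pages/sec")
```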
Page Cache Hit Rate
The Page Cache Hit Rate is the number of page definitions that can be retrieved
from the cache compared to the number of pages that must be regenerated during
peak load times. To estimate the Page Cache Hit Rate (PageCHR), consider:
How often pages are modified by their owners
How often pages are customized by end users
Whether you will be using validation-, invalidation-, or expiry-based caching
The aim with PageCHR is to get it as close to 100% as possible. Judicious use
of the correct caching policy within each unique page and portlet, weighed
against how dynamic the data is, will ensure that both the need to deliver
content in a timely and performant fashion and the desire to deliver the most
up-to-date content possible are met.
Building page content from cached content (both in-memory and on-disk) is
much less expensive than retrieving the content from the meta-data repository.
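To illustrate why the hit rate matters, the following Python sketch estimates how many pages per second would still need to be regenerated at a given hit rate. It assumes the hit rate is expressed as the fraction of all page requests served from cache; the function names and the hit and miss counts are hypothetical, chosen only for illustration.

```python
def page_cache_hit_rate(cache_hits, cache_misses):
    """Fraction of page requests served from cache (assumed interpretation)."""
    total = cache_hits + cache_misses
    if total == 0:
        raise ValueError("no page requests recorded")
    return cache_hits / total


def regenerated_pages_per_second(peak_pages_per_second, hit_rate):
    """Pages per second that miss the cache and must be regenerated."""
    return peak_pages_per_second * (1.0 - hit_rate)


# Hypothetical counts: 900 cached pages served, 100 pages regenerated.
rate = page_cache_hit_rate(cache_hits=900, cache_misses=100)
print(f"PageCHR: {rate:.0%}")
print(f"Regenerated at 50 pages/sec peak: {regenerated_pages_per_second(50, rate):.1f} pages/sec")
```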
Peak Login Rate
The login rate is the rate at which users log in to the Portal, thereby placing
a load on the SSO and OID servers. For example, assume a Portal serves a total
population of 10,000 users and 20% of those users log in during a 15-minute
period at the start of the business day. The peak login rate will be:
(10000 x 0.20) ÷ (15 x 60) = 2.2 logins/sec
Explicit logouts will also place a load on the servers and may need to be
considered.
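The login rate for the example above (10,000 users, 20% logging in over 15 minutes) can be worked through with a short Python sketch. The function and parameter names are hypothetical, and the sketch assumes logins are spread evenly across the window, which will understate any burst within it.

```python
def peak_login_rate(total_users, login_fraction, window_minutes):
    """Return the average login rate, in logins per second, over the window."""
    logins = total_users * login_fraction
    return logins / (window_minutes * 60.0)


# 10,000 users, 20% of whom log in during a 15-minute window.
rate = peak_login_rate(10_000, 0.20, 15)
print(f"{rate:.1f} logins/sec")
```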
Other Performance Factors
Portlet Cache Hit Rate (PortletCHR)
The PortletCHR is the number of portlet requests that can be satisfied from
the cache compared to the number of the portlet requests that must be handled by
a provider during peak load times.
Portlet Execution Speed
The Portlet Execution Speed is the average time required to execute all (uncached)
portlets on a page. Since portlets execute in parallel, this measure will be
equal to the execution speed of the slowest portlet, plus any page assembly
overhead. The portlet execution speed may differ from site to site if each
site has a differing mix of content, caching policies and hardware for
their portal. Estimating this number can only be achieved through a proof of
concept that accurately reflects the eventual target data and page design. In
general, the speed of page assembly will be limited by the execution speed of
its slowest portlet.
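Because portlets execute in parallel, the estimate described above reduces to the slowest portlet plus assembly overhead. The following Python sketch shows that calculation; the function name, the portlet timings, and the overhead value are hypothetical and would in practice come from a proof of concept.

```python
def page_execution_time(portlet_times, assembly_overhead=0.0):
    """Time to execute all uncached portlets in parallel, plus assembly overhead.

    portlet_times: execution time in seconds of each uncached portlet.
    """
    if not portlet_times:
        # A page with no uncached portlets only pays the assembly overhead.
        return assembly_overhead
    return max(portlet_times) + assembly_overhead


# Hypothetical timings (seconds) for three uncached portlets on one page.
timings = [0.20, 0.35, 1.10]
print(f"{page_execution_time(timings, assembly_overhead=0.05):.2f} seconds")
```

Note that speeding up any portlet other than the slowest one does not change the result, which is why the slowest portlet is the natural target for tuning or caching.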
Page Complexity
Page security and the number of tabs and portlets on a page will affect the
time it takes to generate page metadata. The number of portlets on a page will
affect the page assembly times, especially if each portlet must be generated or
contacted for a validity check.
Network Bandwidth
The speed of the network that connects interacting portal components will
affect response times but does not affect in-machine throughput. Bandwidth
issues will be a large concern for portal implementations with a geographically
dispersed user base. The further the content must travel, the more
latency-sensitive the delivery mechanism will be. Widely distributed systems
over higher-latency networks will suffer from poor performance, which will only
be exacerbated by dynamic portlet content.
Load Distribution
The distribution of system load across servers will affect overall system
performance. The normally accepted method of dealing with load distribution and
scalability is to place each AS component on a separate machine or machines
depending on how much scalability is required. This distributed load
architecture also assists when dealing with the issues of High Availability.
Other Portal Activity
The activity of other users of the portal will affect overall performance.
Content managers, developers and monitoring overhead can all consume valuable
processing resources that could otherwise be applied to page generation. This
is normal, as the nature of the portal provides for multiple concurrent usage
models. However, from a pure performance point of view, the execution model for
page generation differs from that for application development, and doing
both activities simultaneously will reduce the overall system resources
available for either task.
Hardware Resources
Both page generation from the PPE and caching through Web Cache are memory
sensitive. In-memory operations are orders of magnitude faster than those
involving I/O-bound disk caching or swap files. Providing suitable quantities of
memory for the Portal servers is a critical step in the machine configuration.
CPU Performance
Page generation is a CPU-intensive process, so the speed and number of
available CPUs is another critical factor in the machine configuration for a
Portal server.
Type of Content
The amount and type of content that is served could affect system throughput.
Multimedia content could place an additional load on the OHS, network
bandwidth, file system, memory cache and DB processes.
Sizing & Estimation Methodology
Estimating anything can be a complex and error-prone process; that is why it is
called an 'estimation' and not a 'calculation'.
Sizing portal software requires a common denominator. In the
case of database performance metrics we can refer to TPC-C benchmarks. For J2EE
application server performance we can refer to 'Pet Store' transaction figures.
Unfortunately there is no 'Pet Store' for portals. Until portal development and
deployment standards are truly unified through the efforts of JSR 168, WSRP
and other open portal development standards, it will be impossible to develop a
Portal Pet Store because of the variety of implementation methods employed by
the portal vendors in the marketplace.
There are three primary approaches to sizing a portal implementation:
Algorithm or Calculation based
An algorithm or process that
accepts inputs from the customer (e.g. user count, page count, hits, latency,
doc size etc) and attempts to deliver a processing requirement is probably the
most commonly accepted tool for delivering sizing estimations.
Unfortunately this approach is also the most inaccurate.
For a logical n-tier enterprise-class portal implementation, a calculation
that even approaches a realistic sizing response would require in excess of one
hundred input values, and would be so complex and sensitive that an input value
off by plus or minus 1% would produce wildly inaccurate results.
The other approach to calculation based solutions would be to simplify the
calculation to the point where it was simple to understand and simple to use.
Unfortunately the sizing results delivered from this approach would also be
wildly inaccurate.
Size-by-Example based
A size-by-example (SBE) approach requires a
set of known samples that may be used as data points along the thermometer of
system size. The more examples available for SBE, the more accurate the sizing
of the intended implementation will be. Asking a customer how many users they will
have, what those usage patterns are likely to be and what type of content they
intend to deploy on the portal are all questions that they should be able to
answer. Asking them the likely cache-hit-ratio for a portlet is probably
something they won't be able to answer unless you're asking the right people.
Normally in a pre-sales situation the customer will not know the answers to
questions like that. Those are the types of questions that would be required
for the algorithm approach.
Oracle has the ability to deliver targeted SBE sizing solutions for our
prospective portal customers through reference implementation documents that
outline both our internal deployments and customers' external deployments.
By using these real world examples both customers and Oracle can be assured
that the configurations being proposed have been implemented before and will
provide the performance and functionality unique to the proposed implementation.
Proof of Concept based
A proof of concept (POC) or pilot based
approach offers the most accurate sizing data of all three approaches.
A POC allows the customer to do the following:
Test their portal implementation design
Test their chosen hardware platform
Test their caching strategy
Simulate projected load
Validate design assumptions
Validate OracleAS Portal
Provide iterative feedback for their implementation team
Adjust or validate the implementation decisions made prior to the POC
There are, however, two downsides to a POC-based approach, namely time and
money.
Running a POC requires the customer to have manpower, hardware and the time
available to implement the solution, validate the solution, iterate changes and
re-test and finally analyze the POC findings.
A POC is always the best and recommended approach for any sizing exercise. It
will deliver results that are accurate for the unique implementation of the
specific customer, and that are as close to deploying the real live solution as
possible but without the capital outlay on hardware and project resources.
Size by Example Opportunity
To provide the example implementation repository of size-by-example
solutions, we need to collate data from real world customers. If you feel that
you would like to take part in the Size-By-Example Survey then please read
this document.