Oracle + PHP

DeployPHP Series, Part 3: Accelerating PHP Code Performance for Oracle
by Ilia Alshanetsky

A guide to easy, effective techniques for accelerating your PHP applications

Over the past 10 years PHP has been winning the hearts and minds of developers, becoming one of the most popular scripting languages for Web development, if not the most popular. At this point it powers over 20 million sites, ranging from small homepages to large corporate ecommerce applications. Much of PHP's explosive growth can be attributed to its ease of use, extensive built-in functionality, and excellent documentation, which greatly simplify development. Unfortunately, this simplicity has lulled many developers into a sense of complacency when it comes to performance. While PHP is pretty fast out of the box, without proper tuning it can be quite slow.

In this article, I will describe some of the easiest and most effective techniques for accelerating your PHP applications, which include but are not limited to the use of a script cache, proper Web server and PHP configuration and tuning, and benchmarking and profiling. With this information you will have the means to accelerate your PHP code as much as possible before resorting to hardware upgrades or acquisitions.

Using the Opcode Cache

One of the easiest and the most effective ways to improve the performance of PHP is to use an opcode cache. This tool allows the elimination of a particular inefficiency found in script execution.

To better understand the inefficiency and the offered solution, let's quickly examine the process through which PHP goes when it executes a script. As is the case in any scripting language, before code is executed it is parsed and converted from a human-readable form to a machine-understandable series of instructions. In PHP this process is performed by the Zend Engine Parser, which transforms a script into what are called opcodes. These opcodes are then passed to the executor, which interprets the instructions and executes the desired operations. However, most applications frequently load various PHP script libraries, classes, and so on; thus, the process of compiling and executing is repeated for all external components of the script and their sub-components.

In large or even medium-sized applications, this process may take quite some time. For example, if an application uses the PEAR database abstraction layer, DB, to connect to an Oracle database, no fewer than four scripts will be loaded to perform this operation, requiring the parsing of over 150KB of code, application logic excluded. The PEAR libraries rarely change, and even the application logic code tends to remain fairly constant once it is pushed to production, and yet they and their included files are still reparsed on every single request. It is not unusual for a script to serve hundreds, if not thousands, of pages before being altered in any way. This process is particularly disadvantageous to large scripts that execute a relatively small amount of code.

Using the PEAR database abstraction layer again as an example, connecting and executing a query or two may involve only 10-15KB of the 150KB or more of code that was loaded. In such instances, the parsing work performed by the Zend Compiler will probably take far longer than the actual code execution.

This inefficiency is precisely what an opcode cache tries to eliminate. By caching the generated opcodes inside shared memory, the compilation step need only occur once. The opcode cache works by wrapping around the compiler and effectively assuming the responsibility of generating the opcodes. When the Zend engine requests that a script be loaded, the name is passed to the opcode cache, which uses a stat() system call to determine information about the file. The first bits of information used are the inode (a filesystem identifier that is unique for each distinct file on a particular device, such as a partition) and the device number, which together form a unique identifier for each file. This allows the opcode cache to differentiate between files with the same name without having to perform the slow path resolution that would otherwise be needed.

The identifying entry is then looked up inside the hash table to determine whether this particular file has been encountered previously and, if so, when. If there is a cache hit, the modification time of the file, available through the mtime field returned by the stat() call, is compared against the one stored in the hash table. If they are the same, the cached data is valid, and it is fetched from shared memory and prepared for the executor. If they are not equal, or if there is a cache miss, the Zend Parser is called to generate the opcodes for the file. The result is then cached under a key built from the inode and device number, together with the modification time, and subsequently passed to the executor.

This process repeats for every included file, just as it does during standard code execution. What is different is that within the first few minutes of operation, most scripts will have their opcodes cached in memory and no longer require reparsing. The only overhead involved in script execution will now be a fairly quick stat() call and some basic normalization routines to make opcodes from shared memory usable. Compared with the complete parsing process, this overhead is negligible, making script execution significantly faster.
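The lookup logic described above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: a real opcode cache runs inside the engine and keeps its data in shared memory, whereas here a plain array plays that role, and cache_lookup() and the $compile callback are hypothetical names standing in for the cache and the Zend compiler.

```php
<?php
// Illustrative sketch only: a real opcode cache lives inside the engine
// and stores opcodes in shared memory, not in a PHP array.

/**
 * Returns the cached "opcodes" for $path, compiling only on a miss or
 * when the file's modification time has changed.
 * $compile is a hypothetical stand-in for the Zend compiler.
 */
function cache_lookup(&$cache, $path, $compile)
{
    $info = stat($path);                       // one stat() per lookup
    if ($info === false) {
        return false;                          // file missing or unreadable
    }
    $key = $info['dev'] . ':' . $info['ino'];  // device + inode identify the file

    if (isset($cache[$key]) && $cache[$key]['mtime'] === $info['mtime']) {
        return $cache[$key]['opcodes'];        // cache hit, entry still fresh
    }

    // Cache miss or stale entry: compile and store under the same key.
    $opcodes = call_user_func($compile, $path);
    $cache[$key] = array('mtime' => $info['mtime'], 'opcodes' => $opcodes);
    return $opcodes;
}

// Usage: count how often the stand-in compiler actually runs.
$compileCount = 0;
$compile = function ($path) use (&$compileCount) {
    $compileCount++;
    return md5_file($path);                    // placeholder for real opcodes
};

$cache  = array();
$script = tempnam(sys_get_temp_dir(), 'opc');
file_put_contents($script, '<?php echo 1;');

cache_lookup($cache, $script, $compile);       // miss: compiles
cache_lookup($cache, $script, $compile);       // hit: no compile

touch($script, time() + 60);                   // simulate an edit (new mtime)
clearstatcache();                              // PHP keeps its own stat() cache
cache_lookup($cache, $script, $compile);       // stale: compiles again

unlink($script);
echo $compileCount, "\n";
```

Three lookups result in only two compilations, because the second lookup is satisfied from the cache.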

Other Opcode Cache Benefits. Elimination of the parsing step is not the only performance improvement gained by using an opcode cache. Other benefits include reduced file I/O: rather than reading a whole series of files from disk, PHP will in most cases simply retrieve the inode information, and the file contents can be fetched from memory in the form of cached opcodes. This is a significant benefit, as memory speed far exceeds that of the disk and allows for much more rapid data access. It also means that other operations involving the disk, such as writing log files, file creation, and manipulation by the script itself, will be somewhat faster as overall load on the disk is reduced.

Another possible benefit is faster code execution due to an optimized or reduced opcode array. By default the Zend Compiler tries to convert the script to opcodes as quickly as possible to reduce overhead, and therefore does not generate an ideal set of instructions for the executor. Had it spent the extra time doing so, in most cases the benefits gained through faster execution would be offset by a significantly slower parsing process, making the complete script execution slower rather than faster. However, when an opcode cache is used, the compilation process occurs only once per modification, so it makes perfect sense to spend the extra bit of time on the first request to improve the opcodes, from which all subsequent requests can benefit. (One area in particular where this can be of great help is string handling, where quick data rearrangement can yield significant improvements. Any practical PHP script will use strings, so an opcode cache can be quite useful here.)

The cache will significantly improve the performance of any script by eliminating the parsing process on every request, reducing file I/O, and optimizing the opcode array. Most applications will perform 30-40% faster on average, and some may even be as much as 200-300% faster. Usually, the higher numbers are indicative of scripts that load a lot of code and only utilize a small portion of it on every request.

The biggest attraction of opcode caches is that they are very easy to deploy, requiring absolutely no code changes. They generally do not require any esoteric libraries or packages. The only limitation of an opcode cache is that you can successfully deploy it only in environments where the PHP SAPI used is a Web server module, such as the one available for Oracle HTTP Server and Apache, or FastCGI. The reason is that the opcodes are stored inside shared memory, and to prevent a shared memory segment from being freed by the system, a process must be continuously attached to it. In these SAPI environments the hold on the segment is maintained by PHP, which initializes the segment when it is first loaded by the Web server during startup. This means that if the Web server daemon is restarted, the opcode cache is effectively cleared and all scripts will need to be re-cached.

Opcode Cache Implementations. Quite a few opcode cache implementations are available, as open or closed source. The open source solutions include the Alternative PHP Cache (APC), released under the PHP license (a variant of the Apache license) and available from the PECL repository, which makes it arguably the simplest to install. Turck MMCache, which was recently forked as eAccelerator by a group of German developers, is available under the GPL. On the closed source front are PHP Accelerator, which was the first free opcode cache available for PHP, and the Zend Platform, which includes an opcode cache and optimizer.

The speed of the various cache solutions is about the same, although most tests show that Turck MMCache is the overall leader by about 5-10%, thanks to its highly-tuned caching mechanism and advanced opcode optimizer. Its main downside is that neither it, nor its fork eAccelerator, has completely stable PHP 5 support—which limits safe production usage to PHP 4. The second best performance is a tie between APC and Zend Platform; in some cases one has the advantage while others show the reverse situation. The difference however is so minor (within 1%) that you can pretty much consider them to be running at the same speed. The fluctuations can be explained by the fact that APC does not implement an opcode optimizer, while Zend Platform does—leading to the conclusion that APC has a slightly faster caching mechanism but that Zend's optimizer can make up for it in most cases.

The main advantage of APC is the ease of installation, requiring you only to run pear install apc as the root user. This command will download the latest stable APC sources and compile them, resulting in the generation of an APC Zend module that you can then load via php.ini. The other advantage of APC is its active community, which includes some big names like Yahoo! and which continually works on fixing and improving the code.

Zend's cache, ZPS (Zend Performance Suite), on the other hand, already offers stable support for PHP 5, which makes it the only opcode cache capable of making that claim as of this writing. It also offers a very convenient interface for controlling the cache, which makes it a user-friendly solution. Alas, it does come at a hefty price, which may put it out of reach for some developers.

At the bottom of the performance scale sits PHP Accelerator, a position that can probably be explained by the lack of any development since January 2003. Nonetheless, it offers binary compatibility with PHP 4.3.X releases.

Deployment. Deploying an opcode cache is a simple process. The first step varies among implementations, but the second step, which is common for all opcode caches, involves the loading of the caching module into PHP via the extension=apc.so directive. Since all opcode caches effectively modify the script parsing process and need to gain a hold of a shared memory segment they must be started during the PHP initialization process. This means that enabling and loading of the cache can only be done within php.ini and not from the dl() function or web server configuration. Aside from loading the extension it is generally possible to specify a few additional options, such as the size of the shared memory block allocated for caching purposes, which for APC can be done via the apc.shm_size directive. Other handy directives may provide a way to exclude certain scripts and/or script directories from being cached. (For example, in APC this task is done via the apc.filters configuration option.)

The final step for all opcode caches involves restarting the Web server daemon in order to reload PHP, load the caching module, and initialize the shared memory block used for caching. All in all, the entire deployment process may take as little as five minutes, including a three-minute coffee break.

Web Server Configuration Tuning

Given that PHP is primarily used as a Web scripting language, it usually works as a module of a Web server such as the Oracle HTTP Server. Given its close ties to the server, you can gain several speed improvements through configuration adjustments.

Removing the header. One such optimization involves the removal of the "Server" advertisement header included in every response. In the case of Oracle, this header may consist of these 95 bytes:


Server: Oracle-Application-Server-10g OracleAS-Web-Cache-10g/9.0.4.2.0 (TM;max-age=300+0;age=0)

Aside from the detailed description of the Web server, this header often includes information about various loaded modules, such as mod_php or mod_gzip.

So what purpose does this header serve, you may ask, in delivering content? The answer, surprisingly, is "none whatsoever." The browser could not care less about what Web server software you are using and ignores this header completely. The information is invisible to users and would only be seen if they chose to examine the response headers that are part of the HTTP communication.

The presence of the header means that every single response carries a fixed amount of unnecessary data, adding to overall network I/O and clogging the outbound pipe. This in turn leads to a reduction in page loading speed for all users of the Web site.

By disabling this header or reducing its size, the wasted bandwidth is regained. In Apache-based servers the content of the Server header is governed by the ServerTokens directive (for example, ServerTokens Prod trims it to the product name), while setting ServerSignature to Off removes the server signature line from server-generated pages such as error documents. While this optimization may appear trivial, it adds up over time. A large site may easily serve up to 1 million requests per day, and the header is sent with every single response, including static files such as images. Therefore the removal of 95 bytes can save approximately 2.7GB of traffic every single month.
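The monthly figure can be checked with a quick calculation. The sketch below assumes a 30-day month and counts a gigabyte as 1024^3 bytes; both are assumptions made for illustration rather than figures stated in the text.

```php
<?php
// Assumed figures: a 95-byte header, one million requests per day,
// and a 30-day month.
$headerBytes    = 95;
$requestsPerDay = 1000000;
$daysPerMonth   = 30;

$bytesPerMonth = $headerBytes * $requestsPerDay * $daysPerMonth;
$gigabytes     = $bytesPerMonth / (1024 * 1024 * 1024);

echo number_format($bytesPerMonth), " bytes per month\n";
printf("about %.2f GB\n", $gigabytes);
```

Under these assumptions the total comes to roughly 2.65GB per month, in line with the approximate figure quoted above.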

.htaccess configuration. Several other optimization directives can help you reduce load on the disk, which in most cases is the slowest part of the system. By default, the Oracle HTTP Server (and also Apache, upon which it is based) allows you to set various configuration options on a per-directory level by placing a .htaccess file inside the directory. When it needs to serve a request, the Web server will sequentially go through every directory on the path to the requested file, looking for this file in an attempt to load the configuration changes it specifies.

For a file accessible via http://www.site.com/a/b/c/d/script.php, no fewer than five .htaccess checks will be performed, one for every directory. On a popular site, this translates into hundreds of stat() calls every second. If the files are available, they need to be opened, parsed, and their configuration changes applied for every single request, causing an even greater loss in performance. Sound familiar?

However, you can easily make a few changes to avoid this performance loss yet retain the flexibility to maintain a distinct set of configuration directives on a per-directory basis. For starters, disable .htaccess configuration overrides by setting AllowOverride to None. This change will disable the overriding of configuration directives specified in the primary configuration file, httpd.conf, and stop the Web server from looking for the per-directory configuration files. To keep the per-directory configuration options previously specified, move the contents of the .htaccess files into the main configuration file (or one of its includes) and limit them to a particular directory via the Directory directive of httpd.conf. This directive allows configuration options to be set for just a particular directory and its sub-directories, in the same way .htaccess configuration parameters work.

<Directory /home/user/public_html/app/ >
ErrorDocument 404 /app/404.php 
</Directory>

The above configuration, for example, demonstrates how the Directory directive can be used to move the application-specific ErrorDocument directive, which specifies a custom 404 page, from .htaccess to httpd.conf. The main downside of this optimization is that for configuration changes to take effect, you will need to restart the Web server, because the main configuration file and its includes are only read on server initialization. In contrast, the .htaccess file is parsed on every request, allowing configuration changes to take effect immediately.

Removing symlinks validation. You can further reduce file I/O by reducing the number of operations involved in opening a file or directory whose data is to be served. By default the Web server will check if any of the path components are symlinks by executing a lstat() system call. This means that for files such as /home/user/public_html/app/index.php, a total of 5 lstat() calls will be made, one per directory and one for the full path to the file. As with .htaccess checks, these calls can become quite expensive because the result of the operation is never cached and the process must be repeated for every single request, leading to a fair bit of file I/O overhead. Considering that most people create symlinks so that they can be used, removing this extra validation check by setting Options FollowSymLinks in the configuration file only makes sense.
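To see where the five checks come from, the path can be broken into its prefixes. The sketch below only models the counting described above; symlink_check_targets() is a hypothetical helper, not part of the Web server or of PHP.

```php
<?php
// Lists every path prefix that would need an lstat() check when symlink
// validation is enabled, for an absolute Unix-style path.
function symlink_check_targets($path)
{
    $targets = array();
    $current = '';
    foreach (explode('/', trim($path, '/')) as $component) {
        $current  .= '/' . $component;     // extend the prefix by one component
        $targets[] = $current;
    }
    return $targets;
}

$targets = symlink_check_targets('/home/user/public_html/app/index.php');
print_r($targets);
echo count($targets), " lstat() calls\n";
```

Each additional directory level in the document tree adds one more entry to the list, and therefore one more call per request.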

Configuring the DirectoryIndex directive. Another file I/O trick involves the DirectoryIndex configuration directive, used to specify the names of files that could be loaded when an index of a directory is requested. One common mistake made by many developers is to list every possible value or, even worse, a wildcard. Consequently, when a user requests a directory index (http://www.oracle.com/, for example), the Web server will sequentially search for every possible directory index file in the provided list until one is found. If no such file is available, or if it is found only at the end of the list, the number of stat() calls involved can be quite significant.

While it is impossible to do without this directive altogether (after all, you probably don't want to show a file list when someone accesses a directory), you can optimize it by reducing the number of values listed in that directive and placing the most frequently occurring directory indexes at the start of the list. For example, the directive

DirectoryIndex index.php index.html

is optimized for primarily PHP-based configurations, where in most cases the directory index will be index.php and the fallback would be a plain HTML file called index.html. In this case you are guaranteed that no more than two stat() calls will be made to determine the index file, and in the event index.php is available, only one will be made.
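The effect of list length and ordering can be modeled with a small counting function. This is a simplified sketch: index_probes() and the longer candidate list are made up for illustration, and each probe stands in for one stat() call by the Web server.

```php
<?php
// Counts how many candidates are probed before an index file is found.
// $existing is a hypothetical list of files present in the directory.
function index_probes($candidates, $existing)
{
    $probes = 0;
    foreach ($candidates as $name) {
        $probes++;                         // one stat()-style check per candidate
        if (in_array($name, $existing, true)) {
            return array($name, $probes);
        }
    }
    return array(null, $probes);           // nothing matched: every candidate was probed
}

$short = array('index.php', 'index.html');
$long  = array('index.htm', 'index.shtml', 'index.cgi', 'index.pl', 'index.html', 'index.php');

list($file, $probes) = index_probes($short, array('index.php'));
echo "$file found after $probes probe(s)\n";

list($file, $probes) = index_probes($long, array('index.php'));
echo "$file found after $probes probe(s)\n";
```

With the short list the file is found on the first probe, while the long list needs six probes to reach the same file.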

Disabling request logging. The final optimization available in the Web server configuration is not applicable in all instances and should be used with caution. It involves disabling request logging by setting the output destination for the log file to /dev/null.

CustomLog /dev/null combined

By disabling the log file you prevent the Web server from having to perform a write operation into the log file for every single request. Instead, the data is written to a special character device, which throws the logging information away. However, this approach is not for everyone; generally speaking, it is only applicable to situations where static requests for images are logged for the purpose of bandwidth tracking, which can be more accurately and quickly gauged via other means. Disabling logging for all requests is usually not a good idea as it prevents traffic analysis and log-based security auditing. In some environments it may even be illegal if local laws require log retention for a certain period of time. Nonetheless, it is a valid optimization in some situations and can yield noticeable performance benefits, especially on large sites.

PHP Configuration

While tuning Web server configuration can indirectly improve PHP's ability to serve Web pages, adjusting PHP's own configuration can have some direct benefits.

Disabling the header. Like most Web scripting languages, PHP by default likes to advertise its presence by adding an X-Powered-By header to every single response it serves.

X-Powered-By: PHP/4.3.10

This header also has no practical applications and is a waste of bandwidth. Fortunately PHP's configuration file does provide the means to disable this header by setting the expose_php directive to Off or 0. In addition to removing fluff from each request, by disabling this option you also disallow your scripts from being used to serve PHP's built-in Easter egg, the "special" April 1st logo, which is returned by adding ?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 to the script's URL.

Handling user input. Further configuration tuning pertains to the way user input to the script is handled. By default, PHP sets the magic_quotes_gpc setting to On, which results in the escaping of any input data supplied to the script via get ($_GET) and post ($_POST) requests, cookie ($_COOKIE), Web server environment ($_SERVER), and system environment ($_ENV). The purpose of this feature is to prevent most forms of SQL injection for databases such as MySQL that until recently did not support prepared statements. Even then it is not the best approach; the MySQL extension offers a better escaping mechanism, and numeric values can be validated through a simple cast to int or float.

For databases that support variable bindings, like Oracle, this step is completely unnecessary because a bound value will not be interpreted as anything other than the specified type. The presence of invalid characters (for example, single quotes in numeric values) will result in query failure but will not lead to SQL injection.

Internally the escaping process is done by duplicating every value, examining the data one byte at a time to detect characters requiring escaping, escaping them if needed, and then resizing the memory block to the correct size. Given a significant number of parameters to a script, or values containing large quantities of data, this process will be rather slow. To further complicate matters, when writing the data to files or using it inside prepared statements, where escaping is unnecessary, the backslashes added by the escaping must be stripped. Failure to remove them via the stripslashes() function would mean that the stored data now contains extra characters, effectively corrupting the input. The result is not only wasted CPU and memory for the escaping but also wasted time undoing its effects. To avoid this overhead, you should disable the magic_quotes_gpc configuration directive and perform manual escaping, using the functions prescribed for the situation, in the few instances where it is necessary.
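The round trip described above can be reproduced with addslashes(), which applies the same escaping that magic_quotes_gpc performs on incoming data. The sample value is made up for illustration.

```php
<?php
// addslashes() applies the same escaping that magic_quotes_gpc performs
// on incoming data, so it stands in for the setting here.
$raw     = "O'Reilly";
$escaped = addslashes($raw);         // what a script would receive with the setting On

// Writing $escaped to a file, or passing it as a bound parameter, would
// keep the backslash, so it has to be stripped first.
$restored = stripslashes($escaped);

echo $escaped, "\n";
echo $restored, "\n";
```

The escaped copy is one byte longer than the original, and a second pass over the data is needed to get back to the value the user actually submitted.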

Configuring the register_globals directive. Another input-related optimization involves the register_globals configuration directive, which despite being disabled by default is still enabled on many PHP configurations. When enabled, this directive forces PHP to register every parameter provided via the previously described input methods as a variable. For example, if the GET query string

?foo=bar&baz=123

is passed to a script, in addition to populating the $_GET super-global PHP will create the $foo and $baz variables. While PHP is sufficiently intelligent not to duplicate the content of the passed input values, thanks to copy-on-write, it still needs to create the extra variable placeholders. Because these placeholders are named by strings, memory must be allocated for the hash keys and for a series of extra entries inside PHP's variable hash table.

Perhaps even more disconcerting than the performance loss are the security consequences of having this option enabled. Given that users are now able to inject variables into the scope of PHP, they gain the ability to set values for any uninitialized variables, which can be used to launch any number of attacks against the code. Therefore, you should disable this directive and use only super-globals such as $_GET, $_POST, and so on to access input data.
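The kind of attack this makes possible can be illustrated with a short sketch. Here extract() is used only to imitate the effect of register_globals inside a function, and check_password() along with both is_authorized_* functions are hypothetical names invented for the example.

```php
<?php
// extract() is used here only to imitate what register_globals does:
// it turns request parameters into local variables.
// check_password() is a hypothetical helper.
function check_password($password)
{
    return $password === 'correct horse';
}

function is_authorized_unsafe($request)
{
    extract($request);                       // imitates register_globals = On
    if (isset($password) && check_password($password)) {
        $authorized = true;
    }
    return !empty($authorized);              // $authorized was never initialized
}

function is_authorized_safe($request)
{
    $authorized = false;                     // explicit initialization
    if (isset($request['password']) && check_password($request['password'])) {
        $authorized = true;
    }
    return $authorized;
}

$attack = array('authorized' => '1');        // e.g. ?authorized=1
var_dump(is_authorized_unsafe($attack));
var_dump(is_authorized_safe($attack));
```

The unsafe version grants access without any password, because the injected value fills the variable the code forgot to initialize. The safe version reads only the keys it expects.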

Configuring the variables_order directive. The last input-related setting to consider is the variables_order directive, which indicates what inputs to register in the form of super-globals. By default, this directive includes the "E" value, which leads to the creation of the $_ENV super-global used to access environment variables. The interesting thing is that very few scripts need to use environment variables, and those that do need only one or two, which can still be accessed via the getenv() function. Thus registering the $_ENV super-global with 15-20 values of varying length on every request is not only pointless but also a waste of processor time and memory. Instead, you should set the variables_order directive in php.ini to contain only GPCS, excluding system environment variables from the list.
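Reading a single environment variable on demand does not depend on $_ENV being populated. In the sketch below, APP_MODE is a hypothetical variable name, set with putenv() only so that the example is self-contained.

```php
<?php
// APP_MODE is a hypothetical variable, set here so the example runs on its own.
putenv('APP_MODE=production');

$mode    = getenv('APP_MODE');                   // string value when the variable exists
$missing = getenv('APP_NOT_DEFINED_ANYWHERE');   // false when it does not

echo $mode, "\n";
var_dump($missing);
```

Because getenv() returns false for an undefined name, a script can test for a variable's presence with a strict comparison.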

Configuring the session.use_trans_sid. Another serious performance drawback involves a commonly used php.ini option, session.use_trans_sid, which if enabled facilitates the automatic addition of session ids to all internal links and forms. The goal of this feature is to simplify the process of ensuring that the session id is always a part of the request, so that URL based sessions can work properly. To make this functionality possible, PHP needs to do quite a bit of extra work, which is not conducive to high performance.

The first thing PHP does when this option is enabled is to buffer the entire page's contents in memory, rather than outputting the data directly. PHP has no way of knowing what the final size of that page will be, so it cannot statically pre-allocate a buffer to store the data. Instead it creates a small starting buffer and increases it as the amount of script output goes up, which translates to a fair bit of memory reallocation. When the script finishes its execution, PHP takes the output stored in the buffer and passes it through a parser that examines the output for links and forms and adds the current session id to them. This process is not only memory intensive, but takes a noticeable toll on the CPU as well. Instead, you should append session ids to links and forms manually during script writing. Alternatively, cookie-only sessions may be used, which eliminates the need for this entire operation.

Configuring directory paths. Another session setting of interest is session.save_path, responsible for defining the storage location of the session data. When the default session handler, files, is being used, PHP will by default store each session as a separate file inside the system's temporary directory, /tmp. While this approach is fine for small sites with relatively few visitors, for larger sites it may pose a serious performance problem: as the number of files inside a directory grows, access to those files becomes progressively slower on most filesystems. The ext2 filesystem is particularly vulnerable to this problem and becomes quite slow when a directory's file count goes over a few thousand. This number may seem fairly high until you consider that the system temporary directory is used by many other applications, and may even hold sessions created by other PHP scripts.

One solution is to specify an alternate session storage directory for each application by setting a path to that directory prior to session initialization.

session_save_path("/home/user/sessions/appX/");
session_start();

The added benefits of this approach include a lower probability of name collisions resulting from two scripts or requests generating the same session id for different users. It also makes it a bit more difficult for local users to see active sessions, since session ids and data are no longer located inside the world-readable /tmp directory.

But even directory separation may not be sufficient in all instances, especially if sessions are allowed to stick around for a long time. In this case, you can take another step to avoid a large accumulation of files inside a single directory: instruct the session extension to use a directory tree, based on the first few characters of the session id, as the actual storage path. Note that PHP does not create these subdirectories itself, so they must exist before sessions are stored in them.

The number followed by the semicolon indicates the depth of the tree, so in the following example, a three-level deep directory tree is used.

session_save_path("3;/home/user/sessions/appX/");
session_start();
// if session id is 6c178dc19a034137e63fe8d50292df11 
// its location will be
// /home/user/sessions/appX/6/c/1/sess_6c178dc19a034137e63fe8d50292df11

As the session id is a hexadecimal number, where each character can be one of 0-9a-f, it provides 16 alternatives for each level. For a three-level deep structure, session data would be more or less equally split across 4096 (16^3) directories, which should ensure a reasonable number of files per directory, even for a large site. If even this proves insufficient, the value can simply be increased, providing 16 times more directories for every level.
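The mapping from session id to storage path can be reproduced in a few lines. This sketch only mirrors the layout shown in the example above; session_file_path() is a hypothetical helper, not a function of the session extension.

```php
<?php
// Computes where a session file would live for a given tree depth,
// following the layout shown in the example above.
function session_file_path($baseDir, $depth, $sessionId)
{
    $path = rtrim($baseDir, '/');
    for ($i = 0; $i < $depth; $i++) {
        $path .= '/' . $sessionId[$i];       // one directory level per leading character
    }
    return $path . '/sess_' . $sessionId;
}

$id = '6c178dc19a034137e63fe8d50292df11';
echo session_file_path('/home/user/sessions/appX/', 3, $id), "\n";
echo pow(16, 3), " leaf directories\n";
```

Raising the depth from 3 to 4 would multiply the number of leaf directories by 16, to 65536.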

I/O Matching

One of the key aspects of content serving via PHP is its delivery to the user; otherwise, what's the point? What many developers forget is that PHP does not directly communicate with the users' browsers; the data is actually filtered through a number of layers before it trickles through to the TCP/IP protocol, eventually making its way to the user. In most instances when PHP performs an operation resulting in the output of data, the output is first delivered to the Web server, which passes the data along to the operating system, which then transmits the data over TCP/IP to the user.

A performance issue arises from the fact that each "write" operation between PHP, the Web server, and finally the OS is not especially fast. If a script, like most scripts, outputs data in small chunks, it can result in a large number of system calls. To reduce the number of writes, PHP offers something called output buffering, which is enabled by default. The output buffer allows PHP to collect the output of various write operations into blocks, which by default are set to 4,096 bytes (4KB), and to send the data only once a block is full. This means that when outputting a 48KB page, PHP would need a mere 12 write operations to deliver the content to the Web server, versus possibly hundreds of writes had the data not been buffered.

The buffer size is preset, which prevents the memory reallocation overhead we've encountered when automatically appending session id to URLs. But even 12 writes is 11 writes too many, so for large pages it may be prudent to increase the buffer size by changing the value of the output_buffering setting inside php.ini. This way the entire page can fit into the buffer, requiring only a single write operation to deliver the data to the Web server.

; inside php.ini
output_buffering=50K

# inside httpd.conf or .htaccess
php_value output_buffering 51200
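The relationship between buffer size and the number of writes is simple to estimate. The sketch below assumes that every full buffer results in exactly one write, which is a simplification, and writes_needed() is a hypothetical helper.

```php
<?php
// Number of writes needed to flush a page through a fixed-size buffer,
// assuming one write per full (or final partial) buffer.
function writes_needed($pageBytes, $bufferBytes)
{
    return (int) ceil($pageBytes / $bufferBytes);
}

$page = 48 * 1024;                          // the 48KB page used as an example in the text
echo writes_needed($page, 4096), "\n";      // default 4KB blocks
echo writes_needed($page, 51200), "\n";     // 50KB buffer
```

With the default 4KB blocks the 48KB page takes 12 writes, while a 50KB buffer holds the entire page and needs only one.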

You can force the buffering of the output from within the script by calling the ob_start() function at the start of the program. If it is not called at the onset of the script, any output sent before the call will be delivered directly to the Web server without being buffered.

This approach is not as efficient as the one offered through the buffer size specification available via the output_buffering INI setting. Rather than being able to pre-allocate a large buffer at the onset of the request based on the average page size, PHP would need to continually resize the buffer to adjust for the growing output size, as it has no clue about the approximate size of the final output. This means that every time the default 4KB buffer is filled it would need to be resized. This process may be repeated many times depending on the size of the output, leading to an undesirable number of memory operations.
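For scripts that cannot rely on the INI setting, buffering from within the code might look like the following minimal sketch. The page content is made up for illustration.

```php
<?php
ob_start();                 // start buffering before any output is produced

echo "<html><body>";
for ($i = 1; $i <= 3; $i++) {
    echo "<p>Row $i</p>";   // small writes accumulate in the buffer
}
echo "</body></html>";

$length = ob_get_length();  // bytes collected so far
ob_end_flush();             // hand the whole page over in one piece
```

Everything echoed between ob_start() and ob_end_flush() is collected and released together, and ob_get_length() reports how many bytes the buffer holds at that point.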

The Web server itself also wants to reduce the number of system calls involved in the delivery of data to the OS and implements its own buffering. The size of this buffer is set to 4KB by default, which means that if data is supplied in small chunks, it will be allowed to accumulate until the buffer is filled and only then be delivered to the OS. If the supplied data is greater than the buffer size, it will be split into buffer-size chunks to be delivered individually. On the other hand, if PHP buffers data internally using the same buffer size as the Web server, the data can pass directly to the OS without any interference or overhead via a single writev() system call.

Given our desire to optimize the process, for applications delivering large quantities of data it may be prudent to increase the size of the Web server's buffer to prevent large PHP buffers from being split and to reduce the number of writes. You can do that by adjusting the Web server's buffer size via the SendBufferSize configuration directive.

SendBufferSize 53248

The above setting would set the buffer to 52KB, which is slightly larger than the 50KB PHP buffer set in the previous example. Usually the Web server buffer should be slightly larger to accommodate the various headers that will be added in addition to the page content. This approach will ensure that a complete response to a request can be sent in one shot, rather than requiring multiple write operations. There is no generic correct value, since each site and application has its own data transfer requirements. Developers and administrators need to determine the "right" value for their situation by analyzing the Web server logs, which contain information about how much data was sent as a response to each request.
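
As a rough illustration of that log analysis, the sketch below summarizes response sizes from access-log lines. It assumes the logs use the Common or Combined Log Format, where the byte count follows the status code; the function name response_size_stats() and the sample lines are hypothetical.

```php
<?php
// Hypothetical helper: summarizes response sizes from access-log lines in
// Common/Combined Log Format, where the byte count follows the status code.
function response_size_stats(iterable $lines): array
{
    $count = 0;
    $total = 0;
    $max   = 0;
    foreach ($lines as $line) {
        // Matches: "<request>" <status> <bytes>; "-" means no body was sent.
        if (!preg_match('/" (\d{3}) (\d+|-)/', $line, $m) || $m[2] === '-') {
            continue;
        }
        $bytes = (int) $m[2];
        $count++;
        $total += $bytes;
        $max = max($max, $bytes);
    }
    return [
        'requests' => $count,
        'average'  => $count > 0 ? (int) round($total / $count) : 0,
        'max'      => $max,
    ];
}

// Sample lines stand in for a real log, e.g. file('/path/to/access_log').
$sample = [
    '127.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.php HTTP/1.0" 200 40960',
    '127.0.0.1 - - [10/Oct/2005:13:55:37 -0700] "GET /list.php HTTP/1.0" 200 51200',
    '127.0.0.1 - - [10/Oct/2005:13:55:38 -0700] "GET /logo.gif HTTP/1.0" 304 -',
];
print_r(response_size_stats($sample));
```

The logged byte count in these formats covers the response body only, which is one more reason to leave some headroom for headers when choosing the Web server buffer size.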

The added advantage of sending data in one large chunk is that the Web server no longer needs to wait for the OS to deliver the data. If small buffers (or no buffers) were used, it would need to wait for the OS to indicate that the prior data has been written before sending more information, resulting in further performance loss. With a single buffer containing the entire page, the data can be handed off to the kernel and the Web server can proceed to generate data for other users rather than having to wait for delivery confirmation. On a high-traffic server this approach leads in turn to an overall drop in the number of processes needed to handle incoming requests, a smaller number of active processes, and lower memory utilization, which of course translates to improved performance.

Buffers also exist in the OS, of course. Just like PHP and the Web server, the kernel has no desire to perform hundreds of writes to transmit information. The OS buffers determine how much data can be sent via TCP/IP at a time, which is why it is important to ensure that these buffers match the ones set by the Web server.

For example, you can adjust the kernel's send buffers on Linux systems via the tcp_wmem directive, which establishes low, average, and maximum buffer size.

/proc/sys/net/ipv4/tcp_wmem

51200		131072	204800

This option can be modified in two ways. The first, intended primarily for testing when the value needs to be changed frequently, is to use the echo facility of the bash shell:

echo "51200 131072 204800" > /proc/sys/net/ipv4/tcp_wmem

The values are listed in the order you want them to appear inside the setting and are separated by a single space. Another way to change those values is by setting them inside /etc/sysctl.conf, which specifies settings to be applied to the system on every boot. Once the ideal values are determined, it is recommended to set them inside this configuration file, rather than adding the previously mentioned line to one of the boot-up scripts.

net.ipv4.tcp_wmem = 51200 131072 204800

Adding this line somewhere inside the sysctl.conf file is all it takes to set the wmem values for your system. Make sure that there are no identical directives elsewhere in the file that may override your setting, in particular if they are listed after your directive.

The first number indicates the starting value of the write buffer, which will be allocated as soon as the TCP socket is created. Ideally you'd want to set it to the average page size delivered by the Web server. This will ensure that no further buffer operations will need to occur for the connection, unless the page size is greater than average. The second value indicates the size the buffer may grow to without being hampered by the system, even during instances where the available memory is low. Beyond this value the OS may decide not to increase the buffer size if the system is currently under a high load; a good number for this option would be the maximum page size. The final number indicates the maximum possible buffer size for a TCP socket, intended to prevent memory exhaustion due to allocation of massive buffers for the purpose of delivering large files such as images.
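
When experimenting with these values it can help to read the current setting back from a script. The following is a minimal sketch; the helper name parse_tcp_wmem() and the array keys (named after the "low, average, maximum" wording used above) are assumptions made for this illustration, and the fallback string simply repeats the example values for systems where the /proc file is not available.

```php
<?php
// Hypothetical helper: splits the whitespace-separated tcp_wmem triple
// into named integers, using the labels from the text above.
function parse_tcp_wmem(string $raw): array
{
    $parts = preg_split('/\s+/', trim($raw));
    if (count($parts) !== 3) {
        throw new UnexpectedValueException('expected three values');
    }
    [$low, $average, $max] = array_map('intval', $parts);
    return ['low' => $low, 'average' => $average, 'max' => $max];
}

$path = '/proc/sys/net/ipv4/tcp_wmem';
// Fall back to the example values when the file is not readable
// (for instance, when not running on Linux).
$raw = is_readable($path) ? file_get_contents($path) : "51200\t131072\t204800";
print_r(parse_tcp_wmem($raw));
```

Comparing the first value against the average page size measured from your logs shows at a glance whether the starting buffer is large enough for a typical response.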

A related TCP/IP setting that should be set is tcp_mem, which defines how the TCP stack should behave when it comes to memory usage. Like the tcp_wmem directive it comprises three values, indicating the minimum, average, and maximum. The meaning of these values, however, is a bit different. They are used to set the overall limits on TCP socket buffer utilization, which means that they must account for the number of concurrent sockets active on the system. Also, unlike for tcp_wmem, the size is calculated not in bytes but in memory pages, which typically represent 4,096 bytes each. The precise size of the page can be determined from the getpagesize() or the sysconf() system call in a C program:

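#include <stdio.h>   /* printf() */
#include <unistd.h>  /* sysconf(), _SC_PAGESIZE */

int main(void) {
	/* print the size of a memory page in bytes */
	printf("%ld\n", sysconf(_SC_PAGESIZE));
	return 0;
}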

The first value for tcp_mem indicates the low threshold below which the kernel will not try to restrict buffer utilization. Ideally it should be based on the second value specified for tcp_wmem (the maximum page size), multiplied by the maximum number of concurrent requests and divided by the memory page size (131072 * 300 / 4096). The second value indicates the memory utilization at which the kernel will start pressuring memory usage down; ideally this value would be based on the maximum buffer size a TCP socket may use (204800 * 300 / 4096). The third and final value represents the maximum buffer limit. If it is reached, further TCP connections will be rejected, which is why it is important not to make it terribly conservative (512000 * 300 / 4096). In this instance the value provided is quite generous: it would handle 2.5 times as many connections as expected, or allow existing connections to transmit 2.5 times as much data.

When setting these limits it is important to keep two things in mind. First, the number of concurrent TCP sockets will always exceed the number of Web connections; many other system activities, such as connections to the Oracle database, will also result in the creation of sockets. This means that the buffer limits should account for the presence and the needs of those connections as well. For example, while there may be only 300 simultaneous Web users accessing the site, the number of sockets can easily be as high as 700, as each connection would also involve a database socket, and other system processes such as SMTP/POP3/IMAP use sockets as well. It is also important to keep in mind that other socket connections may be just as common as HTTP, and if they transfer more information, perhaps their send sizes should be used to determine the ideal values. Otherwise, while your HTTP traffic may go through smoothly, database communication, which is also primarily socket based, will be sub-optimal due to the need to split the transmission into multiple segments.

Second, some connections may stick around for longer than a second, so a few thousand sockets can be active at any one time. To accommodate this number, the above values should be multiplied by roughly 15-20, depending on the lifetime of an average connection.

In our example the /proc/sys/net/ipv4/tcp_mem settings would be:

192000	300000	732000
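
The arithmetic behind these numbers can be sketched in a few lines of PHP. The helper name tcp_mem_pages() is hypothetical; the sketch assumes the 4,096-byte page size, the 300 concurrent requests, and the upper end (20) of the multiplier range mentioned above.

```php
<?php
// Hypothetical helper: converts a per-socket buffer size in bytes into a
// system-wide limit expressed in memory pages, as tcp_mem expects.
function tcp_mem_pages(int $bytesPerSocket, int $sockets, int $pageSize = 4096): int
{
    return intdiv($bytesPerSocket * $sockets, $pageSize);
}

$requests   = 300; // concurrent requests assumed in the text
$multiplier = 20;  // upper end of the suggested 15-20 range

foreach ([131072, 204800, 512000] as $bytes) {
    $base = tcp_mem_pages($bytes, $requests);
    printf(
        "%6d bytes/socket: %5d pages, x%d = %d\n",
        $bytes,
        $base,
        $multiplier,
        $base * $multiplier
    );
}
```

With these inputs the first two results match the 192000 and 300000 shown above. The third works out to 750000 with a multiplier of exactly 20, slightly higher than the 732000 listed, which is in keeping with the "roughly 15-20" guidance rather than a fixed factor.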

The end result is optimized content delivery to the user through a significantly reduced number of write operations.

Aside from improved server performance, buffering benefits the user experience through faster page loading. Most modern browsers will re-render the page any time they receive even the tiniest bit of data, to allow the user to see the content of the page as quickly as possible. When the page is sent in many small chunks, the number of times the page needs to be rendered can be quite large, putting an undue strain on the user's machine and making content loading appear slower. However, when the page is sent as a single large block of data, the browser may only need to render it once, allowing it to appear that much faster.

The one possible downside of buffering is the increased memory usage caused by PHP, the Web server, and the operating system. On low-memory systems it is entirely possible for the various buffers to exhaust all available memory, forcing the usage of swap, which results in much slower overall operation. When calculating the buffer sizes it is important to account for total available server memory and make sure enough is left for other, non-buffer-related operations. Fortunately, on modern server systems where RAM capacity is measured in gigabytes, this rarely presents a problem.

Benchmarking & Profiling

Before starting any code optimization it is important for you to understand where the bottlenecks are in your application. In the absence of this information there is only guesswork, and such assumptions can lead you to waste time and resources to solve an inefficiency that may actually only represent a small portion of the problem.

In contrast, by benchmarking and profiling the application you can get accurate information about what parts of the site/application are slow and then analyze the responsible scripts to determine where the bottleneck is. The initial snapshot provided by this process is also important information that allows you to gauge the current state of the application and determine your performance goals.

After each change you should examine the performance again to determine if it has improved. Many performance-inspired changes such as buffer adjustments may in some cases lead to a performance drop if not set correctly, so it is absolutely imperative to verify that the changes were in fact beneficial.

Benchmarking methods. For Web-based applications the simplest way to benchmark performance is by simulating anticipated user traffic and seeing if the server can cope with the load. Because site traffic is never linear it is usually a good idea to test the worst-case scenario—for example, a doubling of the load during the peak hours, which may result from a new product release or a favorable review.

A number of tools are available for the purpose of simulating requests, the most popular of which is Apache Bench, usually in the form of the ab utility available on most *NIX-based systems. This tool provides a very simple interface for sending many requests to the server, with the ability to make it appear that the requests are coming from different users simultaneously. For example, the command

ab -n 10000 -c 10 http://localhost/

would result in the execution of 10,000 requests at a concurrency level of 10, which would be a fairly good approximation for a site that expects to serve 10 concurrent users at any one time.

The report generated by ab will include a slew of interesting information, such as the actual number of requests per second served, which ideally is greater than the concurrency. (If the number happens to be less, the server cannot cope with the load associated with serving 10 concurrent instances of the page.) Other useful information presented in the report includes the size of the output, a very helpful value for determining the buffer sizes for PHP, the Web server, and the TCP stack. The report also indicates the total number of bytes transferred and the average transfer speed, which you can use to see if the network pipe is being saturated. For example, a 1.5MB/s rate on a 10MB pipe would mean that the bottleneck is not really the code but rather the outbound network connection.

There are also various timing averages that break down the request time even further. For example, if for some reason establishing a connection takes longer than processing, the Web server does not have a sufficient number of children to handle the incoming requests, causing such processes to be created in real time. Similarly, if the count of failed requests is high (an important data point), the load may be causing requests to fail or be rejected.
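
When many pages are benchmarked, it can be convenient to pull the headline figures out of each report programmatically. The sketch below assumes the plain-text report layout printed by ab ("Requests per second:", "Failed requests:", and so on); the function name parse_ab_report() and the sample numbers are invented for this illustration.

```php
<?php
// Hypothetical helper: pulls a few headline figures out of ab's text report.
function parse_ab_report(string $report): array
{
    $fields = [
        'document_length' => '/^Document Length:\s+(\d+) bytes/m',
        'concurrency'     => '/^Concurrency Level:\s+(\d+)/m',
        'failed'          => '/^Failed requests:\s+(\d+)/m',
        'req_per_sec'     => '/^Requests per second:\s+([\d.]+)/m',
    ];
    $result = [];
    foreach ($fields as $name => $pattern) {
        // Fields missing from the report are reported as null.
        $result[$name] = preg_match($pattern, $report, $m) ? (float) $m[1] : null;
    }
    return $result;
}

// Sample text shaped like ab's report; the numbers are made up.
$sample = <<<TXT
Document Length:        49152 bytes
Concurrency Level:      10
Failed requests:        0
Requests per second:    42.50 [#/sec] (mean)
TXT;

$r = parse_ab_report($sample);
echo $r['req_per_sec'] >= $r['concurrency']
    ? "Requests per second exceed the concurrency level\n"
    : "Server is not keeping up with the requested concurrency\n";
```

The comparison at the end applies the rule of thumb from the text; the document length value can feed directly into the buffer sizing discussed earlier.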

An important consideration when benchmarking a site or an application is to take the time to analyze all pages, not just the ones suspected of being slow. One common mistake is to benchmark just the front page and perhaps a few pages leading off of it, when in fact the performance loss is probably caused by a rarely trafficked page. Furthermore, when working with PHP scripts it is important to remember that different inputs may result in alternate outputs that do not necessarily take the same amount of time to produce. To avoid missing possible bottlenecks, you should not only test every page but also test them using various inputs that may lead to alternate data being generated.

Profiling methods and tools. When a slow script or even a series of scripts is identified you may not know why they are slow, only that they are. This is where the second performance analysis tool comes in: the profiler. A PHP profiler is a Zend module that sits around the executor and tracks all the function calls. As part of its tracking process it analyzes how long each function took to run, how many times it was executed during the script's execution, and who called it. If the php.ini directive memory_limit is enabled, it will also track and report on the change in the script's memory usage after the function's execution. This information is subsequently written to a file or a similar data store that you can use later to produce a detailed report. The generated reports can then clearly identify the slow parts of the script and in many instances the cause of the problem. For example, if OCIExecute() takes over a second to execute, it is likely the result of an unoptimized query that forces the database to perform full table scans.

As with opcode caches, several PHP profiling tools are available. Two open source solutions, DBG and XDebug, are frequently found in various IDE packages and integrate a profiler in conjunction with a debugger, which is their primary feature. A pure profiler for PHP is available in the form of APD, found inside the PECL repository, which makes it extremely easy to deploy. And Zend Studio offers a profiler as part of its general development suite.

Using a profiler such as APD is a four-step process, which begins with the installation of the tool by running the pear install apd command as the root user. This command will download the latest stable release of APD and install it. The module should then be loaded into PHP by adding the zend_extension = /path/to/apd.so directive to php.ini. Another important setting that should be specified is apd.dumpdir, which indicates where the profiles generated by APD should be placed. By default they will be stored inside the /tmp directory.

Now the Web server software needs to be restarted for the PHP configuration changes to take effect. At this point you are ready to start profiling your application, which is done by placing a call to apd_set_pprof_trace() at the top of the script. The function call begins the profiling process from that point in the script, which is very useful for large scripts where only a portion of the code is to be examined.
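
A minimal sketch of that call is shown below. The wrapper function start_profiling() is a hypothetical convenience added here so the same script keeps working on servers where the APD extension is not loaded; only apd_set_pprof_trace() itself comes from APD.

```php
<?php
// Start APD tracing only when the extension is loaded, so the script
// still runs on servers where APD is not installed.
function start_profiling(): bool
{
    if (!function_exists('apd_set_pprof_trace')) {
        return false;
    }
    apd_set_pprof_trace(); // the profile is written to apd.dumpdir
    return true;
}

$profiling = start_profiling();

// ... the code to be profiled follows from here ...
```

Guarding the call this way makes it easier to leave the profiling hook in place during development and remove or disable it before deploying to production.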

Upon request completion, the profile is written to the specified dumpdir as a separate file whose name is based on the process ID (PID) of the Web server process that handled the request. The content of this file is not intended for human consumption, which is why APD offers the pprofp tool, used to generate various types of performance analysis reports based on this dump. The different reporting modes are specified through a series of switches to the utility. For example, calling pprofp -u /tmp/apd/pprof.1234.0 would generate a report sorted by user time, with the functions that took the most user time to execute at the top.

         Real         User        System             secs/    cumm
%Time (excl/cumm)  (excl/cumm)  (excl/cumm) Calls    call    s/call Name
-----------------------------------------------------------------------
 33.3  0.02  0.02   0.02  0.02   0.00  0.00     7   0.0029    0.00 require_once
 33.3  0.01  0.01   0.02  0.02   0.00  0.00    55   0.0004    0.00 sprintf
 33.3  0.00  0.00   0.02  0.02   0.00  0.00   144   0.0001    0.00 feof
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.00 htmlspecialchars

The different time specifications allow you to see the overall time spent in a particular routine (Real time) versus the time spent performing system calls (System time) and general processing operations (User time). This distinction is useful for figuring out whether the application is hitting an I/O bottleneck, at which point most of the routine's time is spent executing system calls.

A high User time indicates a slow or complex non-system-related operation, such as execution of a regular expression on a large string. In most cases tuning inefficient user code is much simpler than addressing I/O bottlenecks, so this information is helpful in prioritizing possible fixes.
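
The same Real/User/System distinction can be observed without a profiler by using PHP's built-in getrusage() and microtime() functions. This is a rough sketch, not a substitute for APD: the helper names cpu_times() and measure() are hypothetical, the workload is an arbitrary example, and getrusage() reports figures for the whole process rather than for a single function.

```php
<?php
// Returns the CPU time consumed so far by this process, in seconds.
function cpu_times(): array
{
    $ru = getrusage();
    return [
        'user'   => $ru['ru_utime.tv_sec'] + $ru['ru_utime.tv_usec'] / 1e6,
        'system' => $ru['ru_stime.tv_sec'] + $ru['ru_stime.tv_usec'] / 1e6,
    ];
}

// Runs $work and reports wall-clock, user, and system time spent in it.
function measure(callable $work): array
{
    $cpuBefore  = cpu_times();
    $wallBefore = microtime(true);
    $work();
    $wallAfter = microtime(true);
    $cpuAfter  = cpu_times();
    return [
        'real'   => $wallAfter - $wallBefore,
        'user'   => $cpuAfter['user'] - $cpuBefore['user'],
        'system' => $cpuAfter['system'] - $cpuBefore['system'],
    ];
}

// Example workload: a regular expression applied to a large string,
// which should show up mostly as user time.
$times = measure(function () {
    preg_match_all('/\d+/', str_repeat('abc123 ', 200000));
});
printf("real %.4fs, user %.4fs, system %.4fs\n", $times['real'], $times['user'], $times['system']);
```

A block whose real time is much larger than its user and system time combined is usually waiting on something external, such as the network or the database.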

Another useful mode is -t, which displays a compressed call tree demonstrating the flow of the function calls inside the script. This is a very handy tool when you need to figure out why and where a particular function is called hundreds of times (such as feof() in our example).

Database Performance

Most PHP applications rely on a database for information storage. When used correctly, a database provides a highly efficient and effective way to store data and can scale exceptionally well. However, improper usage often results in some of the heaviest losses in performance inside a script, which is why tuning is imperative.

Working with a database begins with establishing a connection: opening a socket to the database server and passing authentication information that requests access to a portion of the data. The database then needs to validate this information, compare it to its permissions settings, and determine what, if any, access privileges to grant to the connecting client. In some cases the process may involve establishing an encrypted channel, resulting in yet another series of operations. This fairly complex and not especially quick process is repeated on every page request that queries the database. For feature-complete databases such as Oracle, starting a new database session for each request can incur significant penalties. In fact, if the script ends up executing only a quick query or two, the connection time can take up as much as 30% of the total database communication time.

Fortunately, PHP offers a workaround that allows the code to avoid having to open a new connection on each request, thus preventing connection initialization overhead. This is done by changing the connection function from ocilogon() to the persistent connection function ociplogon(). When the latter is used, before opening a connection PHP will check its internal resource table to see if an Oracle database connection using the same authentication information is already available. If it is, this resource will simply be returned, avoiding further operations; otherwise a new connection is established and added to the internal resource table in association with the authentication data. Because the connection has been marked persistent it will no longer be automatically closed on script termination, thus allowing subsequent requests to reuse it.
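
The lookup described above can be illustrated with a small pure-PHP simulation. This is only a model of the idea, written for this article's explanation; it is not how the Oracle extension is implemented, and the class name PersistentTable, the credentials, and the stand-in "connection" objects are all hypothetical.

```php
<?php
// Illustration only: mimics the per-process lookup described above.
// This is NOT the Oracle extension's implementation.
final class PersistentTable
{
    /** @var array<string, object> connections keyed by credential hash */
    private array $connections = [];

    public function get(string $user, string $pass, string $db, callable $open): object
    {
        $key = md5($user . "\0" . $pass . "\0" . $db);
        if (!isset($this->connections[$key])) {
            // First request with these credentials: pay the logon cost once.
            $this->connections[$key] = $open($user, $pass, $db);
        }
        return $this->connections[$key];
    }

    public function count(): int
    {
        return count($this->connections);
    }
}

$opened = 0;
$open = function (string $user, string $pass, string $db) use (&$opened): object {
    $opened++; // stands in for the expensive database logon
    return (object) ['user' => $user, 'db' => $db];
};

$table = new PersistentTable();
$table->get('cms', 'secret', 'orcl', $open);
$table->get('cms', 'secret', 'orcl', $open);  // reused, no new logon
$table->get('shop', 'secret', 'orcl', $open); // different credentials

echo "logons performed: $opened\n"; // prints "logons performed: 2"
```

Note that the table is keyed by the full set of credentials, so each distinct user/password/database combination gets its own entry, and in a real deployment each Web server process holds its own copy of such a table.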

This optimization is not without a few "gotchas." In PHP, persistent connections are not shared across the entire server instance; instead they are kept on a per-child basis, which means that eventually every Web process will get to serve a database request and end up with a persistent connection of its own. On a Web server with 100 active children this scenario may result in 100 or more active database connections.

Why more? Well, the connection resource is associated based on authentication information, so if the content management system uses one set of privileges and the e-commerce app uses another, you may now have 200 active connections. Because the number of concurrent connections to the database may be limited, these persistent connections may prevent other processes from communicating with the database. To further complicate matters, the PHP function to terminate these connections, ociclose(), is currently a no-op and has no effect, which means the only way to close the connections is to restart the Web server.

Another possible problem with persistent connections pertains to locks or transactions. When a regular connection is closed any current locks are released and uncommitted transactions automatically rolled back. However, this process does not automatically occur for persistent connections, where the lock or transaction may stick around indefinitely because the connection is not terminated. In the case of a leftover write lock, it would mean denial of access to other processes trying to access the locked table or rows.

Fortunately this problem can be avoided by writing careful code that at the end of the script would confirm that there are no pending uncommitted transactions.

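// Roll back any uncommitted work when the script ends, so a persistent
// connection is not left holding locks or an open transaction.
function oci_safe_close() {
	if (defined('oci_conn') && oci_conn) {
		ocirollback(oci_conn);
	}
}
register_shutdown_function("oci_safe_close");

// ociplogon() opens (or reuses) a persistent connection
define('oci_conn', ociplogon("user", "pass", "db"));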

This bit of code registers a callback function that will be called any time a PHP script ends, regardless of whether the script reached its end or was terminated because the user aborted the request. The function checks whether an Oracle connection is set by confirming the existence and validity of the constant in which it is stored. If the connection is valid, the ocirollback() function is called, which will roll back any uncommitted transactions, preventing deadlocks.

The biggest performance drop in databases, however, is caused not by slow connections but by slow queries that do not use the available database tools for more rapid data storage and retrieval. The most common of these problems is a lack of indexes, which means that data retrieval must perform a full table scan to fetch the desired rows. Alternatively there may be too many indexes, which slow down insertion by forcing index rebuilds and forcing the database to examine multiple indexes on inserts. The problem is compounded by the fact that during development these problems may be hard to spot, as with little data virtually any query would execute instantly. The issue tends to rear its ugly head in production, when the data set keeps growing and the lack of indexes soon translates to very slow page load times.

Oracle users are fortunate to have a tool for analyzing the operations involved in executing a query, along with a summary of the rows affected by each one. From within the SQL*Plus interface it is possible to trigger the analysis of each executed query by setting autotrace to on. When this is done, each executed query will be followed by the output of EXPLAIN PLAN, detailing the internal operations performed in order to fetch the desired data. Internally this information is stored inside the PLAN_TABLE table, whose structure can be found inside the UTLXPLAN.SQL script located in the $ORACLE_HOME/rdbms/admin directory. (For information about performing a similar analysis via the Oracle Enterprise Manager GUI, see this documentation.)

The contents of this table comprise six columns: id, operation, name, rows, bytes, and CPU cost. While the id field is not particularly useful, all other columns contain pertinent information for performance tuning purposes. The operation column contains the description of the operation, for example TABLE ACCESS FULL, while the name column contains the name of the table involved in the operation. The rows column represents the number of rows affected by the operation, while the bytes column indicates the size of the data being manipulated. The final column, cost, indicates the estimated processing cost of executing a particular operation. (As you can probably guess, the lower this value, the faster the query.)

To facilitate proper operation of query tracing inside SQL*Plus, another script, PLUSTRCE.SQL (found inside the $ORACLE_HOME/sqlplus/admin directory), needs to be executed to permit SQL*Plus to visualize the reports. Without the structure provided by this script and the previous one, autotrace will fail to visualize the information about the query. Assuming both scripts were run and query tracing was enabled via the set autotrace on command, each query output will be followed by the execution plan and statistics detailing the operations performed and data transferred. With this analysis tool in place, once the script profiler shows you an ociexecute() function call that takes longer than a few seconds, you can take its query and profile it to determine the cause of the performance drop.

The specific things to watch for are queries that involve far more rows than are being retrieved, which usually indicates missing or improperly implemented indexes. The basic rule of thumb when tuning your queries is that the fewer operations are needed, the faster the query will execute. At the same time, try to reduce the amount of data that needs to be analyzed and passed internally inside the database as well as sent back to PHP.

For a more complete discussion about the proper use of indexes with PHP, see the first installment in this series or the Oracle Database 10g Tuning Guide.

Conclusion

Many more tricks exist which may lead to further speed improvements, but beware: the road to ultimate performance can be endless. There will always be one more adjustment or setting revision that could yield even better results, which makes it very easy to get carried away and spend inordinate amounts of time tuning, fine-tuning, and retuning.

For that reason, you should always set a reasonable performance goal in advance. It is also important to remember that in most cases, hardware costs pale in comparison to developer ones. In many situations after you have applied basic optimizations such as introduction of an opcode cache and query optimization, it is simpler, faster, and cheaper to simply increase computing capacity by adding another server or upgrading existing ones.


Ilia Alshanetsky is a PHP Core Developer responsible for the development of a large number of extensions and for general improvements in the language through security fixes, performance enhancements, and generic bug fixes. He is the principal of Advanced Internet Designs Inc., a company responsible for the development of FUDforum, a high-performance and secure bulletin board software written in PHP. He is also the author of the Zend Certification Training and Professional PHP Development courses, which he frequently teaches.

