|
Paged Result Sets with PHP and Oracle
by Harry Fuecks
Learn how to implement paged result sets, and display large ones efficiently.
Published November 2005
Paged result sets are a common technique for displaying query result sets efficiently. Perhaps the most well known example is the "Result Page" navigation at the bottom of Google search results pages.
In this HowTo you will learn how to implement paged result sets with PHP and Oracle, as well as efficient techniques for displaying large data sets.
The Need for Paged Result Sets
Unlike desktop GUIs, where you might hold a result cursor open in a session while a user scrolls up and down, Web-based applications are limited by the stateless request/response lifecycles of the HTTP protocol; once the request completes, the session is over. What's more, if you deliver thousands of rows in a single response wrapped in an HTML table, end users will obviously suffer--not to mention the underlying infrastructure.
The proven approach to solving this problem is to provide users with the abstract concept of a "page," which gives them a relative location in the result set. Page numbers can be passed via the URL, allowing the statelessness of HTTP to be preserved—a critical part of the shared-nothing architecture Rasmus Lerdorf describes in his article "Do You PHP?". On the server side, the requested page number is mapped to a corresponding sub-sequence of rows from the result set and then published to the user.
Sample Data
As sample data for implementing a paged result set in this HowTo, you'll use the Feedster Top 500 Blog data, which is provided online in an easy-to-parse format at http://top500.feedster.com/top500.tab. Store this data in the following table:
CREATE TABLE feedster_top_blogs (
rank NUMBER PRIMARY KEY,
name VARCHAR2(200),
url VARCHAR2(200),
links NUMBER
);
To implement a paged result set you need to involve all "layers" of your application. First build SELECT queries that return a result cursor containing only the rows for a given page, minimizing traffic between the Web and database servers. You also need a "pager" in the user interface to browse the result set. Finally, you need some logic that translates a page number into the corresponding row numbers in the result set.
User Interface
Let's say you want to allow users to use a "page number" to identify the section of the result set they want to see. The page number they have requested will be passed via the URL, such as http://example.com/results.php?page=5. This approach implies statelessness – anyone can point their browser to "page 5" simply by entering the correct URL. Of course if the underlying data is changing over time, the results that will be displayed for any give page will differ (see Read Consistency section below).
Meanwhile, to help users navigate the dataset, you need to provide a "pager" element in the display that offers links to other pages. Given that you want to support any number of pages—and there may be many pages—the pager element will need restrict itself to a "sliding window" of pages on either side of the current page in the dataset. This helps avoid burdening users with too much information in the display, as well as makes it easier to control the layout in a browser.
In this example you will display a maximum of five pages on either side of the active page. In addition to the "sliding window," links to the first and last page of the dataset are provided as well as links for the previous and next pages.
Expressing the pager "component" with a PHP function that draws it, the code below allows an arbitrary URL to be passed to the function, for which it will output HTML containing links to pages in the dataset:
<?php
function draw_pager($url, $total_pages, $current_page = 1) {
if ( $current_page <= 0 || $current_page > $total_pages ) {
$current_page = 1;
}
if ( $current_page > 1 ) {
printf( "<a href='$url?page=%d'>[Start]</a> \n" , 1);
printf( "<a href='$url?page=%d'>[Prev]</a> \n" , ($current_page-1));
}
for( $i = ($current_page-5); $i <= $current_page+5; $i++ ) {
if ($i < 1) continue;
if ( $i > $total_pages ) break;
if ( $i != $current_page ) {
printf( "<a href='$url?page=%1\$d'>%1\$d</a> \n" , $i);
} else {
printf("<a href='$url?page=%1\$d'><strong>%1\$d</strong></a> \n",$i);
}
}
if ( $current_page < $total_pages ) {
printf( "<a href='$url?page=%d'>[Next]</a> \n" , ($current_page+1));
printf( "<a href='$url?page=%d'>[End]</a> \n" , $total_pages);
}
}
?>
If you call this function with a dataset 50 pages in total and the current page being 15:
draw_pager('http://example.com/results.php',50,15);
It draws the following HTML:
<a href='http://example.com/results.php?page=1'>[Start]</a>
<a href='http://example.com/results.php?page=14'>[Prev]</a>
<a href='http://example.com/results.php?page=10'>10</a>
<a href='http://example.com/results.php?page=11'>11</a>
<a href='http://example.com/results.php?page=12'>12</a>
<a href='http://example.com/results.php?page=13'>13</a>
<a href='http://example.com/results.php?page=14'>14</a>
<a href='http://example.com/results.php?page=15'><strong>15</strong></a>
<a href='http://example.com/results.php?page=16'>16</a>
<a href='http://example.com/results.php?page=17'>17</a>
<a href='http://example.com/results.php?page=18'>18</a>
<a href='http://example.com/results.php?page=19'>19</a>
<a href='http://example.com/results.php?page=20'>20</a>
<a href='http://example.com/results.php?page=16'>[Next]</a>
<a href='http://example.com/results.php?page=50'>[End]</a>
This HTML will behave similarly to Amazon's A9 search page. In contrast, Google's implementation passes the starting row number, not an abstract page number, via the URL, while the "pager" behaves in a different manner—displaying links 1-10, 11-20, and so on rather than a range of pages that shifts based on the current page.
One problem with the draw_pager() function above is that it's not particularly clever about modifying the passed URL. If you were to pass it a base URL like:
http://example.com/results.php?orderby=name
It would generate URLs like:
http://example.com/results.php?orderby=name?page=5
Rather, it should produce URLs like the following if an existing GET query parameter has already been used:
http://example.com/results.php?orderby=name&page=5
To fix this problem you may find PHP's parse_url() function useful. But be sure to avoid appending &page to a URL that already has that parameter and validate or clean all input parameters before outputting them as HTML; otherwise, your application will be at risk for XSS exploits.
Page-to-Row Calculations
Bearing in mind the above pager, you now need to make two calculations: Determine the total number of pages and convert a page number into a row number. To do that, you need to decide on a value for the number of rows that should appear on a single page. (In a more complex implementation, you might also allow users to change this value.)
Having decided on the number of rows per page, the following PHP function, total_pages(), outputs the total number of pages:
function total_pages($total_rows, $rows_per_page) {
if ( $total_rows < 1 ) $total_rows = 1;
return ceil($total_rows/$rows_per_page);
}
Total rows would be a count over all the rows in the result set you are paging. For our sample data you know this is exactly 500 rows, but in a table where rows are being changed, you would need to ask the database for this value. (See Read Consistency section.) One possible addition to total_pages() would be to set a limit on the maximum number of pages you are willing to display. (See Maximum Pages section.)
To convert a given page number into row number in the result set, you can use the following function:
function page_to_row($current_page, $rows_per_page) {
$start_row = ($current_page-1) * $rows_per_page + 1;
return $start_row;
}
It returns the first row number in the result set corresponding to $current page.
Read Consistency
Our assumption here is that the paged display of the dataset is stateless, given the previous discussion of the stateless nature of HTTP and the Web. That said, if rows are inserted or deleted or updates made that change the relative order of a row while someone is browsing the dataset, inconsistencies may occur such as rows that appear to exist in multiple pages, data having changed between HTTP requests, and variations in the total number of pages.
For datasets that rarely change this may not be a significant problem; users rarely notice such inconsistencies. But for data that is volatile and for which it is critical that users receive a consistent view, you might consider using periodic "snapshots" of the dataset, perhaps passing an additional "snapshot time" via the URL. This approach would allow you to remain stateless but raises issues in terms of how much data to keep and for how long.
An alternative solution would be to store a static copy of the dataset with a users session, making the dataset stateful. This approach would enable you to give individual users a consistent view, without troubling them with implementation detail such as the age of the data set. At the same time, using sessions could quickly result in storage issues, as each user gets their own copy of the dataset for the period of their session—a problem if you have many users. One possible optimization, to minimize storage requirements, is to assume that most users will only be interested in the first few pages of the dataset, so only this data needs to be preserved.
Where you actually store the data will depend on your specific requirements and environment; in some cases it may be easiest to serialize a PHP data structure or make use of the Service Data Objects extension, while in others you might consider using multiple tables to preserve snapshots.
In general, you should avoid trying to implement datasets in a stateful manner unless you absolutely have to, as it will result in greater complexity in your application's logic, make managing the data more difficult, and increase the risk that your server will struggle under peak loads.
Row Counting
Given the above two functions, you need a SELECT statement to determine the total number of rows in the entire result set and another to return just those rows for the current page.
To count the number of rows a given SELECT statement would select, you can embed the statement in another which counts the rows, as the following PHP function demonstrates:
function count_rows(& $conn, $select) {
$sql = "SELECT COUNT(*) AS num_rows FROM($select)";
$stmt = oci_parse($conn,$sql);
oci_define_by_name($stmt,"NUM_ROWS",$num_rows);
oci_execute($stmt);
oci_fetch($stmt);
return $num_rows;
}
While use of COUNT may not seem like an efficient solution, knowledge of the total number of rows being returned by a result cursor is only available when you have iterated through to the final row. In other words, the alternative would be to SELECT the complete result set and loop through it, which is far more inefficient.
It's worth bearing in mind John Lim's tip in "Optimizing PHP and Oracle" of using a separate table to track the total number of rows via INSERT and DELETE triggers. Another solution, if you know that data will be changing infrequently, would be to cache the result on the Web server for short periods.
One limitation of count_rows() is that it assumes the passed SELECT statement does not need any parameters bound to it. Should you need to bind parameters—in particular those that would change the total number of rows&Mdash;you might consider extending it by accepting an hash of parameter names to their values as the third argument to count_rows(), which you loop over while calling oci_bind_by_name.
You may also want to have this function limit the value returned to a maximum value, to prevent paging right to the end of a large result set. (See Maximum Pages section below.)
Note that in some designs, if you know you will be dealing with large quantities of data, you might consider omitting the row counting logic. Bearing in mind you may set a limit on the maximum number of pages available (again, see Maximum Pages below), you might have count_rows() above return an arbitrarily high number for datasets known to be larger than 50 pages, while providing users with a message such as "Your search returned more than 50 pages of data."
Fetching a Page of a Dataset
To actually SELECT the rows you want for a particular page you can use a similar "SQL wrapping" function, for example:
function & paged_result(& $conn, $select, $start_row, $rows_per_page) {
$sql = "SELECT
*
FROM
(
SELECT
r.*, ROWNUM as row_number
FROM
( $select ) r
WHERE
ROWNUM <= :end_row
)
WHERE :start_row <= row_number";
$stmt = oci_parse($conn,$sql);
oci_bind_by_name($stmt, ':start_row', $start_row);
// Calculate the number of the last row in the page
$end_row = $start_row + $rows_per_page - 1;
oci_bind_by_name($stmt, ':end_row', $end_row);
oci_execute($stmt);
// Prefetch the number of rows per page
oci_set_prefetch($stmt, $rows_per_page);
return $stmt;
}
Note the two sub-SELECT statements here. The inner sub-SELECT first retrieves all the rows up to $end_row whereas the outer one filters out all rows before $start_row. This is most efficient way to access the rows in which you are interested, as it searches only as far as $end_row, rather first preparing the entire result set.
You may be wondering if it might be simpler to use a BETWEEN clause, such as:
$sql = "SELECT
*
FROM
(
SELECT
r.*, ROWNUM as row_number
FROM
( $select ) r
)
WHERE row_number BETWEEN :start_row AND :end_row";
The problem with this approach is that it first retrieves all rows in a result set, then applies a filter to reduce that set to the relevant, more narrow range, which is much more inefficient.
Notice also that I've called the oci_set_prefetch() function, passing it the number of rows per page. This helps reduce network traffic between the Web server and the database. (John Lim describes prefetching in more detail in the previously cited article.)
As with count_rows() above, this paged_result() also assumes that no parameters are to be bound. You can find implementations that solve this problem in both the ADOdb and PEAR::DB libraries.
Maximum Pages
The approach used above in the paged_result() function has a potentially serious drawback, however: In general, the further you go down the result set, the slower it gets, as it first has to locate $end_row and then filter out the rows before $start_row. If you have a large result set and a high-traffic site, the effect of users requesting very large page numbers could be tantamount to a denial of service.
To prevent this problem, you may want to consider having a fixed maximum page number, beyond which you refuse to provide results. For a search function this is easy to justify; users should clearly refine their search if it fails to find what they want after a few pages. Alternatively, providing column-sorting functionality can help users find what they need within a few pages.
Putting It All Together
If you have the above functions in a separate PHP script called pager_functions.php, using them to page through the Feedster data might look like:
<?php
$conn = OCILogon('scott', 'tiger') or die ("Unable to connect to db");
require_once 'pager_functions.php';
$rows_per_page = 20;
$url = 'feedster_top500.php'; // URL to this script
$sql = 'SELECT * FROM feedster_top_blogs ORDER BY rank ASC'; // The unfiltered SELECT
// Get the total page count from the number of rows
$total_rows = count_rows($conn,$sql);
$total_pages = total_pages($total_rows, $rows_per_page);
// Make sure the page number is a sane value
if ( !isset($_GET['page']) ||
!preg_match('/^[0-9]+$/',$_GET['page']) ||
$_GET['page'] < 1 ) {
$_GET['page'] = 1;
} else if ( $_GET['page'] > $total_pages ) {
$_GET['page'] = $total_pages;
}
// Translate the page number into a starting row number
$start_row = page_to_row($_GET['page'], $rows_per_page);
// Filter to a single page of rows
$stmt = & paged_result($conn, $sql, $start_row, $rows_per_page);
?>
<table width="600">
<caption>Feedster Top 500 Blogs [#<?php echo $_GET['page']; ?>]</caption>
<thead>
<tr>
<th>Rank</th>
<th>Blog</th>
<th>Inbound Links</th>
</tr>
</thead>
<tbody>
<?php while (OCIFetchinto($stmt,$row,OCI_ASSOC)) { ?>
<tr valign="top">
<td align="right"><?php echo htmlspecialchars($row['RANK']); ?></td>
<td>
<a href="<?php echo htmlspecialchars($row['URL']); ?>">
<?php echo htmlspecialchars($row['NAME']); ?>
</a>
</td>
<td align="right"><?php echo htmlspecialchars($row['LINKS']); ?></td>
</tr>
<?php } ?>
</tbody>
<tfoot>
<tr>
<td colspan="3" align="center">
<?php echo draw_pager($url, $total_pages, $_GET['page']); ?>
</td>
</tr>
</tfoot>
</table>
The pager is displayed at the bottom of the table, inside the <tfoot/> tag, as shown in the screenshot below.
Conclusion
You should now understand what elements required to implement paging result sets as well as potential performance issues. Furthermore, you now have sample code you can extend for your own applications.
Harry Fuecks [http://www.phppatterns.com] is a well known PHP developer and writer, since discovering PHP in 1999. He has published numerous introductory and intermediate PHP articles via the Sitepoint Web developer's network, as well as writing The PHP Anthology (SitePoint).
Send us your comments |