Author: Wim Coekaerts, Director of Linux Engineering
Date: January 2004
Many customers have started migrating from Red Hat Enterprise Linux 2.1 Advanced Server (RHAS2.1) to Red Hat Enterprise Linux 3 (RHEL3), or are deploying new servers on RHEL3, and they have raised several questions. Some familiar features behave slightly differently, and others have changed in name or implementation. In this note I explain some of the most commonly used RHAS2.1 features and how to use them in RHEL3.
In this technical note, I focus on the use of the Oracle VLM option to create a large database buffercache and on the use of hugetlb.
New Kernel Naming
RHAS2.1 for ia32
- 2.4.9-e.25: Uniprocessor kernel
- 2.4.9-e.25smp: SMP kernel capable of handling up to 4GB of physical memory
- 2.4.9-e.25enterprise: SMP kernel capable of handling up to about 16GB of physical memory
On 32-bit systems each process has a 4GB virtual address space: userspace gets about 3GB of it, and the kernel occupies the remaining 1GB.
The default SGA (shared pool plus buffercache) can be up to 1.7GB. You can create a larger SGA, of up to 2.7GB, by lowering MAPPED_BASE and relinking the Oracle executable with a lower attach address.
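The flavor of a running RHAS2.1 kernel can be read from the suffix of its release string, the same string `uname -r` prints (for example `2.4.9-e.25enterprise`). The Python sketch below maps a release string to a flavor and to the nominal physical-memory ceiling listed above. The helper names and the `"up"` label for the uniprocessor kernel are hypothetical, and the ceilings are only the figures quoted in this note (none is given for the uniprocessor kernel).

```python
import platform

# Nominal physical-memory ceilings (GB) for the RHAS2.1 ia32 flavors listed above.
# The note gives no ceiling for the uniprocessor kernel, so it is None.
RHAS21_MEMORY_CEILING_GB = {
    "enterprise": 16,  # "up to about 16GB"
    "smp": 4,          # "up to 4GB"
    "up": None,        # hypothetical label for the uniprocessor kernel
}

def rhas21_flavor(release: str) -> str:
    """Classify an RHAS2.1 release string by its suffix (hypothetical helper)."""
    if release.endswith("enterprise"):
        return "enterprise"
    if release.endswith("smp"):
        return "smp"
    return "up"

def rhas21_memory_ceiling_gb(release: str):
    """Nominal memory ceiling in GB for the flavor, or None if the note gives none."""
    return RHAS21_MEMORY_CEILING_GB[rhas21_flavor(release)]

if __name__ == "__main__":
    rel = platform.release()  # same value as `uname -r`
    print(rel, rhas21_flavor(rel), rhas21_memory_ceiling_gb(rel))
```

Matching on the suffix keeps the check independent of the errata level (`e.25` here).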
RHEL3 for ia32
- 2.4.21-4.EL: Uniprocessor kernel
- 2.4.21-4.ELsmp: SMP kernel capable of handling up to 16GB of physical memory
- 2.4.21-4.ELhugemem: SMP kernel capable of handling more than 16GB, up to 64GB of physical memory
The hugemem kernel also changes the address-space layout: the kernel and userspace each get their own 4GB address space (a 4GB/4GB split), so a userspace program has access to its own 4GB.
With the smp kernel, the default SGA limit is the same as in RHAS2.1. With the hugemem kernel, however, you can create an SGA of up to 3.6GB without using the VLM option.
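The kernel choice above can be summarized as a small decision rule. The Python sketch below is a hypothetical rule of thumb built only from the figures quoted in this note (16GB and 64GB memory limits, a 1.7GB default SGA on smp, 3.6GB on hugemem). It ignores the uniprocessor kernel and the MAPPED_BASE relink route, and the names `plan_rhel3` and `Plan` are illustrative, not part of any Red Hat or Oracle tool.

```python
from dataclasses import dataclass

# Figures quoted in this note for RHEL3 on ia32.
SMP_MAX_MEM_GB = 16        # 2.4.21-4.ELsmp
HUGEMEM_MAX_MEM_GB = 64    # 2.4.21-4.ELhugemem
SMP_DEFAULT_SGA_GB = 1.7   # same default limit as RHAS2.1
HUGEMEM_MAX_SGA_GB = 3.6   # hugemem kernel, without the VLM option

@dataclass
class Plan:
    kernel: str      # "smp" or "hugemem"
    needs_vlm: bool  # True if the SGA exceeds what hugemem gives without VLM

def plan_rhel3(phys_mem_gb: float, sga_gb: float) -> Plan:
    """Hypothetical rule of thumb: prefer smp, move to hugemem when needed."""
    if phys_mem_gb > HUGEMEM_MAX_MEM_GB:
        raise ValueError("more than 64GB exceeds the hugemem kernel's limit")
    if phys_mem_gb <= SMP_MAX_MEM_GB and sga_gb <= SMP_DEFAULT_SGA_GB:
        return Plan("smp", needs_vlm=False)
    # More than 16GB of RAM, or an SGA above the smp default: use hugemem.
    return Plan("hugemem", needs_vlm=sga_gb > HUGEMEM_MAX_SGA_GB)

if __name__ == "__main__":
    print(plan_rhel3(8, 3.0))
```

For example, an 8GB machine that needs a 3GB SGA lands on hugemem without VLM, while a 48GB machine with a 10GB buffercache would still need the VLM option described later.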
bigpages vs. hugetlb
A typical large-server deployment on RHAS2.1 used the bigpages boot parameter to preallocate a large chunk of memory reserved solely for shared memory. Each of these pages is mapped by a single 2MB or 4MB TLB entry, which reduces the number of TLB misses and hence improves performance by a few percent.
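The TLB benefit comes down to arithmetic: the larger each page, the fewer mappings (and TLB entries) are needed to cover the same SGA. The Python sketch below counts mappings for a region, assuming the standard 4KB base page on ia32 alongside the 2MB and 4MB large pages mentioned above; `tlb_entries_needed` is an illustrative name.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def tlb_entries_needed(region_bytes: int, page_bytes: int) -> int:
    """Page mappings (one TLB entry each) needed to cover a region, rounded up."""
    return -(-region_bytes // page_bytes)  # ceiling division

if __name__ == "__main__":
    for label, size in (("4KB", 4 * KB), ("2MB", 2 * MB), ("4MB", 4 * MB)):
        print(label, tlb_entries_needed(1 * GB, size))
```

For a 1GB region this gives 262,144 mappings with 4KB pages, 512 with 2MB pages, and 256 with 4MB pages, so far more of the SGA is covered by the entries the TLB can hold at once.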
Bigpages had two further advantages in RHAS2.1. The kernel VM had much less bookkeeping to do for that part of memory, and the pages were neither pageable nor swappable, which guaranteed that the Oracle SGA stayed in physical memory.
Red Hat Enterprise Linux 3 replaces bigpages with hugetlb, a backport of the feature found in Linux kernel 2.6. Hugetlb behaves much like bigpages: the pages are backed by large TLB entries, are not pageable, and are preallocated. Once you allocate x megabytes of hugetlb pages, that physical memory can be used only through hugetlbfs or through shared memory allocated with SHM_HUGETLB. The differences lie mainly in how the pool is configured and how applications request it.
Unlike bigpages, hugetlb does not require a boot parameter; the pool size is adjustable at run time. After the system has booted, you can echo a value to /proc/sys/vm/hugetlb_pool, or you can set the value in /etc/sysctl.conf so that it is applied at boot. The value is in megabytes, and the kernel divides it into 2MB pages. You can see the current values in /proc/meminfo:
Hugepages_Total: 500
Hugepages_Free: 500
Hugepagesize: 2048K
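Because the pool is specified in megabytes but handed out in 2MB pages, it helps to round the desired size up to a whole number of pages. The Python sketch below does that and produces the matching /etc/sysctl.conf line, assuming the 2048K page size shown above and the usual mapping of /proc/sys/vm/hugetlb_pool to the sysctl key vm.hugetlb_pool. All function names are hypothetical; `set_pool_now` needs root and only makes sense on a kernel that provides hugetlb_pool, such as RHEL3.

```python
HUGEPAGE_MB = 2  # Hugepagesize: 2048K, as shown in /proc/meminfo above

def hugetlb_pool_mb(sga_bytes: int) -> int:
    """Smallest hugetlb_pool value (MB, a multiple of the page size) covering sga_bytes."""
    mb = -(-sga_bytes // (1024 * 1024))          # round bytes up to whole MB
    return -(-mb // HUGEPAGE_MB) * HUGEPAGE_MB   # round up to whole huge pages

def hugepages_for_pool(pool_mb: int) -> int:
    """Number of 2MB pages a given pool size yields."""
    return pool_mb // HUGEPAGE_MB

def sysctl_line(pool_mb: int) -> str:
    # /proc/sys/vm/hugetlb_pool corresponds to the sysctl key vm.hugetlb_pool
    return f"vm.hugetlb_pool = {pool_mb}"

def set_pool_now(pool_mb: int, path: str = "/proc/sys/vm/hugetlb_pool") -> None:
    # Equivalent of: echo <pool_mb> > /proc/sys/vm/hugetlb_pool (requires root)
    with open(path, "w") as f:
        f.write(f"{pool_mb}\n")
```

With 2MB pages, a pool of 1000MB yields 500 pages, which matches the Hugepages_Total: 500 sample above.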
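Note, however, that the kernel must find physically contiguous 2MB regions to build the hugetlb pool. It allocates as many pages as it can, but if memory is heavily fragmented by processes that are already running, the pool allocation will probably fall short of the requested size. Setting the value in /etc/sysctl.conf, so that the pool is allocated at boot before memory becomes fragmented, reduces this risk.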
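Since a fragmented system may hand out fewer pages than requested, it is worth checking /proc/meminfo after setting the pool. The Python sketch below parses the three fields shown earlier and reports the shortfall in megabytes. Matching is case-insensitive as an assumption, because kernel versions differ in capitalization (for example `HugePages_Total` in later 2.6 kernels); the function names are hypothetical.

```python
import re

def parse_hugepage_info(meminfo_text: str) -> dict:
    """Extract Hugepages_Total, Hugepages_Free and Hugepagesize (kB) from /proc/meminfo text."""
    info = {}
    for key in ("Hugepages_Total", "Hugepages_Free", "Hugepagesize"):
        m = re.search(rf"^{key}:\s*(\d+)", meminfo_text, re.IGNORECASE | re.MULTILINE)
        if m:
            info[key.lower()] = int(m.group(1))
    return info

def pool_shortfall_mb(meminfo_text: str, requested_mb: int) -> int:
    """MB of the requested hugetlb pool that the kernel failed to allocate (0 if none)."""
    info = parse_hugepage_info(meminfo_text)
    page_kb = info.get("hugepagesize", 2048)  # assume 2048K if the field is missing
    allocated_mb = info.get("hugepages_total", 0) * page_kb // 1024
    return max(0, requested_mb - allocated_mb)
```

On a live system, pass the contents of /proc/meminfo together with the value you wrote to hugetlb_pool; a nonzero result means the pool is smaller than requested.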
A program that wants its shared memory to come from this pool must add the SHM_HUGETLB flag to its shmget() call. (Oracle Database 10g does this by default; Oracle9i Database requires a patch.) This ensures that the Oracle shared memory segments are allocated from the hugetlb pool.
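At the system-call level, requesting hugetlb-backed shared memory means OR-ing SHM_HUGETLB into the shmget() flags. The Python sketch below calls the C library's shmget() through ctypes to show this; it is an illustration, not how Oracle does it internally. The constant values are the ones defined in the Linux headers, the 2MB rounding assumes the page size shown earlier, and the helper names are hypothetical.

```python
import ctypes
import ctypes.util
import os

# Values as defined in the Linux <sys/ipc.h> and <sys/shm.h> headers.
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
SHM_HUGETLB = 0o4000
HUGEPAGE_BYTES = 2 * 1024 * 1024  # assumes the 2MB Hugepagesize shown earlier

def hugetlb_shm_flags(mode: int = 0o600) -> int:
    """shmget() flags for a new segment backed by the hugetlb pool."""
    return IPC_CREAT | SHM_HUGETLB | mode

def round_to_hugepages(size: int) -> int:
    """Round a segment size up to a whole number of huge pages."""
    return -(-size // HUGEPAGE_BYTES) * HUGEPAGE_BYTES

def create_hugetlb_segment(size: int, key: int = IPC_PRIVATE) -> int:
    """Call shmget(key, size, IPC_CREAT | SHM_HUGETLB | 0600) and return the segment id."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.shmget.argtypes = (ctypes.c_int, ctypes.c_size_t, ctypes.c_int)
    libc.shmget.restype = ctypes.c_int
    shmid = libc.shmget(key, round_to_hugepages(size), hugetlb_shm_flags())
    if shmid == -1:
        # Fails if, for example, the pool has too few free pages.
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return shmid
```

A segment created this way persists after the process exits until it is removed, for example with `ipcrm -m <shmid>`.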
VLM Option
To use the VLM option on RHEL3 to create a very large buffercache, you have two choices:
- Use shmfs much as you would in RHAS2.1: mount a shmfs of the required size on /dev/shm and set the correct permissions. Keep in mind that in RHEL3, memory allocated in shmfs is pageable.
- Use ramfs: ramfs is similar to shmfs, except that its pages are neither pageable nor swappable, which is usually the desired behavior. Unmount /dev/shm first, then create the ramfs with mount -t ramfs ramfs /dev/shm. The only drawback is that ramfs pages are not backed by big pages.
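Because the two choices differ only in whether the buffercache memory can be paged out, it is useful to confirm which filesystem is actually mounted on /dev/shm. The Python sketch below reads /proc/mounts text and classifies the result. The assumption that shmfs shows up as `shm` or `tmpfs` in /proc/mounts is labeled in the code, and the function names are hypothetical.

```python
def dev_shm_fstype(mounts_text: str, mount_point: str = "/dev/shm"):
    """Filesystem type mounted at mount_point according to /proc/mounts text, or None.

    /proc/mounts lines read: device mountpoint fstype options dump pass.
    If several filesystems are stacked on the same mount point, the last (topmost) wins.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]
    return fstype

def vlm_pages_pageable(fstype: str) -> bool:
    """True if the VLM buffercache memory on this filesystem can be paged out."""
    # Assumption: shmfs is listed as "shm" or "tmpfs"; ramfs is listed as "ramfs".
    if fstype == "ramfs":
        return False
    if fstype in ("shm", "tmpfs"):
        return True
    raise ValueError(f"unexpected filesystem type at /dev/shm: {fstype!r}")
```

On a live system, pass the contents of /proc/mounts; a result of False means the ramfs setup is in effect.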
The init.ora parameter use_indirect_data_buffers=true is used as before; the settings on the Oracle side should not have to change.