Horizontal Scaling on a Vertical System Using Zones in the Solaris 10 OS

By Ashutosh Kumar, Nagendra Nagarajayya, Dmitry Isakbayev, June 2006  
Abstract: This case study shows how Solaris Zones technology was used to solve scalability and performance problems relating to a single-threaded web server.
  1. Introduction
  2. The Problem
  3. The Solution
  4. How Solaris Zones Help in Scaling Horizontally
  5. Improvement in Performance
  6. Conclusion
  7. Acknowledgments
  8. References

1. Introduction

Multithreaded applications are a way to scale to meet today's growing business requirements while reducing the number of systems needed. Multithreaded applications scale well on the new chip multithreading (CMT) systems like the Sun Fire T1000/T2000 and V490 servers, and carriers and service providers can consolidate their back-end systems using CMT and multithreading. However, some applications are single-threaded, which presents a persistent obstacle to performance and scalability. The Solaris 10 OS offers Solaris Zones, a virtualization environment that can be used to overcome this problem and scale horizontally on a single vertical system. This paper presents a case study of how Solaris Zones technology was used to overcome a scalability bottleneck related to a single-threaded web server.
Market Development Engineering (MDE) at Sun Microsystems works with independent software vendors (ISVs) on making ISV applications run the best on Sun. Tunathon is a yearly program run by MDE to bring together ISVs and the various product groups at Sun. The goal of the tunathon is to improve the ISVs' application performance. The ISVs have direct access to the product groups to resolve any performance-related issues in order to make their applications run faster on Sun. Sun provides the necessary hardware and software, along with engineering. TransNexus, an ISV, develops operations and billing support software (OSS/BSS) for Voice over IP (VoIP) and network management. TransNexus participated in Tunathon2005 because it was facing difficulties scaling the performance of a single-threaded web server.

1.1 TransNexus Applications

Open Settlements Protocol (OSP) is an international standard for VoIP carriers that provides a secure mechanism for IP communication. An OSP server authorizes call setup between peer VoIP gateways. The source gateway (the originating gateway in a call setup) sends an authorization request message to the OSP server to obtain the IP address of a destination gateway that can complete the call to the dialed number. The OSP server sends an authorization response message back to the source gateway. The authorization response message contains the IP address of the destination gateway that can complete the call to the dialed number and also a digitally signed token to be used by the source gateway in a call setup. The source gateway uses the digitally signed token to connect to the destination gateway; the destination gateway verifies the token to make sure that it's coming from a trusted source.
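The request/response flow above can be sketched in a few lines of Python. This is an illustrative model only: real OSP messages are digitally signed XML, and the field names, routing table, shared secret, and HMAC "signature" below are all assumptions, not the actual protocol format.

```python
from dataclasses import dataclass
import hashlib
import hmac

SHARED_SECRET = b"demo-secret"  # hypothetical trust anchor shared with gateways

@dataclass
class AuthorizationRequest:
    source_gateway: str   # originating gateway's address
    called_number: str    # dialed number to be routed

@dataclass
class AuthorizationResponse:
    destination_ip: str   # gateway that can complete the call
    token: str            # "signed" token the source gateway presents

# Hypothetical dial plan mapping dialed numbers to destination gateways.
ROUTES = {"14045551212": "192.0.2.20"}

def osp_authorize(req: AuthorizationRequest) -> AuthorizationResponse:
    """OSP server: look up a destination gateway and mint a signed token."""
    dest = ROUTES[req.called_number]
    msg = f"{req.source_gateway}|{req.called_number}|{dest}".encode()
    token = hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()
    return AuthorizationResponse(destination_ip=dest, token=token)

def verify_token(req: AuthorizationRequest, resp: AuthorizationResponse) -> bool:
    """Destination gateway: confirm the token came from a trusted OSP server."""
    msg = f"{req.source_gateway}|{req.called_number}|{resp.destination_ip}".encode()
    expected = hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, resp.token)
```

The source gateway never needs the shared secret itself; it simply relays the opaque token, and only the destination gateway validates it.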
The major steps involved in a call setup are shown in the following figure.
Figure 1: Call Setup
When the call is over, the source gateway and destination gateway both send a UsageIndication message to the OSP server. This message is confirmed by a UsageConfirmation message from the OSP server to the source gateway and destination gateway, as shown in the following figure.
Figure 2: Gateway and Server Communication
Most of the TransNexus applications follow the above-mentioned OSP protocol for IP communication.

1.2 NexSRS Server

NexSRS server is an OSP server from TransNexus that runs on the Solaris 10 OS. Clients communicate with NexSRS using HTTP. NexSRS uses an external web server (Xitami) to process the HTTP requests; the external web server passes the client requests on to NexSRS for processing. HTTP connections have configurable persistence, so the same HTTP connection can be used for multiple transactions.

2. The Problem

The external web server used by NexSRS is single-threaded, and it becomes a bottleneck for the scalability of NexSRS server and other NexSRS components. The external web server saturates a single CPU at 100% utilization and cannot scale further, limiting NexSRS performance. NexSRS itself is multithreaded but is limited by the number of HTTP requests coming in from the external web server. This limits NexSRS and the external web server to two- or four-CPU systems.

3. The Solution

The idea is to scale horizontally on a vertical system through virtualization. With Solaris Zones, you can create virtualized application execution environments within a single instance of the operating system. The virtualization environment allows multiple workloads to be run in isolation (see Solaris Zones: Operating System Support for Consolidating Commercial Workloads (pdf)). Applications running in one environment do not affect or see the data of another application running in another environment. The virtualized environment does not allow one application to hog system resources, and it also provides facilities to monitor, secure, configure, and administer at the application level as well as at the virtualization environment level.
In this scenario, multiple zones can be created on a 2-, 4-, 8- or 12-CPU system. An instance of the external web server and NexSRS can be deployed in each zone. Each zone now behaves as a separate system, allowing horizontal scaling on a single system. Each system can handle independent workloads or be load balanced to handle a single workload.

3.1 Introduction to Solaris Zones

Zones in the Solaris 10 OS provide multiple virtual operating system environments sharing the same kernel instance: a single physical server is divided into multiple virtual servers, each with its own operating system environment. The two kinds of zones are global and non-global. The global zone is the Solaris OS environment that is bootable by the system hardware; non-global zones are created and managed by the global zone administrator. The global zone is a fully functional Solaris environment and is comparable to a normal Solaris instance. By default, the global zone is always running on a system even when no other zone is configured. Each non-global zone has its own file systems, networking, security, and operating system resources.
The zonecfg, zoneadm, and zlogin commands are used by the global zone administrator to create, install, configure, and boot non-global zones. In theory, up to 8192 zones can be created on a system; in practice, the number depends on the total resources available and how the applications running in each zone use them.
A non-global zone can be in the following states:
  • Configured
  • Incomplete
  • Installed
  • Ready
  • Running
  • Shutting Down
  • Down
A non-global zone transitions through the following states during a typical bring-up process: Configured --> Installed --> Ready --> Running.
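The bring-up path can be modeled as a small state machine. The following Python sketch is a simplification: the shutdown and uninstall paths shown in the transition table are assumptions based on the standard zoneadm model, and only the transitions discussed here are covered.

```python
# Legal zone state transitions (simplified sketch). Keys are current
# states; values are the states reachable via zoneadm subcommands.
TRANSITIONS = {
    "configured": {"installed"},        # zoneadm install
    "installed": {"ready", "running"},  # zoneadm ready / zoneadm boot
    "ready": {"running"},               # zoneadm boot
    "running": {"shutting down"},       # halting the zone
    "shutting down": {"down"},
    "down": {"installed"},
}

def walk(states):
    """Validate a sequence of zone states against the transition table
    and return the final state; raise ValueError on an illegal step."""
    for cur, nxt in zip(states, states[1:]):
        if nxt not in TRANSITIONS.get(cur, set()):
            raise ValueError(f"illegal transition: {cur} -> {nxt}")
    return states[-1]
```

The typical bring-up path, Configured to Installed to Ready to Running, is a legal walk through this table.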
These states are shown in the following figure.
Figure 3: Typical States
3.1.1 Planning for Zone Creation
We need to know the virtual IP address of each non-global zone to be created; this virtual IP address is configured on the physical network interface of the system.
Every non-global zone has a unique name. The name global and any string starting with SUNW are reserved, so these can't be used as zone names. Two types of non-global zone file systems can be created: sparse root and whole root. The sparse root zone model optimizes sharing of resources, while the whole root zone provides maximum configurability; non-global zones that have inherit-pkg-dir resources are sparse root zones. Every non-global zone has its own root directory; the path to that directory is relative to the root directory of the global zone and is set with the zonepath parameter when configuring the zone. The root directory for every non-global zone must be created by the global zone administrator with permission mode 700 to prevent other users in the global zone from accessing the non-global zone.
The following information is needed at the time of zone creation:
  • zonepath: the path to the root of a zone
  • autoboot: whether to boot the zone automatically when the system boots
  • pool: the resource pool that the zone is bound to
  • fs: file system
  • inherit-pkg-dir: directory "inherited" from the global zone
  • net: network interface
  • device: device
  • rctl: resource control
  • attr: generic attribute
The only mandatory parameter for zone creation is zonepath. All other parameters are optional.
3.1.2 Zone Creation and Configuration
The global zone administrator uses the zonecfg -z <zone-name> command to create and configure a non-global zone.
-bash-3.00# zonecfg -z SRS1
SRS1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:SRS1> create
zonecfg:SRS1> set zonepath=/zone1
zonecfg:SRS1> add net
zonecfg:SRS1:net> set address=
zonecfg:SRS1:net> set physical=eri0
zonecfg:SRS1:net> end
zonecfg:SRS1> add inherit-pkg-dir
zonecfg:SRS1:inherit-pkg-dir> set dir=/export/home/transnexus
zonecfg:SRS1:inherit-pkg-dir> end
Running the info command at this point displays the configuration so far:
zonecfg:SRS1> info
zonepath: /zone1
autoboot: false
        dir: /lib
        dir: /platform
        dir: /sbin
        dir: /usr
        dir: /export/home/transnexus
        physical: eri0
zonecfg:SRS1> verify
zonecfg:SRS1> commit
zonecfg:SRS1> exit
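For scripted deployments, the same settings can also be captured non-interactively in a zonecfg command file. The following is a sketch; the IP address is a placeholder, not the address used in this study.

```
create
set zonepath=/zone1
add net
set address=192.0.2.10
set physical=eri0
end
add inherit-pkg-dir
set dir=/export/home/transnexus
end
verify
commit
```

Saved as, say, srs1.cfg, the file is applied with zonecfg -z SRS1 -f srs1.cfg.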
3.1.3 Zone Installation
The zoneadm command is used to verify and install the zone, SRS1, at this point.
-bash-3.00# zoneadm -z SRS1 verify
WARNING: /zone1 does not exist, so it cannot be verified.
When 'zoneadm install' is run, 'install' will try to create
/zone1, and 'verify' will be tried again,
but the 'verify' may fail if:
the parent directory of /zone1 is group- or other-writable
/zone1 overlaps with any other installed zones.

-bash-3.00# zoneadm -z SRS1 install
Preparing to install zone <SRS1>.
Creating list of files to copy from the global zone.
Copying <2574> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <906> packages on the zone.
Initialized <906> packages on zone.
Zone <SRS1> is initialized.
The file </zone1/root/var/sadm/system/logs/install_log> 
contains a log of the zone installation.

-bash-3.00# zoneadm list -vc
  ID NAME             STATUS         PATH
   0 global           running        /
   - SRS1             installed      /zone1
At this point, the zone's root file system environment has been created at the /zone1 location; zone SRS1 is now ready to be booted.
3.1.4 Booting the Zone
Before booting the first time, zlogin -C zonename can be used to connect and establish a console session with the zone. This will prompt for information such as host name, time zone (TZ), and name service to complete the system identification. The global zone administrator can then boot the non-global zone using the zoneadm -z <zone-name> boot command from another terminal window.
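The console-then-boot sequence described above uses two terminal windows, for example:

```
# Terminal 1: attach to the zone console before the first boot; the
# console session will present the system identification prompts.
zlogin -C SRS1

# Terminal 2: boot the zone while the console session is attached.
zoneadm -z SRS1 boot
```

The console session in the first terminal stays attached through the boot, so the identification prompts appear there as the zone comes up.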
Following is a snapshot:
-bash-3.00# zoneadm list -vc
  ID NAME             STATUS         PATH
   0 global           running        /
   - SRS1             installed      /zone1

-bash-3.00# zoneadm -z SRS1 boot

-bash-3.00# zoneadm list -vc
  ID NAME             STATUS         PATH
   0 global           running        /
   1 SRS1             running        /zone1
After issuing the zoneadm -z SRS1 boot command, we can see that the status of zone SRS1 has changed from installed to running, and the zone has been assigned ID 1. It now behaves just like another running system, with its own IP address and Solaris OS environment. We can ping this virtual system:
bash-3.00$ ping <zone-IP-address>
<zone-IP-address> is alive
The zonename command can be used to obtain the zone name.
bash-3.00# zonename
Note: The system identification parameters can also be set in the /<zonepath>/root/etc/sysidcfg file before the first boot, avoiding the need to enter system identification parameters during the first boot.
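A minimal sysidcfg file might look like the following. Every value here is a placeholder for illustration, not a setting from this study; the root_password field takes an encrypted password hash, left elided here.

```
system_locale=C
terminal=xterm
network_interface=NONE { hostname=SRS1 }
security_policy=NONE
name_service=NONE
timezone=US/Eastern
root_password=<encrypted-hash>
```

With this file in place at /<zonepath>/root/etc/sysidcfg, the first boot completes without interactive prompts.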

4. How Solaris Zones Help in Scaling Horizontally

NexSRS does not scale beyond a few CPUs because it is limited by the external web server it uses to receive HTTP requests. As load increases, the external web server reaches 100% usage of a single CPU, capping performance. Since the external web server is single-threaded, this is the maximum load that can be handled by NexSRS.
Solaris Zones technology allows a single instance of the Solaris OS to be virtualized into multiple application execution environments. NexSRS can be installed in these multiple virtualized environments, allowing multiple instances of the external web server to be run. Each instance of the external web server can now process the incoming requests and pass them on to NexSRS for further processing.

4.1 What We Did

We created a non-global zone and installed a copy of the external web server (Xitami) and NexSRS in the zone. An instance of Xitami and an instance of NexSRS were run in each of the global zone and the new zone. We used two load generators to load the two instances of NexSRS; the load generators saw the two zones as separate systems. Each instance of NexSRS had its own configuration and processed the HTTP requests from the Xitami instance running in its zone. Note: The main system used was the Sun Fire V480 server, as shown in Figure 4.
Zones allowed two instances of Xitami to run on the same system, allowing NexSRS to receive more HTTP requests and scale horizontally, overcoming the 100% CPU bottleneck.
Figure 4: Without Zones
4.1.1 Steps for Global Zones
We followed these steps:
  1. Install NexSRS by untarring the distribution tar file and then configure NexSRS.
  2. Start NexSRS by running start_osp_server.sh as before.
  3. Create a processor set with one CPU and bind Xitami to it.
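Step 3 above can be performed with the psrset command. The CPU ID, the resulting set ID, and the use of pgrep to find Xitami's process ID are illustrative assumptions:

```
psrset -c 1                  # create a processor set containing CPU 1;
                             # the command prints the new set's ID
psrset -b 1 `pgrep xitami`   # bind the Xitami process(es) to that set,
                             # assuming the new set's ID is 1
```

Binding Xitami to a dedicated 1-CPU set keeps other workloads from competing for the CPU the single-threaded server depends on.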
4.1.2 Steps for Non-Global Zones
We followed these steps:
  1. Create a non-global zone as described in Section 3.1.
  2. Install NexSRS by untarring the distribution tar file, and then configure NexSRS as in the global zone.
  3. Start NexSRS by running start_osp_server.sh as in Section 4.1.1.

4.2 How We Generated the Load

The load was generated from a Sun Fire V880 server with eight CPUs. We used the ApacheBench (ab) tool to load NexSRS. We started three instances of ApacheBench in three different shells, each sending a message to load NexSRS in the global zone -- see "Load gen (1)" in the following figure. Similarly, we used another three instances of ApacheBench in three different shells to load NexSRS in the non-global zone -- see "Load gen (2)" in Figure 5.
Figure 5: With Zones
4.2.1 Loading NexSRS in a Global Zone
ApacheBench was used to load NexSRS in the global zone; see "Load gen (1)" in Figure 5. Note: The host name is eagle:
./ab.sol8 -p auth.xml -n 1000000 -c 10 -k http://eagle:1080/osp &
./ab.sol8 -p src.xml  -n 1000000 -c 10 -k http://eagle:1080/osp &
./ab.sol8 -p dest.xml -n 3000000 -c 30 -k http://eagle:1080/osp &
4.2.2 Loading NexSRS in a Non-Global Zone
We used ApacheBench again to load NexSRS in the non-global zone -- see "Load gen (2)" in Figure 5. Note: The host name is moe. The load was executed in parallel with the above load:
./ab.sol8 -p auth.xml -n 1000000 -c 10 -k http://moe:1080/osp &
./ab.sol8 -p src.xml  -n 1000000 -c 10 -k http://moe:1080/osp &
./ab.sol8 -p dest.xml -n 3000000 -c 30 -k http://moe:1080/osp &

4.3 How We Measured the CPS

Calls per second (CPS) was measured by tailing the nexus.log file; the log reports calls per minute (CPM), which must be divided by 60 to obtain CPS. ApacheBench also reports CPS at the end of each test, and this figure was compared with the log to ensure that the tests ran successfully.
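The CPM-to-CPS conversion is a simple division by 60. The sketch below also parses a log line; the CPM=&lt;n&gt; field format is an assumption for illustration, not the actual nexus.log layout.

```python
import re

def cpm_to_cps(cpm: float) -> float:
    """Convert the logged calls-per-minute figure to calls per second."""
    return cpm / 60.0

def cpm_from_log(line: str) -> float:
    """Pull the CPM value out of a log line. The 'CPM=<n>' field format
    here is an assumption, not the real nexus.log layout."""
    m = re.search(r"CPM=(\d+(?:\.\d+)?)", line)
    if m is None:
        raise ValueError("no CPM field found in line")
    return float(m.group(1))
```

For example, a logged rate of 7735 CPM corresponds to roughly 129 calls per second.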
4.3.1 System Performance
CPU Usage on Solaris 9 OS
[Table: Calls per Minute vs. CPU Utilization (%)]
Xitami was bound to a 1-CPU processor set.
CPU Usage on Solaris 10 OS (No Zones)
[Table: Calls per Minute vs. CPU Utilization (%)]
Xitami was bound to a 1-CPU processor set, and xitami.environment was set to 0.
Global Zone CPU Usage on Solaris 10 OS (With Zones)
[Table: Calls per Minute vs. CPU Utilization (%)]
Xitami was bound to a 1-CPU processor set, and xitami.environment was set to 0.
Non-Global Zone CPU Usage on Solaris 10 OS (With Zones)
[Table: Calls per Minute vs. CPU Utilization (%)]
xitami.environment was set to 0. Xitami was not bound to a processor set, since there were not enough idle resources; with sufficient idle resources, Xitami could be bound to a 1-CPU processor set, further improving performance.

5. Improvement in Performance

Without zones we were getting 6227 CPM on the Solaris 9 OS and 8235 CPM on the Solaris 10 OS, and with zones we were able to reach 12955 CPM (7735 CPM in the global zone + 5220 CPM in the non-global zone). This was a 108% performance improvement over the Solaris 9 baseline, and roughly 57% over Solaris 10 without zones, on the same system.
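The improvement figure can be checked directly from the throughput numbers above:

```python
# Throughput figures reported in this section (calls per minute).
solaris9_cpm = 6227           # Solaris 9, no zones
solaris10_cpm = 8235          # Solaris 10, no zones
zoned_cpm = 7735 + 5220       # global + non-global zone instances

assert zoned_cpm == 12955

# Improvement relative to each baseline, in percent.
vs_s9 = (zoned_cpm - solaris9_cpm) / solaris9_cpm * 100
vs_s10 = (zoned_cpm - solaris10_cpm) / solaris10_cpm * 100
print(round(vs_s9), round(vs_s10))   # 108 57
```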

5.1 Reason for Improvement

Solaris Zones allowed multiple instances of a single-threaded web server to run on a single system. Without zones, only one instance of the web server could be run, which limited performance and scaling. As load increased, the single-threaded web server became a bottleneck at 100% usage of a CPU. With zones, multiple instances of the web server run in their own virtualized environments, allowing the system to scale beyond the performance of a single instance.

6. Conclusion

Solaris Zones technology offers horizontal scaling on a vertical system, allowing multiple virtualized environments to run in isolation. Multiple single systems can be consolidated into one system, saving operational and capital expenditure (opex/capex) while improving the performance of that system. In this tunathon, we used two instances of the external web server, Xitami, and two instances of NexSRS to improve performance; the two NexSRS instances made the system appear as two separate systems. The separate NexSRS instances could be replaced with a single NexSRS instance running in the global or a non-global zone, with the two Xitami instances communicating with it, and a load balancer could distribute the load across the two Xitami instances.
A global zone and a non-global zone were used for scalability here; two non-global zones could also be used to achieve the same result, and they might be easier to manage. We used a processor set to manage the CPU resources. These could also be managed with the Solaris resource manager, which uses resource pools to allow resource management at the zone level (see References section for more information).

7. Acknowledgments

We would like to thank all the people who helped us with this project. Thanks to Hashamkha Pathan, Prashant Srinivasan, and Jan Van Bruaene for taking the time to review the paper. Their suggestions have increased the readability of the document. We would also like to acknowledge the DSC staff. Finally, we would like to thank William Murray from TransNexus who spent a lot of time helping us get started on the project.

8. References

About the Authors

Ashutosh Kumar has been with Sun since December 2004. He has been working on performance tuning for ISV applications, especially relating to Java technology and JVM performance. He does a lot of volunteer work with destitute women and composes poetry in his free time.
Nagendra Nagarajayya has been working with Sun for the last 12 years. He is a Staff Engineer in Market Development Engineering (MDE), working with ISVs in the telecommunications (telco) industry on issues related to architecture, performance tuning, sizing and scaling, benchmarking, porting, and so on. He specializes in multithreaded issues, concurrency and parallelism, HA, distributed computing, networking, and performance tuning.
Dmitry Isakbayev has worked at TransNexus since 1997 and leads all software development. TransNexus has been an innovator of commercial and open source VoIP Operations and Billing Support Systems (OSS/BSS) since 1997. Key features of the TransNexus OSS/BSS solution include least cost routing, quality of service routing, secure inter-domain VoIP peering, traffic analysis and control, management reports, and more.