by Orgad Kimchi with contributions from Nicolas Droux
Published July 2014
Part 1 - Using Datalink Multipathing to Add High Availability to Your Network
Part 2 - Doing More with Datalink Multipathing
This article is Part 2 of a two-part series that describes how to use datalink multipathing (DLMP) aggregations.
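In this article, we will explore how to perform network management operations in an environment that uses DLMP aggregations: securing the network with link protection, managing network resources with bandwidth limits and flows, providing shared storage through an NFS server in a zone, and configuring probe-based failure detection.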
DLMP aggregation gives a network high availability (HA). To learn how to create a DLMP aggregation and to see the system architecture used in this series, see Part 1.
In the next use case, we are going to combine DLMP aggregation technology with another network technology: link protection.
With the increasing adoption of virtualization in system configurations, Oracle Solaris Zones can be given exclusive access to a physical or virtual link by the host administrator. This configuration improves network performance by allowing the virtual environment's network traffic to be isolated from the other traffic that is received or sent by the host system. At the same time, this configuration can expose the system and the entire network to the risk of harmful packets that a guest environment might generate.
Link protection aims to prevent the damage that can be caused to the network by potentially malicious traffic. The feature offers protection from basic threats such as IP, DHCP, and MAC spoofing, as well as potentially harmful L2 control frames.
Note: Link protection does not replace the need to deploy a firewall, particularly for configurations with complex filtering requirements.
Link protection provides the flexibility that is needed in a cloud environment, for example, enabling root account access from inside an Oracle Solaris Zone. Although root account access provides flexibility, it also exposes security risks. For example, the root user can create a spoofing attack by sending outgoing packets with a different source IP or MAC address, or packets that are not of type IPv4, IPv6, or ARP. We can use link protection to prevent such attacks.
Link protection is enabled through four different properties: ip-nospoof, mac-nospoof, dhcp-nospoof, and restricted. The following table describes these properties.
|Property Name|Property Description|
|ip-nospoof|This property requires that any outgoing IP, ARP, or NDP packet must have an address field that matches either a DHCP-configured IP address or one of the addresses listed in the allowed-ips link property.|
|mac-nospoof|This property prevents the root user from spoofing the link's MAC address: an outbound packet's source MAC address must match the link's configured MAC address.|
|dhcp-nospoof|This property prevents client ID/DUID spoofing for DHCP.|
|restricted|This property only allows IPv4, IPv6, and ARP protocols. Using this property prevents a link from generating potentially harmful L2 control frames.|
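To make the checks in the table more concrete, here is a minimal Python sketch that models how three of the protection properties could filter an outbound packet. It only illustrates the rules described above and is not the Oracle Solaris implementation: the Packet and LinkProtection classes, their field names, and the check_outbound function are hypothetical, DHCP-configured addresses are ignored, and dhcp-nospoof is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical model of an outbound packet; field names are illustrative only.
@dataclass
class Packet:
    src_mac: str
    ethertype: str                # "ipv4", "ipv6", "arp", or anything else
    src_ip: Optional[str] = None  # None for frames that carry no IP address


# Hypothetical model of a link's protection settings.
@dataclass
class LinkProtection:
    link_mac: str
    protection: set = field(default_factory=set)
    allowed_ips: set = field(default_factory=set)


def check_outbound(pkt: Packet, link: LinkProtection) -> bool:
    """Return True if the packet passes the modeled protection checks."""
    # mac-nospoof: the source MAC must match the link's configured MAC.
    if "mac-nospoof" in link.protection and pkt.src_mac != link.link_mac:
        return False
    # restricted: only IPv4, IPv6, and ARP are allowed out.
    if "restricted" in link.protection and pkt.ethertype not in {"ipv4", "ipv6", "arp"}:
        return False
    # ip-nospoof: the source address must be one of the allowed addresses.
    if "ip-nospoof" in link.protection and pkt.src_ip is not None:
        if pkt.src_ip not in link.allowed_ips:
            return False
    return True


# Example settings mirroring the values used in this article (MAC is made up).
vnic1 = LinkProtection(
    link_mac="2:8:20:aa:bb:cc",
    protection={"mac-nospoof", "restricted", "ip-nospoof"},
    allowed_ips={"10.0.0.1"},
)
```

With these settings, a packet sourced from 10.0.0.1 with the link's own MAC address passes the model, while a packet sourced from 10.0.0.10 is rejected by the ip-nospoof check.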
The following example demonstrates how to enable link protection.
From the global zone, enable link protection on vnic1 by running the following command:
root@global_zone:~# dladm set-linkprop -p protection=mac-nospoof,restricted,ip-nospoof vnic1
Next, we will set up a restriction on the IP address of the zone; thus, even the zone's root user can't change the IP address.
Specify IP address 10.0.0.1 as the value for the allowed-ips property for the vnic1 link:
root@global_zone:~# dladm set-linkprop -p allowed-ips=10.0.0.1 vnic1
Verify the link protection property values, as shown in Listing 1:
root@global_zone:~# dladm show-linkprop -p protection,allowed-ips vnic1
LINK     PROPERTY      PERM  VALUE         DEFAULT  POSSIBLE
vnic1    protection    rw    mac-nospoof,  --       mac-nospoof,
                             restricted,            restricted,
                             ip-nospoof             ip-nospoof,
                                                    dhcp-nospoof
vnic1    allowed-ips   rw    10.0.0.1      --       --
In Listing 1, we can see that 10.0.0.1 is set as the allowed IP address.
(Optional) When a zone's VNIC is configured using anet, as was demonstrated in the "Create the First Zone (zone1)" section of Part 1, its link protection is configured with the link-protection anet property, which is set to mac-nospoof by default. Using anet also allows you to specify the IP address in the zone's configuration (through the allowed-address property) and then specify that the address be configured automatically at zone boot time (by setting the configure-allowed-address property). This is another advantage of using zonecfg. Listing 2 shows an example of setting up a zone using anet:
root@global_zone:~# zonecfg -z zone1
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create
create: Using system default template 'SYSdefault'
zonecfg:zone1> set zonepath=/zones/zone1
zonecfg:zone1> set autoboot=true
zonecfg:zone1> select anet linkname=net0
zonecfg:zone1:anet> set lower-link=aggr0
zonecfg:zone1:anet> set allowed-address=10.0.0.1/24
zonecfg:zone1:anet> set configure-allowed-address=true
zonecfg:zone1:anet> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> exit
As we see in Listing 2, we can set protection property values on the fly without the need to reboot the zone; this is useful in production environments, which usually have stringent uptime requirements. In addition, this functionality provides the flexibility that is needed in order to change the security level of the environment. Another benefit is that it encapsulates the IP networking configuration in the zone configuration, which makes migrating a zone between physical machines much easier.
Let's test the link protection by trying to change the zone's IP address.
First, log in to the zone:
root@global_zone:~# zlogin zone1
Now, let's remove the IP interface:
root@zone1:~# ipadm delete-ip vnic1
Next, create a new IP interface:
root@zone1:~# ipadm create-ip vnic1
Now try to assign a new IP address (10.0.0.10), as shown in Listing 3:
root@zone1:~# ipadm create-addr -a local=10.0.0.10/24 vnic1/v4
ipadm: cannot create address: Permission denied
As we can see in Listing 3, the root user can't change the zone's IP address!
Let's try to change the IP address to the allowed IP address (10.0.0.1):
root@zone1:~# ipadm create-addr -a local=10.0.0.1/24 vnic1/v4
Verify the IP address change, as shown in Listing 4:
root@zone1:~# ipadm show-addr vnic1
ADDROBJ    TYPE       STATE   ADDR
vnic1/v4   inherited  ok      10.0.0.1/24
We can see in Listing 4 that we were able to change the IP address.
We can disable the link protection from the global zone:
root@global_zone:~# dladm reset-linkprop -p protection,allowed-ips vnic1
Note: We don't need to reboot the zone in order to disable this property.
Verify the removal of link protection, as shown in Listing 5:
root@global_zone:~# dladm show-linkprop -p protection,allowed-ips vnic1
LINK     PROPERTY      PERM  VALUE  DEFAULT  POSSIBLE
vnic1    protection    rw    --     --       mac-nospoof,
                                             restricted,
                                             ip-nospoof,
                                             dhcp-nospoof
vnic1    allowed-ips   rw    --     --       --
As we can see in Listing 5, we no longer have a restriction on the allowed-ips property.
In every cloud infrastructure, resource management is a prerequisite for managing the environment's performance and meeting guaranteed network service-level agreements (SLAs).
The following examples will show how we can enable network resource management.
Oracle Solaris 11 comes with built-in SLA features allowing VNICs and traffic flows to be configured with a bandwidth limit and/or priority.
When we combine DLMP aggregation and bandwidth limits and flows, we can benefit from network high availability (HA) in addition to advanced network resource management.
Bandwidth controls can be configured on a VNIC from the global zone to limit the network traffic sent from and received by a zone through its VNIC.
We can limit the throughput of a VNIC using the maxbw property of the VNIC, as shown in Figure 1. Our interface for doing this is the already-familiar dladm command.
Figure 1. Setting bandwidth limits
For example, the following command sets the maximum throughput of vnic1 to 500 Mb/sec, effectively giving it a portion of the physical data link's bandwidth:
root@global_zone:~# dladm set-linkprop -p maxbw=500M vnic1
Verify the change, as shown in Listing 6:
root@global_zone:~# dladm show-linkprop -p maxbw vnic1
LINK   PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
vnic1  maxbw     rw    500    --       --
In Listing 6, we can see that the VNIC maximum throughput is set to 500 Mb/sec.
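As a quick sanity check on what a maxbw value means in practice, the following Python sketch converts a bandwidth value written as in this article (for example, 500M) into bits per second and megabytes per second. The helper names are hypothetical. The sketch assumes decimal units (1 Mb/sec = 1,000,000 bits/sec) and assumes that a value with no suffix is in Mb/sec, as the Listing 6 output suggests.

```python
# Decimal multipliers for the K, M, and G suffixes (an assumption of this sketch).
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def maxbw_to_bits_per_sec(value: str) -> int:
    """Convert a maxbw-style string such as '500M' or '500' to bits per second."""
    value = value.strip().upper()
    if value[-1] in _SUFFIXES:
        return int(float(value[:-1]) * _SUFFIXES[value[-1]])
    # No suffix: assume the value is expressed in Mb/sec.
    return int(float(value) * _SUFFIXES["M"])


def bits_to_megabytes_per_sec(bits_per_sec: int) -> float:
    """Convert bits per second to megabytes per second (8 bits per byte)."""
    return bits_per_sec / 8 / 1_000_000


limit = maxbw_to_bits_per_sec("500M")
print(limit, "bits/sec =", bits_to_megabytes_per_sec(limit), "MB/sec")
```

Under these assumptions, a 500 Mb/sec limit corresponds to 62.5 MB/sec of payload-plus-header traffic, which is a useful figure to keep in mind when reading byte counters.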
(Optional) A similar maxbw property can be set on anet using the zonecfg command:
root@global_zone:~# zonecfg -z zone1
zonecfg:zone1> select anet linkname=net0
zonecfg:zone1:anet> set maxbw=500M
zonecfg:zone1:anet> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> exit
With maxbw configured on the VNIC, edge virtual bridging (EVB) can be used to push the enforcement of that bandwidth into the adjacent physical network switch. For more information about EVB, see "Edge Virtual Bridging in Oracle Solaris."
If you need fine-grained network bandwidth control, Oracle Solaris 11 provides the capability to set up a flow, which is a sophisticated quality of service (QoS) mechanism. Flows allow us to limit the network bandwidth for a specific network port on a specific network interface.
Note: You can combine bandwidth control on a data link and its flows simultaneously.
We can implement flows on top of the DLMP aggregation that we already created (aggr0). This is useful for mitigating denial-of-service (DoS) attacks against a web server. We can minimize the attack's impact on the rest of the infrastructure by limiting a zone's maximum HTTPS network bandwidth to 100 Mb/sec.
In the following example, we limit the Secure Sockets Layer (SSL) traffic to 100 Mb/sec on the vnic1 network interface, as shown in Figure 2.
Figure 2. Setting up a flow for bandwidth control
First, let's create the flow from zone1 using the flowadm command:
root@zone1:~# flowadm add-flow -l vnic1 -a transport=TCP,local_port=443 https-flow
Set the flow's bandwidth limitation to 100 Mb/sec.
root@zone1:~# flowadm set-flowprop -p maxbw=100M https-flow
Verify the flow creation:
root@zone1:~# flowadm show-flow
FLOW        LINK   PROTO  LADDR  LPORT  RADDR  RPORT  DSFLD
https-flow  vnic1  tcp    --     443    --     --     --
Then validate the bandwidth limit, as shown in Listing 7:
root@zone1:~# flowadm show-flowprop https-flow
FLOW        PROPERTY  PERM  VALUE   DEFAULT  POSSIBLE
https-flow  maxbw     rw    100     --       --
https-flow  priority  rw    medium  medium   low,medium,high
https-flow  hwflow    r-    off     --       on,off
In Listing 7, we can see that the flow's bandwidth limit is 100 Mb/sec.
Note that in the example above, the flow was created on the VNIC from the non-global zone, so the flow is part of the zone configuration and will be migrated along with the zone. Such flows are reinstantiated automatically when a zone boots.
Another useful feature of the flow technology is the ability to monitor our flow's network-performance statistics using the flowstat command. This feature is useful for troubleshooting network performance issues.
Let's rerun the iperf network-performance tool we used in the "Testing the HA Capability" section of Part 1, but this time the server will listen on port 443 (the same port on which the flow has been set up).
First, start the iperf server on zone1:
Note: You might need to stop the previous iperf server; you can stop it by pressing Ctrl-C.
root@zone1:~# iperf -s -l 128k -p 443
------------------------------------------------------------
Server listening on TCP port 443
TCP window size: 125 KByte (default)
------------------------------------------------------------
From zone3, start the iperf client (the loader):
root@zone3:~# iperf -c zone1 -l 128k -P 4 -i 1 -p 443 -t 360
------------------------------------------------------------
Client connecting to zone1, TCP port 443
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[  7] local 10.0.0.3 port 53777 connected with 10.0.0.1 port 443
[  4] local 10.0.0.3 port 58403 connected with 10.0.0.1 port 443
[  6] local 10.0.0.3 port 59851 connected with 10.0.0.1 port 443
[  5] local 10.0.0.3 port 42292 connected with 10.0.0.1 port 443
[ ID] Interval       Transfer     Bandwidth
[  7]  0.0- 1.0 sec  5.88 MBytes  49.3 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  6]  0.0- 1.0 sec  7.12 MBytes  59.8 Mbits/sec
...
Open another terminal and then, from the global zone, start the network flow observation, as shown in Listing 8:
root@global_zone:~# flowstat -i 1
FLOW                IPKTS   RBYTES   IDROPS    OPKTS   OBYTES    ODROPS
zone1/https-flow    5.18M   7.51G    240.06K   2.63M   175.30M   0
zone1/https-flow    9.43K   13.62M   416       1.30K   87.00K    0
zone1/https-flow    9.46K   13.35M   300       2.93K   195.31K   0
zone1/https-flow    10.01K  13.65M   460       4.89K   325.77K   0
zone1/https-flow    8.75K   12.00M   425       4.50K   300.27K   0
zone1/https-flow    9.70K   13.50M   459       4.99K   332.64K   0
zone1/https-flow    8.03K   11.13M   371       4.17K   278.80K   0
zone1/https-flow    10.17K  14.12M   485       5.21K   347.46K   0
zone1/https-flow    9.13K   12.50M   416       4.66K   310.23K   0
^C
Note: To stop the flowstat command, press Ctrl-C.
In Listing 8, you can see the network statistics for the flow that we created earlier (zone1/https-flow). In addition, you can see that the bandwidth is limited to approximately 13 MB/sec (about 100 Mb/sec).
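To check the relationship between the RBYTES column and the configured limit, the following Python sketch averages the per-interval samples from Listing 8 and converts the result to Mb/sec. The function names are hypothetical. The sketch assumes that the K, M, and G suffixes in the flowstat output are decimal multipliers, that the interval is one second (as requested with -i 1), and that the first row is a cumulative total rather than an interval sample, which is what its much larger values suggest.

```python
# Decimal multipliers for the counter suffixes (an assumption of this sketch).
_UNITS = {"K": 1e3, "M": 1e6, "G": 1e9}

# Rows copied from Listing 8.
LISTING_8 = """\
FLOW              IPKTS   RBYTES  IDROPS   OPKTS  OBYTES   ODROPS
zone1/https-flow  5.18M   7.51G   240.06K  2.63M  175.30M  0
zone1/https-flow  9.43K   13.62M  416      1.30K  87.00K   0
zone1/https-flow  9.46K   13.35M  300      2.93K  195.31K  0
zone1/https-flow  10.01K  13.65M  460      4.89K  325.77K  0
zone1/https-flow  8.75K   12.00M  425      4.50K  300.27K  0
zone1/https-flow  9.70K   13.50M  459      4.99K  332.64K  0
zone1/https-flow  8.03K   11.13M  371      4.17K  278.80K  0
zone1/https-flow  10.17K  14.12M  485      5.21K  347.46K  0
zone1/https-flow  9.13K   12.50M  416      4.66K  310.23K  0
"""


def parse_count(text: str) -> float:
    """Turn a counter such as '13.62M' into a plain number."""
    if text[-1] in _UNITS:
        return float(text[:-1]) * _UNITS[text[-1]]
    return float(text)


def average_rx_mbits(flowstat_output: str, flow: str) -> float:
    """Average received rate in Mb/sec over the one-second interval samples."""
    samples = []
    for line in flowstat_output.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0] == flow:
            samples.append(parse_count(fields[2]))  # RBYTES column
    interval_samples = samples[1:]  # skip the first (cumulative) row
    bytes_per_sec = sum(interval_samples) / len(interval_samples)
    return bytes_per_sec * 8 / 1e6


print(round(average_rx_mbits(LISTING_8, "zone1/https-flow"), 2))
```

Under these assumptions, the eight interval samples average out to roughly 104 Mb/sec, which is consistent with the 100 Mb/sec limit once measurement granularity is taken into account.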
The flow capability allows us to change the bandwidth limit dynamically, for example, if we want to increase the bandwidth in order to fulfill the demand for this web server. This capability is very important in a cloud environment in order to provide an elastic network infrastructure.
Let's change the bandwidth limit to 200 Mb/sec:
root@zone1:~# flowadm set-flowprop -p maxbw=200M https-flow
Verify the bandwidth change, as shown in Listing 9:
root@zone1:~# flowadm show-flowprop https-flow
FLOW        PROPERTY  PERM  VALUE   DEFAULT  POSSIBLE
https-flow  maxbw     rw    200     --       --
https-flow  priority  rw    medium  medium   low,medium,high
https-flow  hwflow    r-    off     --       on,off
In Listing 9, we can see that it's 200 Mb/sec now.
Let's return to the terminal where the flowstat command is running, as shown in Listing 10.
root@global_zone:~# flowstat -i 1
FLOW                IPKTS   RBYTES   IDROPS   OPKTS   OBYTES    ODROPS
zone1/https-flow    25.14K  33.00M   550      12.67K  839.20K   0
zone1/https-flow    18.29K  24.88M   580      9.23K   611.16K   0
zone1/https-flow    21.39K  28.62M   647      10.86K  721.25K   0
zone1/https-flow    18.22K  24.75M   531      9.24K   613.42K   0
zone1/https-flow    19.14K  25.75M   654      9.72K   646.39K   0
zone1/https-flow    17.55K  23.42M   504      8.88K   588.76K   0
zone1/https-flow    20.44K  27.76M   656      10.33K  684.23K   0
zone1/https-flow    18.64K  25.07M   567      9.51K   632.57K   0
^C
Note: To stop the flowstat command, press Ctrl-C.
In Listing 10, we can see that the bandwidth has changed to approximately 25 MB/sec, which corresponds to 200 Mb/sec.
We can monitor from the global zone both the zones and their associated flows!
Once you finish your network measurements, you can remove the flow:
Note: You don't need to reboot the zone in order to remove the flow; this is very useful in production environments when you need to be able to troubleshoot network performance issues without severely impacting the environment.
From zone1, list the current flows:
root@zone1:~# flowadm show-flow
FLOW        LINK   PROTO  LADDR  LPORT  RADDR  RPORT  DSFLD
https-flow  vnic1  tcp    --     443    --     --     --
Remove the https-flow flow:
root@zone1:~# flowadm remove-flow https-flow
Verify the flow removal, as shown in Listing 11:
root@zone1:~# flowadm show-flow
If the flow has been removed, the command will return to the prompt without any output. In Listing 11, we can see that we no longer have the https-flow flow.
You can create flows using the flowadm add-flow command and remove them using the flowadm remove-flow command. You can monitor flows using flowstat; in addition, you can monitor all the flows from the global zone.
A new feature introduced in Oracle Solaris 11 supports setting up an NFS server inside a non-global zone. Using this capability, you can build an NFS server for each tenant in a cloud environment in order to provide secure, shared storage between the zones, as shown in Figure 3.
Figure 3. Providing secure, shared storage between zones
Figure 4 shows how the DLMP aggregation provides redundancy to other network services, such as an NFS server.
Figure 4. Improving the availability of an NFS server with DLMP
Using the command shown in Listing 12, we will create a ZFS file system and share it using NFS:
root@zone1# zfs create -o encryption=on -o dedup=on -o compression=on -o mountpoint=/data -o sharenfs=on rpool/data
Enter passphrase for 'rpool/data':
Enter again:
Note: You need to provide the passphrase; it must be at least eight characters.
The command in Listing 12 used the following options:
-o encryption=on enables encryption.
-o dedup=on enables deduplication.
-o compression=on enables compression.
-o mountpoint=/data specifies the location of the mount point.
-o sharenfs=on specifies that the ZFS file system should be shared via NFS.
In Listing 12, we can see that by using a single command, we can create a ZFS file system with encryption, compression, and deduplication and share it using NFS.
Let's verify that the NFS share was created, as shown in Listing 13:
root@zone1# share
rpool_data      /data   nfs     sec=sys,rw
In Listing 13, we can see that the NFS share has been created.
ZFS can share file systems using protocols such as NFS and SMB (Server Message Block). For more examples of using ZFS to share file systems, see
Now, let's mount this file system from the client (zone3).
First, verify that we can access the NFS share by listing the NFS shares on zone1:
root@zone3# showmount -e zone1
export list for zone1:
/data (everyone)
Now, mount the NFS file system, as shown in Listing 14:
root@zone3# mount zone1:/data /mnt
After running the command shown in Listing 14, the NFS server will benefit from the network HA that DLMP provides without the need to set up anything on the NFS server.
As we saw in this use case, the Oracle Solaris 11 operating system provides virtualization technologies for every layer, such as storage virtualization using ZFS, and network virtualization and operating system virtualization using Oracle Solaris Zones.
You can use the showmount -e command to list the shared file systems on a remote NFS server.
In DLMP aggregation, failure detection is a method for detecting the failure of the aggregated ports. A port is considered to have failed when it cannot send or receive traffic. A port might fail for reasons such as a damaged or disconnected cable, a switch port that goes down, or a failure in the upstream network path.
DLMP aggregation performs failure detection on the aggregated ports to ensure the continuous ability of the network to send and receive traffic. When a port fails, the clients associated with that port—for example, a configured IP stack or VNICs—are failed over to one of the remaining active ports. Failed aggregated ports remain unusable until they are repaired. The remaining active ports continue to function while any existing ports are deployed as needed. After a failed port recovers from a failure, clients from the other active ports can be associated with it.
DLMP aggregation supports both link-based and probe-based failure detection.
Probe-based failure detection is performed by using the combination of two types of probes—Internet Control Message Protocol (ICMP) probes (which are Layer 3 probes) and transitive (Layer 2) probes—which work together to determine the health of the aggregated physical data links.
ICMP probes are used to check the health of a port by sending probe packets to probe targets. The IP address of the probe target and local IP address to be used for sending these packets are automatically configured, when possible. One or more probe targets and source addresses can also be configured explicitly, if needed. Note that the target IP address must be on the same subnet as the specified source IP address. Note also that local IP addresses configured for DLMP probing can continue to carry regular traffic. Therefore, you do not need to reserve IP addresses for exclusive use by DLMP.
Transitive probing is performed between the aggregated ports using the exchange of layer-2 transitive probes. These probes allow the aggregation to determine the health of all aggregated ports while avoiding the configuration of a separate local IP probe address for each one of them.
Note: Probe-based failure detection is performed in the global zone when VNICs over an aggregation are created in the global zone and are assigned to non-global zones. The probing is, therefore, centralized and does not need to be configured in the zones themselves—unlike with IPMP, which needs to be configured in each zone.
In the following example, we will set up ICMP probing for aggregation aggr0.
First, set the probe targets for aggr0:
root@global_zone:~# dladm set-linkprop -p probe-ip=+ aggr0
Note: Since the source IP address is not specified, any of the IP addresses configured on the aggregation aggr0 and its VNICs will be used as the source IP addresses of ICMP probes. Note that the IP addresses of VNICs that are configured in a non-global zone will not be used for probing by the global zone.
Set the failure detection time to 5 seconds:
root@global_zone:~# dladm set-linkprop -p probe-fdt=5 aggr0
Note: The default failure detection time is 10 seconds.
Display the properties that we have set:
root@global_zone:~# dladm show-linkprop -p probe-ip,probe-fdt aggr0
LINK   PROPERTY   PERM  VALUE  EFFECTIVE  DEFAULT  POSSIBLE
aggr0  probe-ip   rw    +      +          --       --
aggr0  probe-fdt  rw    5      5          10       1-600
The example in Listing 15 displays statistics for the probes for our DLMP aggregation. Using the -P option, which selects the probe type, you can provide a comma-separated list of arguments (t for transitive probes, i for ICMP probes, and all for both ICMP and transitive probes).
root@global_zone:~# dlstat show-aggr -n -P t,i aggr0
TIME   AGGR   PORT  LOCAL  TARGET  PROBE  NETRTT   RTT
0.06s  aggr0  net3  net3   net2    t528   --       --
0.06s  aggr0  net3  net3   net2    t528   13.59ms  14.55ms
0.17s  aggr0  net1  net1   net0    t528   --       --
0.17s  aggr0  net1  net1   net0    t528   8.99ms   9.78ms
0.22s  aggr0  net2  net2   net1    t528   --       --
0.22s  aggr0  net2  net2   net1    t528   8.44ms   9.36ms
0.27s  aggr0  net2  net2   net0    t528   --       --
0.27s  aggr0  net2  net2   net0    t528   8.31ms   9.14ms
...
The following items are shown in the output of Listing 15:
TIME: Time at which the probe was sent, in seconds. This time is relative to the time when you issue the dlstat command. If the probe is sent before you issue the dlstat command, the time is negative.
AGGR: The name of the aggregation for which the probe was sent.
PORT: The name of the port for which the probe was sent.
LOCAL: For ICMP probes, the source IP address of the probes. For transitive probes, the port name from which the transitive probe originated.
TARGET: For ICMP probes, the destination IP address of the probes. For transitive probes, the name of the port that the probe targets.
PROBE: Identifier number representing the probe. The prefix t is for transitive probes and the prefix i is for ICMP probes.
NETRTT: Network round-trip time for the probe. This value is the time period between sending the probe and receiving the acknowledgment from the DLMP aggregation.
RTT: Total round-trip time for the probe. This value is the time period between sending the probe and completing the process of acknowledgment by the DLMP aggregation.
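When many probes are shown, it can help to summarize the RTT column per port. The following Python sketch does that for the rows shown in Listing 15. The function name is hypothetical, and the sketch assumes the eight-column layout of Listing 15, with RTT values reported in milliseconds and -- marking rows for which no round-trip time is reported.

```python
from collections import defaultdict

# Rows copied from Listing 15.
LISTING_15 = """\
TIME   AGGR   PORT  LOCAL  TARGET  PROBE  NETRTT   RTT
0.06s  aggr0  net3  net3   net2    t528   --       --
0.06s  aggr0  net3  net3   net2    t528   13.59ms  14.55ms
0.17s  aggr0  net1  net1   net0    t528   --       --
0.17s  aggr0  net1  net1   net0    t528   8.99ms   9.78ms
0.22s  aggr0  net2  net2   net1    t528   --       --
0.22s  aggr0  net2  net2   net1    t528   8.44ms   9.36ms
0.27s  aggr0  net2  net2   net0    t528   --       --
0.27s  aggr0  net2  net2   net0    t528   8.31ms   9.14ms
"""


def mean_rtt_ms_by_port(dlstat_output: str) -> dict:
    """Return the mean total RTT in milliseconds for each port."""
    rtts = defaultdict(list)
    for line in dlstat_output.splitlines():
        fields = line.split()
        # Skip the header and rows without an RTT value ("--").
        if len(fields) != 8 or not fields[7].endswith("ms"):
            continue
        port, rtt = fields[2], fields[7]
        rtts[port].append(float(rtt[:-2]))  # strip the "ms" suffix
    return {port: sum(values) / len(values) for port, values in rtts.items()}


for port, rtt in sorted(mean_rtt_ms_by_port(LISTING_15).items()):
    print(f"{port}: {rtt:.2f} ms")
```

A port whose mean RTT drifts well above that of its peers may be worth investigating before it is declared failed.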
For more examples of setting up probe-based failure detection, see "Configuring High Availability by Using Link Aggregations."
In this article, we explored how we can leverage built-in Oracle Solaris 11 technologies to implement and monitor network high availability for a multitenant cloud infrastructure. We have seen examples of how we can combine datalink multipathing with other networking technologies, such as link protection, to deliver increased benefits. Specifically, we coupled the use of network security with QoS achieved by setting limits on bandwidth consumption via bandwidth control, and we used traffic flows for network resource management.
Orgad Kimchi is a principal software engineer on the ISV Engineering team at Oracle (formerly Sun Microsystems). For 6 years he has specialized in virtualization of big data and cloud computing technologies.
Nicolas Droux is the chief architect for Solaris Kernel Networking at Oracle. His specialties have developed from his more than 20 years' experience working on operating systems kernel, networking, virtualization, security, I/O, performance, HPC, and cloud architectures.
Revision 1.0, 07/09/2014