Developing Real-Time Software with Java SE APIs: Part 2

by Kelvin Nilsen

Engineering and analysis techniques that can guide the development mission-critical real-time systems.

Published August 2014

This article is Part 2 of a two-part series describing the use of Java SE to implement real-time systems. In Part 1 of this series, we examined the implementation of a traction monitoring system in Java SE and demonstrated that it did, indeed, satisfy the required real-time constraints. However, we did not discuss the engineering and analysis techniques that guided the development of the presented solution. In this article, we'll look a bit deeper at the theoretical foundation upon which the working code rests.

Note: The full source code for the sample application described in this article is available here.

Analysis of Real-Time Task Scheduling

Traditionally, real-time software systems comprise multiple tasks, each having an expected maximum execution frequency, a maximum execution time, and a specified deadline by which the task must complete its execution each time it is triggered. Considerable academic research has developed a variety of real-time scheduling mechanisms with accompanying mathematical analyses that prove various characteristics of the scheduled workload. The more-sophisticated scheduling techniques are able to guarantee that higher percentages of the real-time workload will satisfy real-time constraints, and they might offer control over which task constraints are honored and which ones are relaxed when the system is experiencing transient work overload.

It is beyond the scope of this article to provide a full survey of available real-time scheduling techniques. However, the analysis of our sample application is based on the popular rate-monotonic analysis (RMA) technique [1].

The general idea of RMA is that for a real-time workload consisting of m periodic tasks for which each task is required to complete its execution before its next period begins, all deadlines will be met as long as task priorities are assigned in the order of task deadlines, so that the tasks with the shortest periods have the highest priorities. For m = 10, the total system utilization must be less than the result shown in Figure 1.

realtime-pt2-f1

Figure 1. Calculation of total system utilization

The overall system utilization is calculated by summing the results of dividing each real-time task's worst-case execution time by its minimum trigger period.

Consider, for example, the real-time workload described in Part 1 of this series. This workload consists of 10 tasks. The execution times for these tasks were measured on a particular platform, and they are shown in Table 1.

Table 1. Real-time periodic workload
Java SE Class Instantiations Period (ms) Execution Time (μs) Utilization
com.atego.srt.Dispatcher 1 4 38.4 0.960%
simulation.ScriptA
implements main(String[] args)
1 100 135 0.136%
com.atego.srt.PeriodicDelay 1 100 4.16 0.00416%
traction.AcceleratorMonitor 1 32 9.31 0.0291%
traction.BrakeMonitor 1 16 9.07 0.0567%
traction.SteeringMonitor 1 24 9.83 0.0410%
traction.WheelSensor[0,1,2,3] 4 8 4.62, 4.42, 6.53, 5.69 0.0577%, 0.0553%, 0.0817%, 0.0711%

Even though the work of each WheelSensor thread is conceptually the same, there are clear differences in their execution times. These differences result presumably from caching effects (in each period, the threads that are scheduled later are more likely to find their instruction memory already present within the instruction cache) and from slight differences in interaction with the I/O simulator (the first sensor object to consult the I/O simulator for each instant in time pays the cost of advancing the simulated data to the current time).

It is difficult to prove the reasons for the different performance without more detailed scrutiny of each thread's behavior, scrutiny that requires techniques beyond the capabilities of the approaches encouraged in this article for soft real-time programming. The sum of the values in the Utilization column is only 1.49 percent. Thus, it is well within the capacity of this computer to ensure compliance with all of the application's real-time constraints.

There are several attractive characteristics of RMA:

  • It is fairly simple to understand and to apply because it requires minimal support from the underlying operating system.
  • It enables easy and reliable mixing of real-time software and non-real-time software. As long as all the non-real-time software is assigned priorities lower than the priorities used in the analysis of the real-time task behavior, the non-real-time software will run only during time spans when the real-time workload is idle.
  • It is stable in the presence of transient work overloads. If the real-time system integrator underestimated worst-case task execution times or overestimated the delays between consecutive triggerings of real-time tasks, the system automatically sheds load in a predictable way. The lower-priority threads, which are the ones that have the longer deadlines, will begin missing their deadlines before the higher priority threads (with the tighter deadlines) miss theirs.

But RMA is not ideal in all situations. One weakness of this approach is that its response to transient workloads always favors the highest-priority (shortest-deadline) tasks. For many real-time workloads, these short-deadline tasks are not the most important tasks to be performed when the system is in overload. Alternative real-time scheduling approaches offer more intelligent behavior during overload.

Another disadvantage of RMA is that it does not make 100 percent of the CPU available to the real-time workload. The maximum real-time workload utilization is bounded, for instance, in our example, to less than 71.77 percent.

The most significant shortcoming of RMA is that it does not directly support modular composition of independently developed real-time software components. This is because each real-time component is required to know the timing constraints on all other real-time components in the system in order to map its deadlines to the limited number of global system priorities. Nevertheless, rate-monotonic analysis represents a good introduction to the topic of real-time scheduling, and there is plenty of literature describing alternative real-time scheduling techniques for those who need additional flexibility.

Measuring Task Execution Times

In a fully rigorous "hard real-time system," it would be important to know the worst-case execution time of each task. However, it is neither practical to measure worst-case execution times nor to enforce hard real-time scheduling constraints in a Java SE environment.

For our purposes, we want an average or expected execution time for each task. The application code that accompanies this series includes support for gathering execution-time measurements. Invoke the main program with the command-line option -measure-cpu-time to cause the program to measure and report the CPU time consumed by each of the real-time tasks.

In contrast with more-traditional real-time programming platforms, one of the appeals of Java is that it is a more portable language. This portability is of particular value to organizations that are required to maintain and support a large evolving real-time software system across multiple (or evolving) hardware platforms. To generalize the portability benefits of Java SE to the domain of real-time Java software components, it is important to include as part of the real-time software component's encapsulation an ability to understand and manage its CPU time and memory resource consumption. The source code for our sample traction monitoring application includes an example of how this is accomplished.

As you review the accompanying source code, you will discover that execution-time measurements are performed under a special configuration of the real-time task dispatcher, which is identified as the time-warp configuration. This configuration runs the periodic tasks back to back, releasing each new task to run as soon as all tasks whose deadlines precede the new task's release time have completed execution, without waiting for the scheduled release time to arrive.

The primary motivation for measuring execution times in this time-warp time configuration is that Java SE running on typical mainstream operating systems measures task CPU times using a statistical sampling technique. An underlying periodic timer interrupts the running thread and remembers how often each running thread is interrupted. The threads that are interrupted more frequently are assumed to be running more of the time. This statistical sampling technique assumes that there is no correlation between the periodic timer interrupt and the likelihood that one task or another will be running. However, that assumption is not valid if tasks are running under a time-driven schedule, which is typically aligned with the same periodic timer that samples execution times. Thus, during execution time measurements, we disable the delays that typically occur between consecutive firings of the periodic task executions.

The Important Role of Task Priorities

The mathematical theory from which the upper bound on CPU utilization is derived is built upon a careful analysis of a real-time workload's critical scheduling instant, which is characterized as a moment in time when all tasks become eligible for execution at the exact same time.

Consider, for example, a real-time workload consisting of task A, with a 10 ms period and a 5 ms execution time (50 percent CPU utilization) and task B with a 1 ms period and a 200 μs execution time (20 percent utilization). The utilization bound for two tasks is 82.84 percent, so this workload is fully schedulable. Figure 2 illustrates the behavior of a critical scheduling instant for this workload.

realtime-pt2-f2

Figure 2. Rate-monotonic scheduling of two real-time tasks

Since task B has a shorter deadline than task A, it runs at a higher priority. During the 6.4 ms required to complete task A's execution, task B runs to completion seven times. Note that each release of task B and the single release of task A are all completed within their respective deadlines.

In Part 1 of this series, we saw that each WheelSensor monitor has a deadline of 400 μs, much shorter than its 8 ms period. To analyze whether this constraint will be satisfied, consider the critical scheduling instant involving all threads at priority 9 or higher.

Suppose all the WheelSensor threads (priority 9) and the Dispatcher thread (priority 10) are released at the same time. The last WheelSensor thread to execute is completed at time 38.4 + 4.62 + 4.42 + 6.53 + 5.69 = 56.66 μs, which is well within the specified constraint of 400 μs. Note that there is no need to consider interference by lower-priority threads in this analysis.

Clearly, if programmer-requested task priorities are not honored, task B would be at risk of missing its deadline. If task A or some even lower-priority non-real-time tasks were allowed to run when task B is ready to run, task B would almost certainly miss its 1-ms deadline.

Unbounded Priority Inversion

When threads running at different priorities share access to synchronized data, it is necessary for the analysis of real-time scheduling to consider blocking interactions between threads. This issue is manifest in the implementation of our sample traction monitoring application's sensor simulator. Note that each of the application's four WheelSensor threads (priority 9), its BrakeMonitor thread (priority 8), its SteeringMonitor thread (priority 7), and its AcceleratorMonitor thread (priority 5) all share access to an instance of a simulation.ScriptA object that provides simulated sensor data. Though each thread invokes different synchronized methods of the ScriptA object, all contend for access to the same lock. This can result in unbounded priority inversion in the following scenario:

  1. The AcceleratorMonitor thread invokes a synchronized method of the ScriptA object.
  2. Before the AcceleratorMonitor thread returns from the synchronized method, the SteeringMonitor thread becomes ready to run. Assume this thread was previously blocked, waiting for its next periodic release. Since the SteeringMonitor thread has a higher priority than the AcceleratorMonitor thread, the AcceleratorMonitor thread cannot run. Thus, it does not finish executing its synchronized method, and does not release the lock on the ScriptA object.
  3. Now, suppose one of the WheelSensor threads becomes ready to run. Since it is the highest-priority thread, it can quickly progress to the point of invoking a synchronized method on the shared ScriptA object. At this point, it blocks, because the AcceleratorMonitor thread holds the lock, and that thread cannot release the lock because it has been preempted by the SteeringMonitor thread.

This situation is known as unbounded priority inversion because the higher-priority thread is required to wait for "completion" of the medium-priority thread's execution. In the most general case, the medium-priority thread may run for an unbounded amount of time before it yields to the lower-priority thread.

Luckily, in our application, the medium-priority thread will eventually block on an attempt to invoke another synchronized method of the ScriptA object, allowing the AcceleratorMonitor thread to continue executing to the point of returning from the synchronized method, which allows the WheelSensor thread to acquire the lock and run, but only after waiting for all of the medium-priority thread's execution to be complete. This unbounded priority inversion could cause the WheelSensor thread to miss its deadline, even if the overall system utilization is below the rate-monotonic utilization limit. Below, we'll see a technique used to address this problem in real-time systems.

Considerations for Selecting a Platform for Hosting Real-Time Workloads

A real-time workload implemented in Java SE must run on top of a platform that comprises a Java Virtual Machine (JVM), which is layered on top of an operating system, which itself is layered on top of a particular hardware architecture. Each layer of this hierarchy can impact the ability of the Java SE application to comply with its real-time constraints. For example, architectures that rely on memory caches, hardware optimizations based on branch prediction and out-of-order execution, and virtual memory paging to achieve high performance might exhibit inconsistent execution times that are very difficult to predict for individual real-time tasks.

The operating system upon which the JVM depends for various services can also impede compliance with real-time constraints if, for example, it does not strictly honor programmer-specified priorities or if it does not provide options to lock virtual memory pages into physical memory.

Also, other tasks running as part of the operating system workload, including tasks that might be part of the JVM implementation, might contend for access to CPU time resources. If the behavior of the other tasks that comprise the real-time workload are not well understood and are not included in the RMA, the analysis is not meaningful because it is based on incomplete and incorrect information. Thus, Java SE threads might not behave within the constraints predicted by RMA and will be likely to miss their deadlines.

Traditional JVM implementations are not configured to support the specialized needs of real-time applications. Rather than assuring behaviors that allow software engineers to analyze and guarantee compliance with real-time constraints, typical virtual machine configurations are tuned instead to maximize overall throughput and user perception of responsiveness. When there exists an abundance of resources that is far greater than a particular real-time application workload's relatively small resource needs, and when the consequences of missing a deadline constraint are relatively harmless, it might be practical to deploy a real-time Java SE application on almost any JVM. But in other scenarios, it is more important to deploy on a virtual machine that is designed to support real-time operation. Below, some of the differences between virtual machine implementations are summarized.

Lazy Dynamic Class Loading

Lazy dynamic class loading is one of the techniques used by traditional JVMs to amortize the total cost of application loading across a longer time span. The idea is to defer the loading of dependent classes until the last possible moment. For example, if class A has a method foo which invokes method baz of class B on a conditional path through method foo and the application has no other dependencies on class B, the traditional virtual machine will not load class B until A.foo is invoked under conditions that cause A.foo to invoke B.baz.

Lazy dynamic class loading improves the user's perception of application responsiveness because the first incrementally loaded portions of the application can begin executing immediately, long before the entire application has been loaded. However, to the real-time software engineer, lazy dynamic class loading adds considerable complexity to the analysis of task execution times. It is difficult to predict exactly when classes will be loaded, and since the CPU time and memory resources consumed by class loading activities are difficult to predict, it would be desirable to ensure that class loading does not occur during the calibration measurements or the actual execution of a real-time application.

Most real-time virtual machines offer configuration options that allow real-time applications to avoid the unpredictable interference caused by lazy dynamic class loading. For example, some real-time virtual machines provide a development tool that statically links the entire application prior to runtime. The fully linked application can be deployed either in the form of fully resolved, portable Java bytecodes or as platform-targeted native translations of the bytecode.

For real-time Java programs that need dynamic class loading, such as programs that use the Class.forName service to look up a class by name, some real-time virtual machines support a configuration option that is known as eager loading. In this mode, for every class that is loaded, all the classes referenced from the loaded class's constant pool are also immediately loaded. Compared with a traditional JVM, this mode of operation results in slower application startup because no part of the dynamically loaded component can begin to execute until all of its dependencies have been loaded. However, this gives real-time programmers control over when the lazy dynamic class loading occurs and also over the priority of the thread that is responsible for performing the lazy dynamic class loading. In some real-time Java SE systems, high priority real-time threads continue to run without violating their real-time deadlines while new capabilities are dynamically loaded at lower priorities.

Just-in-Time (JIT) Compilation

Adaptive JIT compilation is a heuristic that allows a virtual machine to optimize the use of instruction memory and the implementation of individual methods depending on the most-recent behavior of the application. This heuristic enables high performance, but it complicates analysis of compliance with real-time constraints in two important regards:

  • Often, compilation or recompilation of a method is triggered by invocation of the method. The compilation process consumes considerable CPU time and memory resources, causing one in thousands of invocations of the method to take much longer than the others. The initial JIT compilation usually occurs shortly after application startup. If this occurs during CPU time-calibration measurements, it will skew the measurement results.
  • In fully adaptive systems, a method might be compiled multiple times, with each compilation producing different machine-code translations. In some environments, a translated method might be discarded to make instruction memory available for other purposes. In this case, subsequent invocations of the method run in interpreted mode. The overall impact of dynamic adaptive JIT compilation is that it is very difficult for the application programmer to predict the execution time of each invocation of the method. Sometimes the method runs slowly (interpreted), sometimes it runs faster (compiled but not optimized), and sometimes it runs very fast (compiled and optimized for the specific execution context and parameters).

Virtual machines that are designed to support predictable real-time behavior generally provide capabilities to avoid the difficulties described above. For example, ahead-of-time (AOT) compilation translates the entire Java program prior to execution. With this approach, the program is deployed as a native-compiled executable binary instead of as portable Java bytecodes. But the portable bytecodes still exist, and can be compiled for different execution platforms, as needed.

Another approach offered by real-time virtual machines is the optional use of flash compiling, also known as eager JIT compilation. The idea here is to immediately compile every method that is dynamically loaded, as soon as it is loaded, rather than deferring JIT compilation until after the code has executed a certain number of times in interpreted mode.

With AOT compilation and eager JIT compilation configured to support predictable real-time execution of compiled methods, some of the optimizations performed by Java HotSpot VM are not available. For example, dynamic inlining, which inlines a method at runtime based on information that is not available until the method is invoked, is not performed because this would result in inconsistent execution times for the invoking method. And deoptimization—a sophisticated Java HotSpot VM technique that recompiles methods whose original translations were based on assumptions that are no longer true (due, for example, to dynamic loading of new classes) or methods that become eligible for new, improved optimizations (due to new profiling data, new awareness of certain object types, or unloading of certain classes)—is generally not available in a real-time virtual machine that is configured to support consistent predictable method execution times.

In comparison with traditional JVMs, these real-time approaches result in slower application startup, because eager JIT compilation must process all of the loaded classes before any application code can run, and slower execution in general, because both AOT and eager JIT compilation must select between alternative optimizations without the benefit of runtime profiling and without knowledge of the actual values that are held within certain variables when the program is running. Furthermore, the real-time virtual machine's compiler is not allowed to exploit optimizations that might become invalid as a result of the program's future behavior. The benefit of these real-time approaches is that each execution of the method offers approximately the same performance.

Thread Priorities

As explained above, managing thread priorities is critical to engineering a system that is guaranteed to satisfy all of its real-time constraints. It might come as a surprise that traditional JVMs running on typical mainstream operating systems, such as Windows, Linux, and Oracle Solaris, do not promise to schedule threads in priority order. Often, a traditional JVM will run a lower-priority Java thread even though higher-priority Java threads are ready to run.

In many traditional virtual machine environments, the virtual machine and the underlying operating system work together to automatically age thread priorities rather than strictly honoring programmer-specified priorities. The result of automatic priority aging is to increase the priority of threads that frequently block (so-called I/O-bound threads) and to decrease the priority of threads that run for long spans without any blocking (CPU-bound threads). These automatic priority aging heuristics approximate the priority assignment dictated by RMA. However, the approximation is not exact, and the heuristic depends on historic data representing the recent behavior of the application. Initial scheduling decisions are made in the absence of any historical data, and even when historical data exists, the most recently collected historical data is not necessarily representative of near-future behavior.

Virtual machines designed for real-time operation generally honor the priorities specified by application developers. They also offer configuration options to specify the mapping between Java priorities and the priorities of the underlying operating system, because this is critical to analyzing the real-time schedulability of a real-time workload that comprises a combination of Java and non-Java threads. In some cases, real-time virtual machines also provide special APIs to allow real-time Java developers the option of assigning a broader range of thread priorities than the ten-priority range of Java SE.

Priority Inversion

Even when programmer-specified priorities are strictly honored, it might be difficult to analyze and control real-time behavior in the presence of unbounded priority inversion that occurs when threads running at different priorities share access to certain synchronized objects. Virtual machines designed for real-time operation generally support a protocol known as priority inheritance. With this protocol, whenever a high-priority thread requests a lock that is currently owned by a low-priority thread, the low-priority thread temporarily inherits the priority of the requesting thread. This allows the thread that owns the lock to run at the priority of the lock's highest-priority requesting locker, eliminating the priority inversion problem described above.

Traditional JVMs do not support the priority inheritance protocol. However, it should be emphasized that solving the priority inversion problem is less important with a traditional JVM that does not strictly honor programmer-specified thread priorities. This is because the normal priority-aging approaches generally reduce the effect of priority inversion: Priority aging will decrease the priority of an intermediate-priority thread that has preempted the thread that holds the shared lock since this thread will be perceived as CPU-bound.

Garbage Collection

In our sample traction monitoring application, we chose to avoid all reliance on allocation and garbage collection in order to simplify the analysis of the real-time workload. While avoidance of garbage collection represents a valid approach for building real-time Java applications, it turns out that the motivations to use Java in real-time applications often hinge on support for garbage collection. Without automatic garbage collection, key benefits of Java are missing. Among the missing benefits are ease of developing new software capabilities, ease of integrating independently developed software components, access to thousands of off-the-shelf Java software components, more-reliable operation through dynamic allocation of appropriately sized buffers without the risk of dangling pointers and reduced risk of memory leaks, and simple evolution of software capabilities by extending existing classes with new subclasses that offer enhanced capabilities.

Using garbage collection in real-time systems is possible only if the virtual machine implements real-time garbage collection, which behaves somewhat differently than traditional JVM garbage collectors. In traditional virtual machines, garbage collection activities run at unpredictable times and for unpredictable durations, resulting in the reclamation of unpredictable amounts of memory. When garbage collection runs, it often runs at the highest Java priority, preventing all Java application threads from running.

Given these behaviors, the real-time programmer's only option is to forbid its use, even by Java threads that have no specific real-time constraints. Fortunately, virtual machines designed to support real-time execution provide specific assurances regarding the behavior of garbage collection.

The following are some of the specific garbage collection capabilities that are supported by real-time virtual machines:

  • Garbage collection in a traditional JVM might not reclaim all the dead memory within the memory heap. In part, this is because the traditional garbage collector usually relies on generational techniques that focus on reclaiming the easy garbage that can be identified with the least amount of effort. Only when the simplified generational technique fails to reclaim a sufficient amount of memory does the traditional garbage collector resort to a full garbage collection of all memory. But even when the traditional garbage collector turns its attention to the full heap, it might not find every byte of garbage, because it is common for the implementation to conservatively retain those objects that are difficult to prove to be garbage.

    Because of the critical nature of the real-time systems that are implemented with Java, it is essential that the garbage collector reclaim all garbage rather than just a conservative approximation of the garbage. Note that a single garbage object accidentally treated as live memory might indirectly hold references to an arbitrarily large amount of additional garbage memory, all of which would also be treated erroneously as live memory. And since real-time virtual machines are often configured to run entirely in physical memory, without virtual memory paging in order to ensure time-predictable behavior, the reliability of the Java application might suffer if unwanted garbage is accidentally retained.

    Real-time virtual machines can generally be configured to reclaim all garbage each time they run. Due to careful attention to these garbage collection details, representative mission-critical Java applications have demonstrated continuous up times of over a year without rebooting.

  • When garbage collection is performed on a real-time Java application, the garbage collection activities are modeled as additional real-time threads. It is important in such systems that the software integrator be able to specify the priority at which the garbage collection threads run. Furthermore, it is critical that the garbage collection threads be both preemptible and incremental. Preemptible means the garbage collection threads can be preempted by higher-priority Java threads when necessary.

    Being able to preempt the garbage collector is relevant because real-time virtual machines generally begin garbage collection activities well ahead of the out-of-memory condition, so high-priority preempting threads are still able to allocate memory. Even when memory is entirely exhausted, it is critical that high-priority threads that do not allocate new objects be able to preempt the running garbage collector. This allows real-time engineers to architect their systems with two criticality levels. The threads with the most-urgent deadlines (for example, deadlines of less than 100 μs), can be written so that they avoid all memory allocation. Threads with less-urgent deadlines coordinate their shared use of the real-time garbage collection resources, as discussed further below. Note that only these threads will suffer scheduling delays if garbage collection falls behind because of transient work overloads or an improper pacing configuration.

    Saying that the garbage collector is incremental means that when the garbage collector is resumed following a preemption, it resumes with the next increment of work rather than restarting its efforts. The incremental property is important because it ensures that garbage collection makes predictable forward progress toward replenishing the memory allocation pool. This allows analysis of how much CPU time is needed for garbage collection, which is critical to understanding the pace at which the memory allocation pool will be replenished for a given CPU-time budget granted to garbage collection.

  • Another consideration relevant to the reliability of mission-critical real-time Java applications is management of memory fragmentation. Over time, due to high allocation rates and the diversity of object sizes that are typical in Java applications, it is possible for the memory heap to become fragmented. In a fragmented heap, a large amount of free space might exist, but the free space might be divided into thousands of small allocatable segments.

    When the heap becomes fragmented, it is not possible to allocate large contiguous objects. Even the allocation of smaller objects from a fragmented heap is less efficient, because the allocator must search free lists rather than simply adjusting a thread-local pointer to the next available memory within a large allocatable segment.

    Note that the use of thread-local allocation buffers (known as TLABs to the Java HotSpot VM community) might introduce variability in object allocation times. If an allocation can be satisfied from the TLAB, the allocation happens very quickly. However, if the allocatable memory within a TLAB is exhausted at the time of an allocation request, extra time and effort is required to synchronize with the global allocator in order to carve off another large block of memory to become the allocating thread's new TLAB. When the costs of allocation are averaged over hundreds or thousands of operations, the execution times of methods that perform memory allocation are predictable within statistical bounds. The ability to make effective use of TLAB optimizations is an example of the benefit of using soft rather than hard real-time approaches.

    To support long-term reliable operation, most real-time virtual machines offer support for automatic defragmentation of the heap as part of the garbage collection process. As the garbage collector identifies live objects, it incrementally relocates them to contiguous memory locations, thereby coalescing multiple smaller free segments into contiguous larger allocatable memory segments.

Being able to analyze and control the pace at which real-time garbage collection replenishes the memory allocation pool is the ultimate objective of the various garbage collection behaviors described above. In a real-time system, it is critical to begin garbage collection activities well before the allocation pool has become exhausted. Ultimately, the goal of pacing is to avoid all out-of-memory conditions. When garbage collection is properly paced, the system can ensure that every memory allocation request is satisfied immediately, without having to suspend the real-time thread while it waits for garbage collection to catch up.

Figure 3 illustrates a trace of a running real-time JVM with a simulated air traffic control workload. In this figure, the yellow sawtooth shape represents the amount of memory that is available for allocation at each instant in time, as measured by the scale on the left side of the chart. The blue represents the CPU time consumed by the application's real-time Java threads, and the red represents the CPU time consumed by garbage collection threads, with the scales for each provided on the right side of the chart. At each point in time, the sum of garbage collection and application processing is no greater than 100 percent of full utilization.

realtime-pt2-f3

Figure 3. Interaction between application threads and garbage collection threads

Under the control of a garbage collection pacing agent, garbage collection is divided between threads running at two distinct priorities. At a priority that is potentially above the priority of some of the real-time application threads, the pacing agent schedules aggressive garbage collection behavior up to a bounded utilization limit that is specified by the system integrator. At a priority that is below all of the real-time application threads, the pacing agent allows garbage collection activities to consume all available CPU time during times when garbage collection is enabled. But the pacing agent enables garbage collection only when it determines that the application is sufficiently close to exhausting the allocation pool.

Among the threads that participate in the garbage collection effort, some are dedicated to identifying, coalescing, and zeroing segments of garbage, while others manage finalization of objects destined to become garbage following finalization. As with other garbage collection threads, finalization threads run at a priority specified by the system integrator, and the system integrator also specifies an upper bound on the percentage of CPU time that can be consumed by finalization threads. Note that these configuration options are available only on certain virtual machines designed to enable Java software to achieve reliable compliance with real-time constraints.

In this particular configuration, all garbage collection is configured to run at a lower priority than the application threads. Note that the application thread utilization tends to increase slightly during and immediately following each burst of garbage collection activity. This is because the application must do a bit of extra work to coordinate with the incremental garbage collection threads and because memory cache performance benefits decrease when garbage collection activities march through all of memory in search of garbage.

If it chose to, the garbage collection pacing agent could begin each new pass of the garbage collector as soon as the preceding pass is completed. However, this is undesirable because it dedicates far more CPU time than necessary to garbage collection, and it would force the application code to always run at the lower efficiency that occurs while coordinating with garbage collection.

Instead, the pacing agent monitors the trends in memory availability to predict when the allocation pool will become exhausted. With its knowledge of how much total CPU time is required to perform garbage collection, and its parameters governing the maximum CPU utilization that is available for garbage collection, the pacing agent delays the start of each garbage collection pass until the last possible moment. The system integrator also provides to the pacing agent threshold parameters that control the amount of safety reserves to be retained for situations in which the workload is not as predictable as in this air traffic control application.

In case the pacing agent fails to properly schedule garbage collection due to inappropriate configuration or transient system work overloads, each real-time thread that endeavors to allocate memory from the empty allocation pool dedicates its available real-time CPU allotment to garbage collection activities using a priority inheritance mechanism. Out-of-memory events are logged so that system administrators and software engineers can adjust the system configuration.

Pacing of garbage collection is based on a well-established theory of operation [2]. Assume a real-time application has reached a steady state, within which it allocates new memory at roughly the same rate at which previously allocated memory becomes garbage. Let U represent the application's upper bound on live memory, which is represented in bytes, V represent the application's maximum allocation rate, which is measured in bytes allocated per second, S represent the total CPU time required to perform a complete garbage collection pass, R represent the fraction of CPU time dedicated to garbage collection, and M represent the total size of the heap, which is measured in bytes.

Generally, an object that becomes garbage after the start of the current garbage collection pass might not be recognized as garbage by the current garbage collection pass. At the end of garbage collection, the heap contains a maximum of U bytes of live memory plus V (S / R) bytes of dead memory. If we immediately start another pass of garbage collection, an additional V (S / R) bytes of memory will be allocated while the previously dead memory is being collected. Thus, the heap size M must be greater than or equal to U + 2 V (S / R) bytes. Rearranging terms, we see this relation as shown in Figure 4:

realtime-pt2-f4

Figure 4. Formula for pacing garbage collection

As would be expected, the percentage of CPU time that must be dedicated to garbage collection is proportional to the total CPU time required to perform garbage collection. It is inversely proportional to the difference between total heap size and maximum live memory consumed by the application, the memory slack. For typical configurations, the CPU time dedicated to garbage collection is less than 5 percent.

Conclusion

The Java SE programming language offers many of the same benefits to developers of mission-critical real-time systems that it offers to traditional data processing applications: improved portability, scalability, modular composition of independently developed software components, and software reuse. These benefits are manifested as lower software development and maintenance costs, improved features and generality, and shorter schedules for new development efforts.

As with more traditional languages such as C and C++, Java SE developers need to use special techniques to address the needs of real-time applications. Some of the special techniques involve coding practices to reduce or eliminate the need for garbage collection and to encapsulate the resource calibration code with each real-time software component. Other techniques involve careful selection and configuration of the virtual machine on which the real-time Java components will be deployed.

References

  1. C. L. Liu, J. W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. Journal of the Association of Computing Machinery, Vol. 20, No. 1, January, 1973, pp. 46–61.
  2. K. Nilsen. Adding Real-Time Capabilities to Java. Communications of the Association of Computing Machinery, Vol. 41, No. 6, June 1998, pp. 49–56.

See Also

About the Author

As Chief Technology Officer over Java at Atego Systems—a mission- and safety-critical solutions provider—Dr. Kelvin Nilsen oversees the design and implementation of the Perc Ultra virtual machine and other Atego embedded and real-time oriented products. Prior to joining Atego, Dr. Nilsen served on the faculty of Iowa State University where he performed seminal research on real-time Java that led to the Perc family of virtual machine products. Dr. Nilsen participates in the Java Community Process as a member of the JSR 282 and JSR 302 Expert Groups. He holds a BS in physics and MS and PhD degrees in computer science. Atego Systems is a wholly owned subsidiary of PTC, Inc.

Follow us:
Blog | Facebook | Twitter | YouTube