Graphics Performance Improvements

High Performance Graphics

Graphics Performance Improvements in the Java 2 SDK, version 1.4

Performance is always an issue for graphics programs. Rendering complex shapes and patterns on a display requires iterating through the pixels and calculating values for each of them. Loading an image and performing a filter operation on it sometimes requires allocating a large block of memory. When using these complex shapes, patterns, and images in an animated application, fast and reliable rendering can be difficult to acheive.

Sun Microsystems first released the 2D graphics framework, Java 2D with the Java 2 SDK, version 1.2. The graphics framework that Java 2D provides is much more powerful than the limited set of features offered by the JDK 1.1. For example, Java 2D includes support for creating arbitrary shapes, text, and images and provides a uniform mechanism for performing transformations, such as rotation and scaling, on these objects. However, these new features required more code to implement them and extra data to be stored for their operations, both of which caused additional overhead between the calls to the API and the rendering of pixels, all culminating in a performance degradation compared to the JDK 1.1. Another factor contributing to the performance degradation is that the hardware acceleration available in the JDK 1.1 was no longer accessible when using the SDK, version 1.2 because offscreen images were moved into software memory for the purpose of cross-platform consistency.

Now that Java 2D provides such a sophisticated graphics framework for the SDK, the Java 2D team has focussed on improving the performance of the features offered by this framework. Many changes made in the pipeline architecture have improved the performance of a wide range of Java 2D operations that have been habitually slow since the introduction of Java 2D. In addition to these general improvements, version 1.4 introduces the VolatileImage API, which allows you to take advantage of the hardware acceleration now accessible through the Java 2D API. The VolatileImage API allows you to create hardware-accelerated offscreen images, resulting in better performance of Swing and gaming applications in particular and faster offscreen rendering in general.

New Pipeline Architecture

For version 1.4 of the SDK, the Java 2D graphics pipeline architecture has been almost completely rewritten to solve many general performance problems. The key enhancements made in the new architecture are:

less pipeline validation overhead
faster copying
better code sharing and code reduction

Before reading about the details of these three general enhancements, it's essential to understand a little of the background on the rendering process used in Java 2D, which the next section explains.

Background on Java 2D Graphics Pipelines

In the Java 2D API, a Graphics2D object encapsulates information about the current rendering state, such as the transform to use for converting from user space to device space and the color to use when rendering to the device. All of this state information is represented as a set of rendering attributes on the Graphics2D object. For example, when you set the color attribute of the Graphics2D, all of the text and shapes you render with this Graphics2D will be drawn in that color.

The Java 2D implementation uses different modules of code, called pipelines, when determining what to render to the device. Which pipeline Java 2D uses depends on what it is rendering and how it is rendering it. The pipeline that Java 2D uses to fill a shape with a complex paint is different from the one it uses to fill the same shape with a simple Color. When filling a shape with a complex paint, Java 2D must query the Paint object every time it needs to assign a color to a pixel whereas a simple color fill only requires iterating through the pixels and assigning the same color to all of them.

The Java 2D implementation has many pipelines for handling different graphics primitives and rendering attributes. The process of inspecting the rendering attributes, determining which pipeline to use, and deciding how to store the data necessary for rendering is called pipeline validation.

Less Pipeline Validation Overhead

For version 1.4, the new pipeline architecture reduces the performance overhead produced by pipeline validation with several implementation changes that:

Improve the way that data is shared by the rendering pipelines.
Reduce the amount of code executed when responding to changes in the rendering attributes.
Use better heuristics to determine when a pipeline needs to be revalidated.

In the Java 2 SDK, version 1.2 and 1.3, common operations on a Graphics object often invalidated the rendering data cached for this Graphics object. This invalidation interrupted the rendering process by causing continuous re-creation of rendering information for the Graphics object. The reason for this invalidation is that the rendering primitive objects often contained both the rendering logic and the attribute data that the rendering instructions needed to modify the destination. For example, whenever setColor was called, the pipeline was invalidated and the rendering primitive object had to be re-created with the new color information.

Another problem with the old architecture is that the code for the inner loops that modified the pixels was often bundled up with the code to access the memory for the pixel data. Since pixels could be stored in Java arrays or in the VRAM of a framebuffer and both of these memory types needed different code to set up access to the pixels, the implementation would have to create multiple primitives to draw a line in a particular pixel format--one for each type of memory storage in which that pixel format could occur. In addition, the code to set up the memory access was quite large and was duplicated in every primitive designed to modify pixels stored in that type of memory.

The new pipeline architecture eliminated these problems by improving the way data is cached and code is shared. The attribute data associated with a Graphics2D object is now cached so that it can be shared between objects. In particular, color and clip information has been removed from rendering primitives and is stored in the Graphics2D object itself where all rendering primitives can access it. Now that rendering primitives can share data, those primitives that used to be instantiated with the data have been deleted.

In the new architecture, a common interface is used by the primitives to access the memory where the pixels are stored so that a single primitive can modify pixels of a given format regardless of what type of memory those pixels are stored in. The code to access the various types of memory is collected into a group of shared modules that these primitives invoke through a common interface. Now all primitives that access a certain type of memory can share the code that sets up access to that memory, and all objects that write to destinations with the same kind of pixel data can share code that modifies that kind of pixel data. This new ability to share code among the rendering primitives greatly reduces the code in the implementation and in turn reduces the runtime footprint.

In some cases, revalidation is necessary, such as when an opaque color is changed to a transparent color. However, the algorithm to render one opaque color is the same as rendering any other opaque color on a particular destination. So replacing one opaque color with another should not invalidate a pipeline. The implementation now only invalidates the pipeline when an attribute is changed to a different type of value, rather than when an attribute is changed to a different value of the same type.

Benefits of New Pipeline

The changes in the new pipeline architecture have improved all areas of Java 2D and other APIs that use Java 2D, not only in terms of performance but also in terms of rendering quality and reliability.

More Reliable Rendering

The more intelligent data caching and code reuse available in the new pipeline architecture have resulted in more reliable rendering in many areas, including:

Rendering text that has attributes
Drawing lines with large scale factors
Performing copyAreaoperations

Rendering Text That Has Attributes

The Java 2D font engine produces either a bitmap or an outline path from a glyph. The font engine can cache a bitmap in the bitmap glyph cache but cannot cache an outline path. If the text is small, copying glyphs from the cache to the destination is a lot faster than rendering the glyphs using an outline path, and the quality of the rendering is also better. When the glyphs are large, the quality and the rendering performance gains are less noticeable when using a cached bitmap. In the old architecture, the implementation rendered solid text from the glyph cache bitmap unless the text had rendering attributes, in which case it used the outline path pipeline. Not only was this process slow, but it also produced inconsistent results. For example, if the opaque color was changed to a gradient, the implementation had to switch from the bitmap pipeline to the outline path pipeline to render the text. If this pipeline switch occurred while a program was running, the text appeared to jiggle because of the sudden change in text quality.

In the new architecture, a more predictable heuristic is used to determine when the bitmap pipeline or the outline path pipeline should be used. If the text is smaller than a certain size, the implementation renders it from the glyph cache bitmap regardless of what rendering attributes the text is rendered with. If the text is as large or larger than the threshold size, the implementation uses the outline path pipeline, which produces about the same performance as the bitmap pipeline because larger text would require more memory allocation in the cache without much of an improvement in quality.

Drawing Lines With Large Scale Factors

Drawing lines with large scale factors produced integer overflows when the internal code to render the lines tried to represent the lines as integers. If these numbers fell outside the 16-bit range, the code would go into an infinite loop trying to clip the lines. This problem has been resolved, and now Java 2D can handle arbitrary scale factors.

Performing `copyArea` Operations

Many copying routines have been improved by eliminating the use of an intermediate buffer. In particular, the copyArea operation, which involved an overlapping source and destination, such as used in Swing scrolling components, can often be performed in place rather than relying on an intermediate buffer. Avoiding an intermediate buffer means less memory allocation and garbage collection is needed.

The new image-scaling operations also avoid extra copying by bypassing an intermediate buffer. Now DirectDraw scale operations on Win32 and software scale operations on Solaris and Linux are able to blit the scaled image directly to the destination. Note that hardware-accelerated scaling is currently disabled on Win32 because of quality problems, but you can enable it with a runtime flag .

Image Scaling Performance Improvements

Rendering a scaled image using drawImage with a scale other than 1:1 was often much slower after JDK 1.1. The drawImage implementation created a scaled image in an intermediate buffer and then copied the buffer to the destination. The memory allocation required for the intermediate buffer and the extra copies to and from the buffer were two factors contributing to the performance degradation. Another reason for the performance loss was that any accelerated image-scaling operations available on the display hardware could not access the memory that stored the image data.

The new architecture has a software-scaling operation that scales an image directly into the destination in a single copy operation. This new scaling operation benefits all platforms. If you are rendering a scaled image in the Win32 environment, you can take advantage of the access to DirectDraw surfaces that the new architecture provides if both the source and destination of the scaling operation are in DirectDraw surfaces. The DirectDraw scaling operation blits the scaled image directly to VRAM. Note that the hardware-accelerated image scaling on Win32 is currently disabled because of quality and consistency problems, but you can enable it with the runtime flag, sun.java2d.ddscale=true. See the section Flags for Controlling Performance and Quality for more information. If hardware-accelerated scaling is not available to you, you still benefit from the new software-scaling operation. The new software-scaling operations and the access to hardware-accelerated scaling obviate the need for extra memory allocation and additional copy operations.

The performance of scaling to and from an accelerated image improved significantly between the Beta 2 and Beta 3 releases because of a new heuristic that, based on the frequency of read operations performed on an image, can store the image to the type of memory that will allow the best performance. See the next section for more information on the new heuristic.

Accelerated-Image Reading Performance Improvements

Translucent and scaling operations to and from an accelerated image were very slow in the Beta 2 release of the SDK 1.4. This performance degradation was a result of frequently reading the image from VRAM, which is a slow process compared with reading an image from system memory. Translucent operations require frequent reads because alpha compositing performs read-modify-write operations on the destination. Scaling from an accelerated image requires reading from the source or destination image in order to perform the operation successfully. The Java 2D implementation available in the Beta 3 release of the SDK 1.4 can transfer the surface to system memory if the image is experiencing frequent reads. If reads occur less frequently, Java 2D can transfer the surface back to VRAM. If you are working in a UNIX environment, you can override this heuristic by using the J2D_PIXMAPS environment flag. See Environment Flag for Solaris and Linux for more information.

Remote X Server Performance Improvements

When working with graphics in a UNIX or Linux environment, you have the option of performing your graphics computations on a remote client by running an X Server from your machine. However, if you used offscreen images with Java 2D while running the X Server you would experience poor performance because the offscreen image was created on the client side. Every time you needed to re-render the image to the destination, the image had to be copied from the remote X client to the display. If you were using the offscreen image for double buffering, you would have to endure a network copy of the image whenever the screen was repainted.

Because accelerated offscreen images are available in 1.4, the offscreen image is now created on the server side, local to the screen. Java 2D uses X protocol requests, which tell the X server what and how to render to the offscreen image located on the server side of the network. Only X protocol requests are sent over the network; the image itself stays on the server side. This particular improvement also has positive implications for Swing performance because Swing uses double-buffering. The Swing application does not have to wait for the backbuffer to be copied over the network for every screen refresh.

One drawback with both remote X and DirectDraw is that neither antialiasing nor alpha-blending can be accelerated. In fact, antialiasing and alpha blending operations on remote X are usually much slower compared to version 1.3 because the image must be copied to the X client to perform one of these operations and then the new image must be copied back to the server. The Java 2D team is looking into solutions to this problem for future releases. Since most Swing applications do not use alpha blending or antialiasing anyway, this problem should not cause serious concern for developers using Swing.

Local X Server Performance Improvements

In J2SE 1.4 releases prior to Beta 3, drawImage performance when rendering to the screen in some cases was worse than drawImage performance in J2SE 1.3. The reason for the performance degradation was that Java 2D was not using the Shared Memory Extension on Solaris and Linux in the local display environment. The Shared Memory Extension allows an X server and X client running on the same machine to jointly access shared memory, which enables faster data transfers.

For Beta 3, Shared Memory Extension is used on Solaris and Linux in the local display environment, resulting in better performance when rendering to the screen and to the images. When DGA is not available, toolkit images and images created with Component.createImage or GraphicsConfiguration.createCompatibleImage are stored in pixmaps instead of system memory, enabling faster copies to the screen using X protocol requests. You can override this behavior with the pmoffscreen runtime flag, described in the section Runtime Flag For Solaris and Linux .

Pixmaps that are not in shared memory might be stored in VRAM, which allows for fast copies to the screen, but reading from these images is very slow, as described in the section Accelerated-Image Reading Performance Improvements . Since Shared Memory Extension is now accessible in the local display environment, images that experience frequent reads can be stored in Shared Memory Pixmaps, which always reside in system memory. Java 2D can detect if an image is read from more frequently or copied from more frequently and can transfer the image to the appropriate memory. You can control which memory is used by setting the J2D_PIXMAPS environment variable described in section Environment Flag for Solaris and Linux .

Swing API Benefits

When using Swing in the SDK 1.4, you should notice better performance in many cases because Swing uses many of the Java 2D operations that were targeted for performance improvements for version 1.4. Because the rendering of Swing hierarchies relies heavily on common operations like setColor and Graphics.create, the invalidation and re-creation of rendering data caused poor repaint performance for many Swing applications. For this reason, Swing is a primary beneficiary of the reduced overhead resulting from the graphics pipeline improvements. Now that image data, such as color, is stored in a central, shared location, the many calls to setColor that Swing performs do not invalidate the pipeline, and thus do not result in worse performance for Swing applications. The process of creating and disposing of Graphics objects is also faster as a result of the general pipeline improvements. Since each subcomponent of a JComponent draws itself with a new Graphics object, a particular Swing application could create and dispose of many Graphics objects. The create and dispose operations now are much faster and produce less memory overhead.

Some Swing components allow you to scroll over a piece of data, such as an image or text. Often the scrolling destination overlaps the source, and in this case a copy is performed. Prior to version 1.4, the information would be copied to an intermediate buffer and then copied to the screen. These copies now occur in place, thereby avoiding the extra memory allocation and garbage collection associated with an intermediate buffer.

The new access to hardware acceleration for offscreen images has also dramatically improved Swing performance. Since Swing uses double buffering by default, Swing applications will generally perform better because of off-screen image acceleration. Swing performance on remote X is also improved because the backbuffer that Swing uses for double buffering is stored on the server side, thereby avoiding a network copy of the backbuffer every time the screen is repainted. For more information on the remote X improvements see the Remote X Server Performance Improvements section. See the next section for more information on hardware acceleration for off-screen images.

Hardware Acceleration for Offscreen Images

Java 2D provides access to hardware acceleration for offscreen images, resulting in better performance of rendering to and copying from these images. On win32 hardware acceleration is achieved through DirectDraw, an API designed for giving applications direct access to video hardware memory and acceleration. On Linux and Solaris, offscreen images might be stored in a pixmap, which is an area of memory that is stored on the X server side of the network.

In the case of DirectDraw, any image stored in accelerated memory could be "lost" at any time: certain events can cause the Windows operating system to clear the image out of VRAM, thus destroying the contents of the image.

A VolatileImage represents an offscreen image whose content can be lost at any time. However, a VolatileImage offers performance benefits over other kinds of images because a VolatileImage allows an application to take advantage of hardware acceleration, thus enabling the application to run at the native speed of the underlying platform.

The new hardware acceleration has improved the performance of Swing applications because Swing double-buffering now uses VolatileImage. For more information on hardware acceleration, see the VolatileImage API User's Guide .

Tips for Achieving Better Performance

Although the new performance improvements eliminate a lot of problems, they still don't optimize everything. In particular, alpha blending and anti-aliasing will adversely affect performance because calculating alpha values for each pixel takes more work than rendering opaque values. In addition, Java 2D does not currently hardware accelerate translucent images by default on Microsoft Windows, Solaris, and Linux systems. However, such images may be accelerated on Mac OS systems. Images can be inquired as to their acceleration state using the VolatileImage API in 1.4, and the Image API in 1.5. Opaque images or images with 1-bit transparency can be hardware accelerated. Acceleration support for 1-bit transparency enables you to accelerate sprites with transparent pixels. You can use 1-bit transparency to make the background color of a sprite rectangle transparent so that the character rendered in the sprite appears to move through the landscape of your game, rather than within the box represented by the sprite rectangle's background color.

When creating an image whose data is copied to the screen, you should create the image with the same depth and type of the screen. This way, no pixel format conversion between the image and the screen is necessary before copying the image. To create such an image, use either Component.createImage or GraphicsConfiguration.createCompatibleImage, or use a BufferedImage created with the ColorModel of the screen. The pre-defined types of BufferedImage have their own color models.

Rectangular fills--including horizontal and vertical lines--tend to perform better than arbitrary or non-rectangular shapes whether they are rendered in software or with hardware acceleration. If your application must repeatedly render non-rectangular shapes, consider drawing the shapes into 1-bit transparency images and copying the images as needed.

If you are still experiencing low frame rates, try commenting out pieces of your code to find the particular operations that are causing problems, and use the tips in this section to replace these problem operations with something that might perform better.

Another way to identify performance problems in your application is to run it with the new trace flag introduced in Beta 3 of the SDK, version 1.4. See the section Performance-Tuning Flag For All Platforms for more information on the trace flag.

If you still are experiencing performance problems, submit a bug with your test case.

Flags for Controlling Performance and Quality

In any graphics program, performance and quality can sometimes seem mutually exclusive. Most of the changes that the Java 2D team has made in the pipeline architecture have greatly improved performance without compromising quality, but in some situations you might want to decide whether you prefer high quality over high performance. You might also be working in an environment that does not allow your application to derive a performance benefit from a feature designed to improve performance in other environments. This list of flags allows you to disable or enable certain features so that you can obtain the best possible performance and rendering quality available in your environment.

Environment Flags

These environment flags are for use on a UNIX environment. You set these flags by using the setenv command. For example, to turn off DGA support and hardware acceleration, enter the following at the command line:

setenv NO_J2D_DGA

Environment Flag for Solaris and Linux

With the release of Beta 3 of the SDK, version 1.4, Java 2D uses Shared Memory Pixmaps in the local display environment for storing images that experience frequent reads. You can override this default behavior by using the J2D_PIXMAPS environment flag:

J2D_PIXMAPS=shared/server

If you set this flag to "shared", all images will be stored in Shared Memory Pixmaps if you are working in a local display environment. Conversely, if you set this flag to "server", all images will be stored in normal pixmaps, not Shared Memory Pixmaps. The normal pixmaps can be stored in VRAM at the discretion of the device driver.

Environment Flags for Solaris Sparc

NO_J2D_DGA Set this environment flag to turn off DGA support and hardware acceleration, which sometimes helps to reduce rendering artifacts on Solaris Sparc.
USE_DGA_PIXMAPS Set this environment flag to turn on DGA acceleration for pixmaps. The use of DGA with pixmaps is disabled by default because it was causing system freezes on some video boards, such as PGX32.

Runtime Flags

You set these flags when you are running your application. For example, to turn off DirectDraw, run your application like this:

java -Dsun.java2d.noddraw=true <yourapp>

Performance-Tuning Flag For All Platforms

If your application is experiencing less-than-desirable performance, the new trace runtime flag can help you determine the source of the problem. The flag is specified with a list of options:

-Dsun.java2d.trace=<optionname>,<optionname>,...

The options are:

`log`	print out the name of each primitive as it is executed
`timestamp`	precede each log entry with the `currentTimeMillis()`
`count`	at exit, print out a count of each primitive used
`out:<filename>`	send output (logging and counts) to the indicated file
`help`	print out a short usage statement
`verbose`	print out a summary of the options chosen for this run

If you use the log option, the Java runtime will print the executed primitives' names, most of which will be in this format:

<classname>.<methodname>(<src>,<composite>,<dst>)

The methodname represents the basic graphics operation that is used to do the actual rendering work of a Graphics method invocation. These method names will not necessarily map directly to methods on the Graphics object, nor will the number of calls made on the Graphics object map directly to the number of primitive operations performed.

The src and dst represent the type of surfaces or source data involved in the operation.

The composite names match the names in the AlphaComposite class fairly closely with the suffix "NoEa" meaning that the AlphaComposite instance had an "extra alpha" attribute of 1.0. The "SrcNoEa" type is the most commonly used composite type and represents the simplest way of transferring pixels with no blending required. "SrcNoEa" is often used behind the scenes even though the default composite is "SrcOver" when opaque colors and images are rendered because the two operations are indistinguishable for opaque source pixels.

Platform rendering pipelines are sometimes used for doing opaque operations on surfaces accessible by a platform renderer, such as X11, GDI, or DirectDraw. Their names currently use a simplified naming format, which has a prefix for the platform renderer and the name of the operation but without any classname or operand type list. Examples are "X11DrawLine", "GDIFillOval", and "DXFillRect". In the future these names should more closely approximate the names of the other primitives.

Runtime Flag For Solaris and Linux

Starting with the Beta 3 release of the SDK, version 1.4, Java 2D stores images in pixmaps by default when DGA is not available, whether you are working in a local or remote display environment. You can override this behavior with the pmoffscreen flag:

-Dsun.java2d.pmoffscreen=true/false

If you set this flag to true, offscreen pixmap support is enabled even if DGA is available. If you set this flag to false, offscreen pixmap support is disabled. Disabling offscreen pixmap support can solve some rendering problems.

Runtime Flags For Win32

-Dsun.java2d.noddraw=true
Setting this flag to true turns off DirectDraw usage, which sometimes helps to get rid of a lot of rendering problems on Win32.
-Dsun.java2d.ddoffscreen=false
Setting this flag to false turns off DirectDraw offscreen surfaces acceleration by forcing all createVolatileImage calls to become createImage calls, and disables hidden acceleration performed on surfaces created with createImage .
-Dsun.java2d.ddscale=true Setting this flag to true enables hardware-accelerated scaling. Hardware scaling is disabled by default to avoid rendering artifacts in existing applications. These rendering artifacts are caused by possible inconsistencies between the scale method that the software scaling operation uses (nearest neighbor) and the different scale methods that video cards use. Certain events that occur while an application is running might cause a scaled image to be rendered partially with hardware scaling operations and partially with software scaling operations, resulting in an inconsistent appearance. For now, you can enable acceleration by setting the ddscale flag to true .

Examples Demonstrating Performance Gains

Since the graphics performance gains you experience will depend partially on the hardware and operating system environment that you have, we have provided, in place of charts and data, this set of examples that you can run to see the performance improvements for yourself. Try running them with both version 1.3 and 1.4 of the Java 2 SDK.

CopyAreaPerf

This example scrolls the screen a certain number pixels at a time, demonstrating how the reduction in memory allocation and garbage collection can really speed up your copying operations. When you run the example, specify how many pixels you want the application to scroll at a time.

FadeTextSplash

The FadeTextSplash example shows fading text using alpha composites. If you run it with version 1.4, you will see the bitmap glyph cache in action. Notice that the text doesn't jiggle and you will not see any metrics changes when you run the application with version 1.4. When you run the example, you can specify what text you want the application to display.

PhotoMesa

This application is a digital image library browser and demonstrates the image-scaling improvements in Java 2D by allowing you to zoom in on an image in the library. The application was developed by Dr. Ben Bederson, assistant professor of computer science, and director of the Human-Computer Interaction Lab at the University of Maryland. When you run this example, make sure you are using the Beta 2 or later release of the Java 2 SDK, version 1.4 .

See the VolatileImage API User's Guide for examples demonstrating the VolatileImage API.