Developer J2SE
Big Changes Coming for Java
By Jason Hunter
JDK 1.5 release contains major improvements in Java syntax.
The Java language syntax hasn't changed much since the 1.0 release first became popular with developers. Although the 1.1 release added inner classes and anonymous inner classes, and the 1.4 release added assertions with the new assert keyword, the Java syntax and keywords have otherwise remained the same, as static as a compile-time constant. That's about to change with J2SE 1.5 (code-named Tiger).
While past J2SE releases have concentrated mostly on new classes and performance, Tiger aims at enhancing the language itself with the goal of making programming in Java more expressive, developer friendly, and safer, while minimizing incompatibility with preexisting programs. The language changes include generics, autoboxing, an enhanced "for" loop, typesafe enums, a static import facility, and varargs.
Improving Type Checking with Generics
Generics allow you to specify the actual type of objects used in a collection rather than just using Object, as you've done in the past. Generics also go by the name "parameterized types" because in generics, a class type accepts type variables that affect its behavior.
Generics are not a new idea. C++ has templates, but they are complicated and cause code bloat. C++ coders can use a little trickery to implement the factorial function using only C++ templates and then watch in horror (or awe) as the compiler generates C++ source code to handle the template calls. Java developers have learned from C++ and experimented with generics long enough to learn how to do them right. The current plan for Tiger evolved from the robust Generic Java (GJ) project. The GJ project tag line is "Making Java easier to type, and easier to type."
To understand generics, let's start with an example that doesn't use generics. This code prints a collection of strings in lower case:
// Takes a collection of Strings
public void lowerCase(Collection c) {
Iterator itr = c.iterator();
while (itr.hasNext()) {
String s = (String) itr.next();
System.out.println(s.toLowerCase());
}
}
There's no guarantee this method will receive only strings. It is the responsibility of the programmer to remember what goes where. Generics solve this problem by making the typing explicit: they document and enforce rules about what a collection can contain. The compiler generates an error if the types aren't appropriate. In the rewrite below, notice how Collection and Iterator declare they hold only String objects:
public void lowerCase(
Collection<String> c) {
Iterator<String> itr = c.iterator();
while (itr.hasNext()) {
System.out.println(
itr.next().toLowerCase());
}
}
Now the code contains stronger class typing, but it still involves a lot of keyboard typing. We'll take care of that later. Notice you can store any subtype of the type parameter. Next, we'll use this feature to draw() a collection of shapes.
// Takes collection of child ...
public void drawAll(Collection<Shape> c) {
Iterator<Shape> itr = c.iterator();
while (itr.hasNext()) {
itr.next().draw();
}
}
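To see the subtype rule in action, here is a minimal runnable sketch. The Shape, Circle, and Square types and the ShapeDemo class are hypothetical, invented for illustration, and draw() returns a string only so the result is easy to inspect:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical shape hierarchy, invented for this sketch.
interface Shape {
    String draw();
}

class Circle implements Shape {
    public String draw() { return "circle"; }
}

class Square implements Shape {
    public String draw() { return "square"; }
}

class ShapeDemo {
    // Accepts a collection declared to hold Shape; elements may be any subtype.
    static List<String> drawAll(Collection<Shape> c) {
        List<String> drawn = new ArrayList<String>();
        Iterator<Shape> itr = c.iterator();
        while (itr.hasNext()) {
            drawn.add(itr.next().draw()); // no cast needed
        }
        return drawn;
    }

    public static void main(String[] args) {
        Collection<Shape> shapes = new ArrayList<Shape>();
        shapes.add(new Circle()); // a Circle is a Shape
        shapes.add(new Square()); // so is a Square
        System.out.println(drawAll(shapes)); // [circle, square]
    }
}
```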
The value in the angle brackets is known as a type variable. Parameterized types can support any number of type variables. A java.util.Map, for example, accepts two: one for the key type and one for the value type. The following example uses a map with a string-lookup key pointing at a list of element objects:
public static void main(String[] args) {
HashMap<String, List<Element>> map =
new HashMap<String,
List<Element>>();
map.put("root",
new ArrayList<Element>());
map.put("servlet",
new LinkedList<Element>());
}
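A self-contained variation on the same idea may help. This sketch swaps the Element type for plain strings so it runs without any XML library; the MapDemo class and groupByInitial() method are hypothetical names, and the method assumes every word is non-empty:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapDemo {
    // Groups words under their first letter. Both the key type and the
    // value type are declared, so get() needs no cast.
    // Assumes every word has at least one character.
    static Map<String, List<String>> groupByInitial(String[] words) {
        Map<String, List<String>> groups =
            new HashMap<String, List<String>>();
        for (int i = 0; i < words.length; i++) {
            String key = words[i].substring(0, 1);
            List<String> bucket = groups.get(key); // typed as List<String>
            if (bucket == null) {
                bucket = new ArrayList<String>();
                groups.put(key, bucket);
            }
            bucket.add(words[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups =
            groupByInitial(new String[] { "root", "servlet", "session" });
        System.out.println(groups.get("s")); // [servlet, session]
    }
}
```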
The class definition declares how many type variables it accepts. The type parameter count must precisely match what's expected. Also, the type variables must not be primitive types.
List<String, String> // illegal, List takes one type variable
List<int> // illegal, primitive
You can use a parameterized type even when a raw type is expected. You can do the reverse, too, although doing so gives a compile-time warning:
public static void oldMethod(List list) {
System.out.println(list.size());
}
public static void main(String[] args) {
List<String> words =
new ArrayList<String>();
oldMethod(words); // no problem
}
This allows easy backward compatibility: An old method accepting a raw List can directly accept a parameterized List<String>. A new method accepting a parameterized List<String> can also accept a raw list, but since a raw list doesn't declare or enforce the same type restrictions, this action triggers a warning. The guarantee is this: If you don't get an unchecked warning at compile time, the compiler-generated casts won't fail at runtime.
It's interesting to note that parameterized types and raw types are compiled to be the same type. No special classes enter the picture, and everything is accomplished using compiler trickery. An instanceof check proves this.
words instanceof List // true
words instanceof ArrayList //true
words instanceof ArrayList<String> // true
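The same point can be checked through getClass(), which sidesteps any question of how a particular compiler treats instanceof against a parameterized type. This is a minimal sketch; ErasureDemo and sameRuntimeClass() are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // True when both objects are instances of the very same runtime class.
    static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // Different type arguments, one class at runtime.
        System.out.println(sameRuntimeClass(words, numbers)); // true
        System.out.println(words.getClass().getName()); // java.util.ArrayList
    }
}
```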
That raises the question, "If they're the same type, how strong can the checking be?" It's a contract written in ink, not in blood. This code generates a compile error because you can't add a Point to a List<String>:
List<String> list =
new ArrayList<String>();
list.add(new Point()); // compile error
But this code compiles!
List<String> list =
new ArrayList<String>();
((List)list).add(new Point());
It casts the parameterized type to a raw type, which is legal and avoids the type check but generates a warning, as explained earlier, that calls are going unchecked:
warning: unchecked call to add(E) as a member of the raw type java.util.List
((List)list).add(new Point());
^
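What the warning protects against can be sketched as follows. The example uses an Integer instead of a Point to stay free of extra classes, and UncheckedDemo and readFailsAfterRawAdd() are hypothetical names. The raw add succeeds, and the failure surfaces only later, at the cast the compiler inserts when the element is read back as a String:

```java
import java.util.ArrayList;
import java.util.List;

class UncheckedDemo {
    // Sneaks a non-String into a List<String> through a raw-type cast,
    // then reports whether reading it back as a String fails.
    static boolean readFailsAfterRawAdd() {
        List<String> list = new ArrayList<String>();
        ((List) list).add(Integer.valueOf(42)); // compiles; javac reports an unchecked call
        try {
            String s = list.get(0); // the compiler-generated cast to String fails here
            return false;           // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFailsAfterRawAdd()); // true
    }
}
```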
Writing a Parameterized Type
Tiger provides a new syntax for writing parameterized types. The Holder class shown below holds any reference type. A class like this can be handy, for example, in supporting CORBA's pass by reference semantics without generating separate Holder classes:
public class Holder<A> {
private A value;
Holder(A v) { value = v; }
A get() { return value; }
void set(A v) { value = v; }
}
With a parameterized Holder type, you can get and set data safely without casting:
public static void main(String[] args) {
Holder<String> holder =
new Holder<String>("abc");
String val = holder.get(); // "abc"
holder.set("def");
}
The "A" type parameter name can be any standard variable name. It's usually capitalized and often a single letter. You can also declare that the type parameter must extend another class, as shown below:
// Also possible
public class Holder<C extends Child>
There's still some debate concerning whether you should be able to declare anything else about the type parameter. The deeper you get into generics, the more you need to have special rules, but the more special rules you have, the more complicated generics becomes.
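As an illustration of the extends bound, here is a minimal sketch. NumberHolder and its doubled() method are hypothetical, and the Number bound is chosen only because it is a familiar core class:

```java
// Hypothetical bounded variant of the Holder class; the name and the
// Number bound are chosen only for illustration.
class NumberHolder<N extends Number> {
    private N value;

    NumberHolder(N v) { value = v; }

    N get() { return value; }

    void set(N v) { value = v; }

    // Legal because every N is known to be a Number.
    double doubled() { return value.doubleValue() * 2; }
}

class BoundedDemo {
    public static void main(String[] args) {
        NumberHolder<Integer> h =
            new NumberHolder<Integer>(Integer.valueOf(21));
        System.out.println(h.doubled()); // 42.0
        // NumberHolder<String> would not compile: String is not a Number.
    }
}
```

The bound buys two things: the compiler rejects type arguments outside the hierarchy, and code inside the class can call the bound's methods on the type variable without a cast.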
The core class java.lang.ThreadLocal, designed to hold thread local variables, will likely change in Tiger to behave similarly to this Holder class:
public class ThreadLocal<T> {
public T get();
public void set(T value);
}
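Assuming ThreadLocal is parameterized as sketched above, a per-thread counter might look like the following. ThreadLocalDemo and increment() are hypothetical names; initialValue() is the existing ThreadLocal hook for supplying a starting value:

```java
class ThreadLocalDemo {
    // Each thread sees its own counter; initialValue() supplies the start.
    private static final ThreadLocal<Integer> COUNTER =
        new ThreadLocal<Integer>() {
            protected Integer initialValue() {
                return Integer.valueOf(0);
            }
        };

    static int increment() {
        int next = COUNTER.get().intValue() + 1; // get() returns Integer, no cast
        COUNTER.set(Integer.valueOf(next));
        return next;
    }

    public static void main(String[] args) throws InterruptedException {
        increment();
        increment();
        Thread other = new Thread(new Runnable() {
            public void run() {
                // A fresh thread starts from its own initial value.
                System.out.println("other thread: " + increment()); // 1
            }
        });
        other.start();
        other.join();
        System.out.println("main thread: " + increment()); // 3
    }
}
```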
We'll also see java.lang.Comparable change to allow classes to declare the type against which they can be compared:
public interface Comparable<T> {
int compareTo(T o);
}
public final class String implements Comparable<String> {
int compareTo(String anotherString);
}
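For a user-defined class, a parameterized Comparable could be used along these lines. Version is a hypothetical value class invented for this sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value class, invented for this sketch.
class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // The parameter is a Version, not an Object, so no cast is needed.
    public int compareTo(Version other) {
        if (major != other.major) {
            return major < other.major ? -1 : 1;
        }
        if (minor != other.minor) {
            return minor < other.minor ? -1 : 1;
        }
        return 0;
    }

    public String toString() { return major + "." + minor; }
}

class ComparableDemo {
    public static void main(String[] args) {
        List<Version> versions = new ArrayList<Version>();
        versions.add(new Version(1, 5));
        versions.add(new Version(1, 1));
        versions.add(new Version(1, 4));
        Collections.sort(versions);
        System.out.println(versions); // [1.1, 1.4, 1.5]
    }
}
```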
Generics aren't just for collections; they have a wide range of uses. For example, although you can't catch based on a parameterized type (because they're no different than raw types), you can throw a parameterized type. In other words, you can dynamically decide what goes into a throws clause.
The following mind-bending code comes from the generics specification. The code defines an Action interface with a type parameter E extending Exception. The Action interface has a run() method that throws whatever type comes in as E. The AccessController class then defines a static exec() method that accepts an Action<E> and declares that exec() throws E. The special <E extends Exception> in the method signature is needed to declare that the method itself is parameterized.
Now things get a little tricky. The main() method calls the AccessController.exec() method, passing in an Action instance (implemented as an anonymous inner class); that inner class is parameterized to throw a FileNotFoundException. The main() method has a catch clause that catches this exception type. Without the parameterized typing, you wouldn't know exactly what run() might throw. With parameterized typing, you can have generic Action implementations that have any run() implementation throwing any Exception type:
interface Action<E extends Exception> {
void run() throws E;
}
class AccessController {
public static <E extends Exception>
void exec(Action<E> action) throws E {
action.run();
}
}
public class Main {
public static void main(String[] args) {
try {
AccessController.exec(
new Action<FileNotFoundException>() {
public void run()
throws FileNotFoundException {
// someFile.delete();
}
});
}
catch (FileNotFoundException f) { }
}
}
Covariant Return Types
Here's a pop quiz: Does the following code compile successfully?
class Fruit implements Cloneable {
Fruit copy() throws
CloneNotSupportedException {
return (Fruit)clone();
}
}
class Apple extends Fruit
implements Cloneable {
Apple copy()
throws CloneNotSupportedException {
return (Apple)clone();
}
}
Answer: The code will not compile in J2SE 1.4, because a method override must have the same method signature, including return type, as the method it's overriding. However, generics has a feature called covariant return types that allows the previous code to compile in Tiger. The feature promises to be extremely useful.
For example, in the latest JDOM code, there's a new Child interface. Child has a detach() method that returns the Child object detached from its parent. In the Child interface, the method, of course, returns Child:
public interface Child {
Child detach();
// etc
}
When the Comment class implements detach(), it always returns a Comment, but without covariant return types, the method declaration must return Child:
public class Comment implements Child {
public Child detach() {
if (parent != null)
parent.removeContent(this);
return this;
}
}
This means the caller must needlessly downcast the returned type back to a Comment. Covariant return types allow detach() in Comment to return Comment. It works as long as Comment is a subtype of Child. The feature also could come in handy with the Child.getParent() method, which right now returns Parent but for DocType could return Document and for EntityRef could return Element. Covariant return types move the responsibility for typing from the user of the class (who acknowledges it via casting) to the creator of the class, who understands which types are truly polymorphic with each other. This makes the application programming interface (API) easier to use while slightly increasing the burden on the API designer.
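The effect on the caller can be sketched with simplified stand-ins for the JDOM types. Detachable and Note are hypothetical names and are not part of the JDOM API:

```java
// Simplified stand-ins for the JDOM types discussed above; not the real JDOM API.
interface Detachable {
    Detachable detach();
}

class Note implements Detachable {
    private Object parent = new Object();

    // Covariant return: narrows the declared Detachable to Note.
    public Note detach() {
        parent = null;
        return this;
    }

    boolean isDetached() { return parent == null; }
}

class CovariantDemo {
    public static void main(String[] args) {
        Note note = new Note();
        Note same = note.detach();                // no downcast required
        Detachable viaInterface = note;
        Detachable again = viaInterface.detach(); // still works through the interface
        System.out.println(same == note && again == note
            && note.isDetached()); // true
    }
}
```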
Autoboxing
Java has a split-type system with primitives and object (reference) types. Primitives are considered lighter because they have no object overhead. An int[1024], for example, requires just 4K of storage plus one object for the array itself. Reference types, however, can be passed where primitives aren't allowed, such as into a List. The standard workaround to this limitation is to "box" or "wrap" the primitive with its corresponding reference type before an insert such as list.add(new Integer(1)) and unbox on the way out with a return such as ((Integer)list.get(0)).intValue().
The new autoboxing feature lets the compiler implicitly convert from an int to an Integer, a char to a Character, and so on as necessary. Auto-unboxing handles the reverse. In the following example, I count character frequencies in a string without using autoboxing. I construct a Map that should map chars to ints, but because of Java's split-type system, I have to manually manage Character and Integer boxing conversions.
public static void countOld(String s) {
TreeMap m = new TreeMap();
char[] chars = s.toCharArray();
for (int i=0; i < chars.length; i++) {
Character c = new Character(chars[i]);
Integer val = (Integer) m.get(c);
if (val == null)
val = new Integer(1);
else
val = new Integer(val.intValue()+1);
m.put(c, val);
}
System.out.println(m);
}
Autoboxing allows us to write the code as such:
public static void countNew(String s) {
TreeMap<Character, Integer> m =
new TreeMap<Character, Integer>();
char[] chars = s.toCharArray();
for (int i=0; i < chars.length; i++) {
char c = chars[i];
m.put(c, m.get(c) + 1); // unbox
}
System.out.println(m);
}
Here, I've rewritten the map to use generics, and I've let autoboxing give the impression that the map can directly store and retrieve char and int values. Unfortunately, the above code has a problem. What should happen if m.get(c) returns null? How do you unbox null? In the early access release (see Next Steps), unboxing a null Integer returned 0. Since the early access release, the expert group has determined that unboxing null should throw a NullPointerException. Thus, the put() line will need to be rewritten as such:
m.put(c, Collections.getWithDefault(
m, c) + 1);
The new Collections.getWithDefault() method implements a get() where, if the value is null, it returns the expected type's default value. For an int type, that's 0.
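If you would rather not depend on a helper method, the null case can also be handled by hand. The following sketch uses only an explicit null check together with autoboxing; CountDemo and count() are hypothetical names:

```java
import java.util.Map;
import java.util.TreeMap;

class CountDemo {
    static Map<Character, Integer> count(String s) {
        Map<Character, Integer> m = new TreeMap<Character, Integer>();
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            Integer old = m.get(c); // c is boxed; result is null on first sight
            int next = (old == null) ? 1 : old + 1; // unbox only when non-null
            m.put(c, next); // c and next are boxed
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(count("banana")); // {a=3, b=1, n=2}
    }
}
```

Keeping the result of get() in an Integer variable, rather than feeding it straight into an arithmetic expression, is what prevents the NullPointerException.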
Although autoboxing can make for more elegant code, my advice is to use it sparingly. The boxing conversions still happen and still create a pile of wrapper-object instances. When speed counts, it's better to stick with the old trick of wrapping an int with an int array of length 1. Then you can store the array wherever a reference type is required, access the value as intarr[0], and increment it using intarr[0]++. You don't even have to call put() again, because the increment happens in place. Using this trick and a few others allows you to implement the count much more efficiently. With the following algorithm, execution time against 1 million characters drops from 650ms to 30ms:
public static void countFast(String s) {
int[] counts = new int[256]; // assumes every char value is below 256
char[] chars = s.toCharArray();
for (int i=0; i < chars.length; i++) {
int c = (int) chars[i];
counts[c]++; // no object creation
}
for (int i = 0; i < 256; i++) {
if (counts[i] > 0) {
System.out.println((char)i + ":"
+ counts[i]);
}
}
}
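The one-element array trick itself, applied to a map rather than a fixed table, might look like the sketch below. CounterArrayDemo and count() are hypothetical names; unlike the table version, this variant places no limit on the character range:

```java
import java.util.Map;
import java.util.TreeMap;

class CounterArrayDemo {
    // Counts characters using a one-element int array as a mutable counter.
    static Map<Character, int[]> count(String s) {
        Map<Character, int[]> m = new TreeMap<Character, int[]>();
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            Character key = Character.valueOf(chars[i]);
            int[] counter = m.get(key);
            if (counter == null) {
                counter = new int[1]; // starts at 0
                m.put(key, counter);  // put() is needed only once per key
            }
            counter[0]++; // incremented in place, no new Integer per update
        }
        return m;
    }

    public static void main(String[] args) {
        Map<Character, int[]> m = count("banana");
        System.out.println(m.get(Character.valueOf('a'))[0]); // 3
    }
}
```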
In C#, we see a similar but slightly different approach. C# has a unified type system where both value types and reference types extend System.Object. You don't directly see this, though, because C# provides aliases and optimizations for the simple value types. An int is an alias for System.Int32, a short is an alias for System.Int16, and a double is an alias for System.Double. In C#, you can call "int i = 5; i.ToString();" and it's perfectly legal. This is because each value type has a corresponding hidden reference type, created when it's cast to a reference type.
int x = 9;
object o = x; // reference type created
int y = (int) o;
While based on a different type system, the end result closely matches what we'll see in J2SE 1.5.