Futurama: Using Java Technology to Build Robots That Can See, Hear, Speak, and Move

   
   

Robotics is playing an ever greater role in our world -- from industrial assembly 'droids performing the most mundane of tasks, to search-and-rescue robots saving lives in the depths of collapsed buildings, to interplanetary exploration robots probing the vastness of space. Such robots increasingly take on tasks that are too boring, costly, or dangerous for human beings to perform.

But to develop robotic systems that are efficient and cost effective, we need a technology infrastructure that is both robust and future-proof. In a world of increasingly diverse devices and operating systems, with wildly differing hardware capacities and designs, it's essential that a robotics infrastructure be device-neutral and platform-neutral. After all, in the world of space exploration, the contractor for a given system can change at the stroke of a politician's pen.

When early explorers of the moon experienced a breakdown in their equipment, their technical manuals reportedly advised them -- should all other diagnostic techniques fail -- to "kick with lunar boot." But for a robotic Mars rover sitting on an alien planet 141 million miles from Earth, there's no one there to do the kicking. A red planet robot in search of alien life can't afford the blue screen of death.

Enter Java Technology

The ability to hear, see, speak, and move is key to the human ability to manipulate the world. But to perform such human-like tasks, robotics systems must be able to emulate these functions via software. And when it comes to enabling robust, device-neutral, platform-neutral systems, Java technology virtually wrote the book. Java technology offers an array of APIs tailor-made to the needs of the robotics realm. The Java Speech API lets you build command-and-control recognizers, dictation systems, and speech synthesizers. And the Java Media Framework can be used to receive and process visual images.

Simon Ritter is a Technology Evangelist at Sun Microsystems. He speaks on many different aspects of Java technology, but is a particular champion of Java technology within the world of robotics. He has developed a Robotics Software Development Kit, and regularly gives compelling demonstrations of robotics systems using Java technology. Among these demonstrations are small, resource-constrained robots that use the LEGO Mindstorms RCX "brick." Ritter provides detailed directions (both hardware and software) for building, programming, and deploying his various LEGO robots.

For Ritter, the beauty of the standards-based Java technology APIs related to robotics is that they simplify development work -- allowing you to focus solely on the task at hand, rather than having to understand the entirety of a particular robotics system.

And lest anyone think that Java technology-powered robotics is unavailable outside the domain of enterprise developers, Ritter emphasizes that robotics speech synthesis engines are available for approximately $30, webcams can be had for well under $100, and the LEGO Mindstorms robot can be purchased for approximately $200!

Speech Recognition

The Java Speech API (JSAPI) provides a simple yet powerful interface into speech systems -- including speech recognition and speech synthesis. "The Java Speech API doesn't actually provide the speech recognition and speech synthesis systems," explains Ritter. "It provides you with the programmatic interface into them."

There are several low-cost facilities that use the Java Speech API. "The applications I use for my demonstrations employ the IBM ViaVoice engine," notes Ritter. "It's been around for a few years now, and IBM has been very good to implement the Java Speech API -- which you can download from their AlphaWorks site for free." And the ViaVoice engine is very reasonably priced. For under $30, developers can be up and ready to build speech-enabled Java applications.

Key to JSAPI's speech recognition functionality is the command-and-control recognizer. "You specify certain words you want the system to recognize," says Ritter, "and indicate that you want to have an event sent to you when those words are recognized." The Java Speech API's Grammar Format allows you to define words that the system should recognize. "It provides us with a cross-platform way of controlling the speech recognizer," adds Ritter.


Sample grammar for the Java Speech API
                                             
#JSGF V1.0;

grammar robot;

public <stop> = stop {stop};
public <left> = [Turn] left {left};
public <right> = [Turn] right {right};
public <forward> = [Go] Forward {forward};
public <reverse> = [Go] Back {back};
public <bye> = ([Good] Bye | So long) {bye};

"With this code, I'm defining a set of words I want the system to recognize," explains Ritter, "and then the events I want to get from that particular recognition. I can use the optional features to make it more sophisticated -- for example, I can say: 'bye,' 'good bye,' or 'so long.' All those words will be recognized, and the same event will be returned to my program."

The Java Speech API can also be used for dictation systems, which offer, in essence, a superset of command-and-control functionality. Rather than looking for specific words/commands, such a system will process every word that is spoken, for use in word processors and web browsers.

The code necessary to programmatically make use of a given JSAPI grammar is relatively straightforward. While abbreviated here, more complete code listings are available in Ritter's robotics SDK (see links at the end of this article).


Sample code to create a robotics voice listener using a grammar format file
                                             
import java.io.FileReader;
import javax.speech.*;
import javax.speech.recognition.*;

// Create recognizer and allocate resources
Recognizer recognizer = Central.createRecognizer(null);
recognizer.allocate();

/* Add engine listener - to notify when speech engine
 * is stopped, started, etc.
 */
recognizer.addEngineListener(engineListener);

// Read in grammar file
FileReader gf = new FileReader(grammarFile);
RuleGrammar rules = recognizer.loadJSGF(gf);

/* Add result listener. VoiceListener then called when a
*particular grammar word is recognized. The associated
*event tag is passed to it
*/
rules.addResultListener(new VoiceListener());

// Tell recognizer to commit changes
recognizer.commitChanges();
// Request focus of microphone away from other apps.
recognizer.requestFocus();
// Start listening
recognizer.resume();

// Remember to catch exceptions

Once a JSAPI result listener is registered, each recognized phrase generates an event that causes the listener's resultAccepted method (here, in the VoiceListener class) to be called. The method can then read the tags passed with the event to determine which command was recognized.


Sample code to receive and process a speech command
                                             
import javax.speech.*;
import javax.speech.recognition.*;

// Called when a given grammar command is recognized
public class VoiceListener extends ResultAdapter {
    public void resultAccepted(ResultEvent re) {
        FinalRuleResult result = (FinalRuleResult) re.getSource();
        // Get and examine grammar tag
        String[] tags = result.getTags();
        System.out.println("First tag was: " + tags[0]);
    }
}
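
Once the tag string is in hand, the program still has to turn it into a robot action. The following is a minimal sketch of one way to do that in plain Java, assuming the tag names from the sample grammar above (stop, left, right, forward, back, bye). The RobotCommand type and its methods are hypothetical names introduced for illustration; they are not part of the Java Speech API or of Ritter's SDK.

```java
import java.util.Locale;

/** Hypothetical mapping from grammar tags to robot commands (not part of JSAPI). */
enum RobotCommand {
    STOP, LEFT, RIGHT, FORWARD, BACK, BYE, UNKNOWN;

    /** Converts a tag such as "left" into a command; unrecognized tags map to UNKNOWN. */
    static RobotCommand fromTag(String tag) {
        if (tag == null) {
            return UNKNOWN;
        }
        try {
            return valueOf(tag.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // The tag does not correspond to any known command
            return UNKNOWN;
        }
    }

    /** Picks the command for a tag array such as the one returned by getTags(). */
    static RobotCommand fromTags(String[] tags) {
        return (tags == null || tags.length == 0) ? UNKNOWN : fromTag(tags[0]);
    }
}
```

A listener could then call RobotCommand.fromTags(tags) and switch on the result, which keeps the speech-handling code separate from the motor-control code.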

Speech Synthesis

On the flip side of the voice functionality coin lies speech synthesis. Here, the Java Speech API Markup Language enhances the functionality and sophistication of spoken text. "We can select a particular voice," says Ritter, "depending upon the underlying system supporting the API. We can also add context information in terms of how to pronounce things. In the case of saying 'JSML,' I want the system to actually pronounce that as four letters."

By some estimates, only 10% of the information conveyed in a verbal exchange comes from the actual words being spoken. So in order to make synthesized speech as subtle and information-rich as that of human beings, it's important to include elements of pitch, volume, and range -- prosody values -- as well as emphasis. "A lot of information comes from pitch, volume, and range," says Ritter. "If we want the computer to sound more realistic, we need to be able to emulate those characteristics."

Sample Java Speech API Markup Language

Context Information:
<sayas class="literal">JSML</sayas>

Emphasis:
Java technology is <emphasis>cool</emphasis>

Prosody (pitch, volume, range):
This car, <prosody volume="-20%">but not this one</prosody>, is new
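
Markup like this is usually assembled at run time from ordinary strings. As a rough sketch, the helper below builds the three kinds of fragment shown above and escapes any markup characters in the spoken text. The Jsml class and its method names are hypothetical, introduced only for illustration; they are not part of the Java Speech API.

```java
/** Hypothetical helper for assembling JSML fragments; not part of the Java Speech API. */
final class Jsml {
    private Jsml() { }

    /** Escapes characters that would otherwise be read as markup. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Asks the synthesizer to spell the text out letter by letter. */
    static String literal(String text) {
        return "<sayas class=\"literal\">" + escape(text) + "</sayas>";
    }

    /** Marks the text for emphasis. */
    static String emphasis(String text) {
        return "<emphasis>" + escape(text) + "</emphasis>";
    }

    /** Changes the volume for the text; volumeChange is a relative value such as "-20%". */
    static String prosodyVolume(String text, String volumeChange) {
        String attr = escape(volumeChange).replace("\"", "&quot;");
        return "<prosody volume=\"" + attr + "\">" + escape(text) + "</prosody>";
    }
}
```

For example, "Java technology is " + Jsml.emphasis("cool") produces the emphasis sample shown above.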

The code necessary to actually effect speech synthesis within a Java language program is relatively straightforward, as shown below.


Sample Java Speech API speech synthesis code
                                             
import javax.speech.*;
import javax.speech.synthesis.*;

public void say(String words) {
    // Create synthesizer
    Synthesizer s = Central.createSynthesizer(null);
    // Allocate resources
    s.allocate();
    // Speak string (the text is interpreted as JSML)
    s.speak(words, null);

    // Remember to catch exceptions
}

Note: The latest version of the Java Speech API, including enhanced functionality, is currently going through the Java Community Process -- Java Specification Request 113 (JSR-113).

The Vision Thing

For human beings, our most crucial ability for navigating the physical world is vision. In fact, by some estimates, half the human brain is required to support visual processing! So naturally this is a key facility for many robotic systems. But here again, the hardware for implementing such functionality is surprisingly affordable. "USB webcams are really quite inexpensive now," says Ritter.

There are two basic techniques for accessing visual data within Java technology-based robotic systems. A quick and easy technique uses a Java technology/TWAIN interface. "TWAIN began as an interface standard for connecting scanners to PCs," explains Ritter, "but it's been extended to work with things like webcams. A group out of Slovakia has now created a Java/TWAIN interface, and if you are a non-commercial organization, you can download it for free. That allows you to take an image from a TWAIN device and create a Java AWT image from it -- which you can then display and manipulate."

But while simple to use, the TWAIN interface presents an on-screen user interface, leaving many parameters of the image inaccessible programmatically. Plus, TWAIN is primarily designed for getting single images. "When I was using TWAIN for a demonstration," says Ritter, "I would get about one frame, every one-and-a-half seconds. That's a bit on the slow side if you're trying to do real-time processing."


Sample code for accessing webcam images using a Java technology/TWAIN interface
                                             
// Create TWAIN object
Twain t = new Twain();

/* Disable on-screen TWAIN UI (this stops some cameras from
* working)
*/
t.setVisible(false);

// Configure to get 20 images
t.setTransferCount(20);

// Get AWT image
Image i = Toolkit.getDefaultToolkit().createImage(new TwainImage(t));

A more feature-rich and programmatically accessible means of controlling a webcam is via the Java Media Framework (JMF). Originally, the JMF was designed to enable the playback of various types of media information in a platform-neutral way. But it has since been expanded to include playback, capture, transmission, transcoding, and plug-ins for a variety of codecs (compressor/decompressor). While programmatically more involved, with the JMF you can achieve much greater control over the image -- including size, contrast, color depth, and more.

The first step for using JMF to get images is to locate the specific webcam device, using CaptureDeviceManager. Next, you must get a data source through the MediaLocator. Then, set the format of the desired images (device-specific). "With JMF, we have much greater control," says Ritter. "We can decide that we want to have a 160x120 pixel image, with a given color depth, with a given contrast, and so forth."

The next step is to create and realize a Processor, which is the JMF entity that actually feeds the image to the program. After that, a PushBuffer data source is created. Once this is accomplished, you can obtain a PushBufferStream from the PushBufferDataSource, and read buffers that represent actual frames from the camera. Those buffers can then be converted to AWT images using the BufferToImage class.


Sample code for obtaining webcam images using JMF
                                             
import java.awt.Image;
import javax.media.*;
import javax.media.control.*;
import javax.media.format.*;
import javax.media.protocol.*;
import javax.media.util.*;

// Get the CaptureDevice that matches the specific camera
CaptureDeviceInfo dev = CaptureDeviceManager.getDevice("vfw:USB Video Camera:0");

Format[] cfmts = dev.getFormats();

RGBFormat fmt = null;

for (int i = 0; i < cfmts.length; i++) {
    // Find format with desired size, bits/pixel, etc.
    // (skip any formats the device offers that are not RGB)
    if (cfmts[i] instanceof RGBFormat) {
        fmt = (RGBFormat) cfmts[i];
    }
}

// Get MediaLocator
MediaLocator loc = dev.getLocator();

//Get DataSource (via MediaLocator) - source of image data
DataSource src = Manager.createDataSource(loc);

//Get FormatControls for our data source
FormatControl[] fmtc = ((CaptureDevice)src).getFormatControls();

/* Loop through the available FormatControls and try
* to set the format to the one we selected above.
*/
for (int i = 0; i < fmtc.length; i++) {
if (fmtc[i].setFormat(fmt) != null)
break;
}

// Create Processor - for processing rather than display
Processor p = Manager.createProcessor(src);

// Realize Processor
p.realize();

// Must wait here for RealizeCompleteEvent

//Start Processor
p.start();

// Get access to push buffer data source
PushBufferDataSource pbSrc = (PushBufferDataSource)p.getDataOutput();

/* Can now retrieve the PushBufferStream that will enable us
* to access the data from the camera
*/
PushBufferStream[] strms = pbSrc.getStreams();

/* Should test format - in terms of previously selected
* parameters
*/
PushBufferStream camStream = strms[0];
RGBFormat rgbf = (RGBFormat)camStream.getFormat();

// Set up for conversion below
BufferToImage conv = new BufferToImage(rgbf);

// Grab image from webcam
Buffer b = new Buffer();
camStream.read(b);

// Convert to an AWT image
Image i = conv.createImage(b);

You can then convert the image to an RGB bitmap using the PixelGrabber class. The bitmap can then be processed and analyzed (in terms of color and edge detection). The Java 2D API is ideal for simpler edge detection, while the Java Advanced Imaging API is appropriate for more complex edge detection.
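
As an illustration of the simpler end of that range, the sketch below uses the Java 2D ConvolveOp and Kernel classes to apply a 3x3 Laplacian kernel, one common choice for basic edge detection. It is not taken from Ritter's SDK, and the EdgeDetect class name is hypothetical. It assumes the camera frame has already been drawn into a BufferedImage (for example, one of type TYPE_INT_RGB).

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;

/** Hypothetical helper showing basic edge detection with the Java 2D API. */
final class EdgeDetect {
    private EdgeDetect() { }

    /** Returns a new image in which areas of rapid brightness change are bright. */
    static BufferedImage laplacian(BufferedImage src) {
        // The weights sum to zero, so flat regions of the image come out black
        float[] weights = {
             0f, -1f,  0f,
            -1f,  4f, -1f,
             0f, -1f,  0f
        };
        ConvolveOp op = new ConvolveOp(new Kernel(3, 3, weights));
        // Passing null lets ConvolveOp allocate a compatible destination image
        return op.filter(src, null);
    }
}
```

Because the kernel weights sum to zero, uniform areas produce black pixels and only transitions between light and dark survive, which is what a line-following or card-reading robot typically needs.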

For those planning to implement webcam functionality, it's important to recognize the distinction between the push buffer data source provided by JMF and a USB camera, and the pull buffer data source required in order to receive and process real-time images. "A push buffer data source will start delivering images from the camera as soon as you create it," says Ritter. "So you'll get the first image it has in its buffer. Therefore, if we start a camera four seconds before we need an image, we'll get the image from four seconds ago." What's required for true real-time image gathering is a pull buffer data source. "With the way a USB camera works," continues Ritter, "you can't get a pull buffer data source. So you have to write a separate thread that continually gets images from the camera. Then when you want one at a particular time, you can say -- 'give me the last one' from that thread."
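
The separate-thread approach Ritter describes might be sketched as follows. This is an illustrative outline rather than the code from his SDK: the LatestFrameGrabber class is a hypothetical name, and the Supplier stands in for whatever blocking call reads the next frame from the camera (such as reading a Buffer from the PushBufferStream and converting it to an image).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical helper: drains a push-style source continuously and
 * remembers only the newest frame, so callers never see a stale one.
 */
final class LatestFrameGrabber<T> implements Runnable {
    private final Supplier<T> source;
    private final AtomicReference<T> latest = new AtomicReference<>();
    private volatile boolean running = true;

    LatestFrameGrabber(Supplier<T> source) {
        this.source = source;
    }

    /** Keeps reading frames until stop() is called. */
    @Override
    public void run() {
        while (running) {
            T frame = source.get();   // assumed to block until a frame arrives
            if (frame != null) {
                latest.set(frame);    // older frames are simply discarded
            }
        }
    }

    /** Returns the most recent frame, or null if none has arrived yet. */
    T latest() {
        return latest.get();
    }

    /** Asks the loop to finish after the current read. */
    void stop() {
        running = false;
    }
}
```

In use, the grabber would be started with new Thread(grabber).start(), and the vision code would call latest() whenever it needs the current view from the camera.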

Java Technology Powered Mobile Robots -- the LEGO Mindstorms Robotics Invention System

Figure 1: LEGO RCX "Brick"

For all those who might assume that mobile robotics is not something the average Java programmer can readily explore, you have to experience the LEGO Mindstorms Robotics Invention System. "LEGO came up with the Mindstorms system about three years ago," says Ritter. "It was developed in conjunction with MIT."

At the heart of the system is the programmable RCX "brick," a small computer contained within a yellow LEGO brick. The brick consists of a Hitachi 8-bit processor (16 MHz), 16 KB of ROM, 32 KB of RAM, 3 sensor inputs, 3 motor outputs, a 5-character LCD display, and an infrared serial data communications port. The brick is a small and extremely resource-constrained computing device -- particularly by today's desktop standards of GHz processors and hundreds of MB of memory. "On the other hand," notes Ritter with amusement, "I realized that it's just as powerful as my first personal computer."

The three outputs of the brick can be connected to motors and other devices, and the three inputs can be connected to such varied sensor devices as light, touch, rotation, and even heat. The system can process over 1000 commands a second, and features a fully multitasking operating system (allowing up to ten simultaneous tasks).

The brick was initially designed by LEGO to be programmed via a PC-based system that allows the visual assembly of on-screen functional components. This component-driven system then generates a completed program that can be downloaded into the brick.

But what opened the brick up to whole new vistas of innovation and functionality was the development of the open source leJOS environment. The creators of leJOS have managed to squeeze an actual Java Runtime Environment (including multi-threading) into 14 KB on the brick. But leJOS is obviously not a complete implementation of the Java platform. "You can't really expect that," says Ritter. "Having the AWT on the brick doesn't make much sense." Due to memory constraints, leJOS also lacks garbage collection, but the hooks are there for future implementations.

With leJOS, Java developers now have an inexpensive (yet multi-threaded) robotics platform available to them. The basic kit starts at only about $200.

In order to accommodate future versions of the brick, the system was designed to allow for the easy loading of new LEGO firmware. But that also makes it very simple to replace the firmware with the leJOS environment. The first step toward enabling the brick to run Java programs is to load leJOS.

lejosfirmdl

"That's about 14 KB," says Ritter, "and is effectively your Java runtime environment."

The next step in running a program on the (now) Java technology-enabled brick is to write and compile a Java program, and then link it into a binary file.

lejosc MyClass.java

lejos -o MyClass.bin MyClass

Finally, the binary file is loaded (via infrared serial port) onto the brick.

lejosrun MyClass.bin

To actually execute the program, you simply push the brick's "start" button. But with a five-character LCD display as the only output device, debugging programs can sometimes be challenging!

There are currently three versions of the RCX brick. Versions 1.0 and 1.5 are essentially the same, and employ an infrared serial communications link to connect the PC to the brick (2400-baud). With the leJOS environment, the Java Communications API can be used for communicating with the brick. "Somebody's even written a web server that runs on the brick!" says Ritter.

The RCX version 2.0 uses a USB communications port. While leJOS can be installed onto the system, there is currently no support for the Java USB API (JSR-080). However, leJOS now supports the USB connection on both Windows and Linux, and includes its own communications API that provides the same functionality as the Java Communications (JavaComm) API.

Innovation on Parade


Figure 2: UC Santa Cruz's Music Playing SlugBot
 
Figure 3: UC Berkeley's Paper Sorting PaperBot
Figure 4: Stanford's Maze Solving MazeBot

Robots, Ante Up!

Ritter demonstrated several of his LEGO robots at the 2002 SunNetwork Conference in San Francisco. One of them, a LEGO brick on wheels, was programmed to detect and follow an irregular black trail on a field of white. "One thread detects events," says Ritter, "and the other thread controls the motors and figures out the directions."

Another of Ritter's "LEGO-bots" acts as a command-driven/speech synthesizer-enabled robot blackjack dealer. The system recognizes commands, vocalizes activities, and visually interprets dealt hands. "Ready to play," the robot announces. "Dealing cards for new hand. Player gets six of spades," it says.

 

Figure 5: Simon Ritter's Demonstration LEGO Robot


And the same is true of the blackjack dealer robot. While the speech synthesis and visual processing are being handled on the PC, the brick is handling the movement of all the motors, figuring out where the card is in the machine, and so forth.

And such systems really do seem to have a life of their own at times. While detailing a technical aside during the SunNetwork Conference demonstration, Ritter forgot that the dealer-bot was still in command mode. Assuming it was being spoken to, the robot responded -- "Sorry, you must ask me to deal a hand of cards first."

Ritter's code for many of his demonstration robots, as well as his robotics SDK, are available online (see links at end of article).

The Future

The world of robotics dramatically demonstrates the cross-platform power and flexibility of Java technology -- from the full-featured Java Speech and Java Media Framework APIs, down to the extremely resource-constrained, but highly innovative and open source leJOS environment. "The Java Speech API is a very nice, low-cost way of adding voice control, voice recognition, and voice synthesis to your applications," says Ritter. "And the Java Media Framework API is a very useful and very powerful set of technologies that allow you to get information from a webcam and do processing on it. And then you have leJOS, which is a great open source project offering the power and simplicity of Java technology to control robots."

When it comes to Java technology and robotics, seemingly even the sky is no longer the limit. In 2001, LEGO, Siemens, Hitachi, and Intospace ran a competition to create a LEGO robot. The winning system was then taken up into the International Space Station (December, 2001). "It floated around collecting particles of debris," says Ritter. "So it was effectively a garbage collector, but using a system that didn't have a garbage collector!"

And most recently, Sun Labs' James Gosling has been consulting with scientists and engineers at NASA and JPL, with the aim of incorporating Java technology into upcoming Mars rover missions. At its farthest, the red planet is over twenty light-minutes from Earth. As a result, even at the speed of light (and radio waves), it can take over twenty minutes to get a message to a planetary rover, and then another twenty minutes to confirm the receipt (and implementation) of that message. Such distances don't make for effective real-time control of a vehicle, so such planetary exploration robots must make many decisions on their own. The next generation of the software for the Mars rover was originally designed and built using C++, but Gosling is working with NASA and JPL to create a Java version of the code that will be sent with the rover on the 2009 Mars mission.
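
The signal delay follows directly from dividing distance by the speed of light. As a rough worked example, the sketch below uses the 141 million mile figure quoted earlier in this article as a sample distance; the actual Earth-Mars distance varies widely as the two planets orbit, and is considerably greater at its maximum. The SignalDelay class name is hypothetical.

```java
/** Hypothetical worked example: one-way radio delay over a given distance. */
final class SignalDelay {
    static final double SPEED_OF_LIGHT_M_PER_S = 299_792_458.0;
    static final double METERS_PER_MILE = 1_609.344;

    private SignalDelay() { }

    /** Returns the one-way light travel time in minutes for a distance in miles. */
    static double oneWayMinutes(double miles) {
        double seconds = miles * METERS_PER_MILE / SPEED_OF_LIGHT_M_PER_S;
        return seconds / 60.0;
    }

    public static void main(String[] args) {
        double oneWay = oneWayMinutes(141e6);
        System.out.printf("One way: %.1f min, round trip: %.1f min%n", oneWay, 2 * oneWay);
    }
}
```

At 141 million miles the one-way delay works out to roughly 12.6 minutes, so even at that distance a command and its confirmation take about 25 minutes in total.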

Note: Special thanks to Simon Ritter for his assistance with this article.

See Also

Java Speech API Home Page
Java Speech Grammar Format Specification
Java Speech API Markup Language Specification
Java Speech API JSR-113
Java Communications API
Java Media Framework API 2.1.1
Java/TWAIN Interface
Java 2D API Home Page
Java Advanced Imaging API Home Page
IBM's Speech for Java (Implementation of the Java Speech API)
IBM ViaVoice
LEGO Mindstorms Home Page
leJOS Home Page
leJOS Programming Books

LEGO Mindstorms Books:

The Unofficial Guide to LEGO MINDSTORMS Robots
LEGO MINDSTORMS Programming: Unleash the Power of the Java Platform
Programming Lego Mindstorms with Java
Machine Vision Algorithms in Java: Techniques and Implementation