An IMS Application Example Based on SIP Servlets and VoiceXML

Figure 7 illustrates the flow encapsulated by the CallerToVMS class.

Figure 7. The flow encapsulated by CallerToVMS

The caller invites the callee (1), with the application server behaving as a B2BUA. The AssistantSipServlet creates an AssistedCall object from the details of the initial INVITE and then begins a B2BUA call to the MRF. A provisional 100 Trying response is sent to the caller (2). The SDP information from the initial INVITE (offer1) is copied to the INVITE request that is sent to the MRF (3), but in this case the SIP request URI is set to indicate a resource that lies within the Personal Assistant application (assistant?Action=greetCaller).

Once a 200 OK (4) response is received, the SDP information (answer1) is copied to another 200 OK (5) response that is then sent back to the caller so both parties now have the media session information necessary to communicate directly. An ACK message is sent from the caller's UA to the AS (6) and then from the AS to the MRF (7), completing the three-way handshake on both call-legs. At this point, the AssistedCall is informed (via the callStarted() method on the ITPCCEventListener interface) that the call-leg pair encapsulated by CallerToVMS has been successfully established.
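The bookkeeping behind this flow can be sketched in plain Java, without a SIP stack. This is a minimal model rather than the application's actual code: `CallLegPairSketch` and `CallListener` are hypothetical stand-ins for CallerToVMS and ITPCCEventListener, whose real signatures are not shown here, and SIP messages are reduced to strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ITPCCEventListener; only callStarted() is modelled. */
interface CallListener {
    void callStarted();
}

/** Toy model of the CallerToVMS leg pair: relays SDP and waits for the ACK. */
class CallLegPairSketch {
    private final CallListener listener;
    private final List<String> sent = new ArrayList<>();
    private String offer;
    private boolean answered;
    private boolean started;

    CallLegPairSketch(CallListener listener) {
        this.listener = listener;
    }

    /** Steps 1-3: the caller's INVITE arrives; send 100 Trying and forward the offer. */
    void onCallerInvite(String sdpOffer, String mrfRequestUri) {
        offer = sdpOffer;
        sent.add("caller <- 100 Trying");
        sent.add("MRF <- INVITE " + mrfRequestUri + " sdp=" + sdpOffer);
    }

    /** Steps 4-5: the MRF answers; copy its SDP answer into the caller's 200 OK. */
    void onMrfOk(String sdpAnswer) {
        if (offer == null) {
            throw new IllegalStateException("no INVITE pending");
        }
        answered = true;
        sent.add("caller <- 200 OK sdp=" + sdpAnswer);
    }

    /** Steps 6-7: relay the caller's ACK to the MRF, then report the call as started. */
    void onCallerAck() {
        if (!answered) {
            throw new IllegalStateException("ACK before 200 OK");
        }
        sent.add("MRF <- ACK");
        if (!started) {
            started = true;
            listener.callStarted();
        }
    }

    /** Messages the application server has sent so far, in order. */
    List<String> sentMessages() {
        return sent;
    }
}
```

Driving the sketch with `onCallerInvite("offer1", "assistant?Action=greetCaller")`, `onMrfOk("answer1")`, and `onCallerAck()` produces four outbound messages and a single `callStarted()` notification, mirroring steps 2, 3, 5, and 7 of the figure.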

Figure 8 illustrates the HTTP flow between the application server and MRF.

Figure 8. HTTP flow encapsulated by CallerToVMS

The MRF fetches the VoiceXML from the AssistantHttpServlet (9, 10), interprets it, and allows the caller to record their name (11). The recorded audio bytes of the caller's name are returned to the AssistantHttpServlet in an HTTP POST operation (12), where they are stored by the application so the audio can be replayed to the callee later on. The HTTP servlet then returns a VoiceXML document to the MRF (12), which keeps the caller waiting by playing hold music until the AssistedCall is ready to interrupt the call. At this point the HTTP servlet uses the AssistedService to find the associated AssistedCall object and instructs it to begin contacting the callee, because the caller is now waiting on hold. This is done by executing the AssistedCall.locateCallee() method.
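To make the record-and-post step concrete, here is a minimal sketch of how a servlet might build such a VoiceXML page as a string. It is not the document the Personal Assistant application actually serves: the field name `callerName`, the prompt wording, the `maxtime` value, and the submit URL passed in are all assumptions made for illustration. The `<record>` and `<submit>` elements and their attributes are standard VoiceXML.

```java
/** Builds a minimal VoiceXML 2.1 page that records a name and posts the audio back. */
class RecordNameVxmlSketch {

    /** submitUrl is a hypothetical target; the application's real URL is not shown here. */
    static String build(String submitUrl) {
        return String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<vxml version=\"2.1\" xmlns=\"http://www.w3.org/2001/vxml\">",
            "  <form>",
            "    <record name=\"callerName\" beep=\"true\" maxtime=\"5s\">",
            "      <prompt>Please say your name after the beep.</prompt>",
            "      <filled>",
            "        <!-- multipart/form-data is needed to upload the recorded audio -->",
            "        <submit next=\"" + escape(submitUrl) + "\" method=\"post\"",
            "                enctype=\"multipart/form-data\" namelist=\"callerName\"/>",
            "      </filled>",
            "    </record>",
            "  </form>",
            "</vxml>");
    }

    /** Escapes the characters that are unsafe inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

The `namelist` attribute names the recording variable, so the MRF includes the audio bytes in the POST body that the HTTP servlet then stores.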

Figure 9 illustrates the flow encapsulated by the CalleeToVMS class.

Figure 9. The flow encapsulated by CalleeToVMS

This flow uses a classic third-party call control pattern to place an outbound call to the callee. Figure 10 illustrates the HTTP flow between the application server and MRF.

Figure 10. HTTP flow encapsulated by CalleeToVMS

The MRF fetches the VoiceXML (assistant?Action=greetCallee) from the AssistantHttpServlet (9, 10), interprets it, and also fetches the audio bytes for the recording of the caller's name (11, 12). The recording is played to the callee, and a DTMF choice allows the callee to accept or decline the call (13). The choice is then submitted back to the AssistantHttpServlet via HTTP (14), which responds with either a VoiceXML document that plays hold music (16) or an empty dialog that simply disconnects the callee's call. If the callee does not wish to speak with the caller, the CallerValedictionTransfer object is invoked to play a valediction message to the caller (this flow is the same as Figure 9). Otherwise, the AssistedCall.transferCallerToCallee() method is invoked (which in turn uses the CallerCalleeTransfer object described below) to connect the caller to the callee.
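The branch taken after the callee's choice can be sketched as follows. This is an illustrative model only: the convention that DTMF "1" means accept, the `NextStep` enum, and the hold-music URL are assumptions, and the dialog that disconnects the callee is written here with an explicit `<exit/>` rather than as an empty dialog.

```java
/** Toy model of how the callee's DTMF choice could drive the next step. */
class CalleeChoiceSketch {

    enum NextStep { TRANSFER_CALLER_TO_CALLEE, PLAY_VALEDICTION_TO_CALLER }

    /** Assumed convention: "1" accepts; any other input (or none) declines. */
    static NextStep nextStep(String dtmf) {
        return "1".equals(dtmf)
            ? NextStep.TRANSFER_CALLER_TO_CALLEE
            : NextStep.PLAY_VALEDICTION_TO_CALLER;
    }

    /**
     * VoiceXML returned to the MRF for the callee's leg. holdMusicUrl is
     * hypothetical, and the audio plays once here; a real page would loop it.
     */
    static String responseFor(NextStep step, String holdMusicUrl) {
        String body = (step == NextStep.TRANSFER_CALLER_TO_CALLEE)
            ? "      <prompt><audio src=\"" + escape(holdMusicUrl) + "\"/></prompt>"
            : "      <exit/>";
        return String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<vxml version=\"2.1\" xmlns=\"http://www.w3.org/2001/vxml\">",
            "  <form>",
            "    <block>",
            body,
            "    </block>",
            "  </form>",
            "</vxml>");
    }

    /** Escapes the characters that are unsafe inside an XML attribute value. */
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

In the application described here, the accept branch corresponds to calling AssistedCall.transferCallerToCallee(), and the decline branch to invoking the CallerValedictionTransfer object.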

Figure 11 illustrates the flow encapsulated by the CallerCalleeTransfer class.

Figure 11. The flow encapsulated by CallerCalleeTransfer

When the callee chooses to be connected to the caller, the AssistedCall terminates the call-legs to the MRF by sending BYE requests to the appropriate SIP sessions (1, 2, 3, 4). An empty reINVITE (one carrying no SDP) is created on the original caller session so that it will replace the existing call (5). Once the 200 OK response containing the caller's SDP offer has been received (6), the offer is copied into a reINVITE request (7), which is sent to the callee. The callee responds with a 200 OK (8) carrying its SDP answer, which is copied into the ACK message (9) sent to the caller. An ACK message (10) is then returned to the callee. The caller and callee can now communicate directly.
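The SDP hand-off in steps 5 through 10 can be traced with a small sketch. It models only the order of messages and where each SDP body is copied from. The `Party` interface and the trace format are hypothetical, and the BYE exchange with the MRF (steps 1 to 4) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Traces the offer/answer relay that reconnects the caller and the callee. */
class TransferSketch {

    /** Minimal stand-in for a user agent reached over an existing SIP session. */
    interface Party {
        /** SDP placed in the 200 OK that answers a reINVITE without SDP. */
        String offer();

        /** SDP placed in the 200 OK that answers a reINVITE carrying an offer. */
        String answer(String offer);
    }

    /** Returns the messages of steps 5-10 in order, as seen by the application server. */
    static List<String> transfer(Party caller, Party callee) {
        List<String> trace = new ArrayList<>();
        trace.add("5 AS->caller reINVITE (no SDP)");
        String offer = caller.offer();
        trace.add("6 caller->AS 200 OK sdp=" + offer);
        trace.add("7 AS->callee reINVITE sdp=" + offer);
        String answer = callee.answer(offer);
        trace.add("8 callee->AS 200 OK sdp=" + answer);
        trace.add("9 AS->caller ACK sdp=" + answer);
        trace.add("10 AS->callee ACK");
        return trace;
    }
}
```

Because the caller's offer arrives in a 200 OK rather than in an INVITE, the answer has to travel back in the ACK, which is why step 9 carries SDP while step 10 does not.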


In this article, we introduced the IMS architecture model and, in particular, how to write applications for it based on Java SIP servlet and VoiceXML technologies. We provided a complete application example to demonstrate the general approach. Since the VoiceXML language readily supports speech recognition and speech synthesis and can also be used for video interactive applications, the framework described in this article is a natural base for building more advanced interactive applications. The Voxpilot MRF provides complete VoiceXML 2.1 functionality, support for all leading speech recognizers and speech synthesizers, and the ability to run video interactive applications, and it runs on standard Intel servers configured with either Windows or Linux operating systems. By incorporating a SIP media gateway (from vendors such as Cisco, AudioCodes, Dilithium, and Radvision), one can deploy applications today on the Public Switched Telephone Network (PSTN) as well as on 3G networks.

The combination of Java SIP servlets and VoiceXML's Web programming model makes for an exceptionally powerful service delivery paradigm suitable both for IMS networks of tomorrow and for the networks of today. Happy developing!


David Burke is Chief Technology Officer at Voxpilot where he is responsible for leading and implementing the overall technical direction and vision. Dave is both an editor and contributor to W3C and IETF speech standards and a regular speaker at international speech conferences.

Darragh O'Flanagan joined Voxpilot Ltd. in 2006 as a Senior Software Engineer, and has worked over the last six years on a wide range of software products in the telecoms industry, both fixed and mobile.