Figure 7 illustrates the flow encapsulated by the CallerToVMS object.
Figure 7. The flow encapsulated by CallerToVMS
The caller invites the callee (1) with the application server behaving as a B2BUA. The
AssistantSipServlet creates an
AssistedCall object using the initial
INVITE details, and then begins a B2BUA call to the MRF. A provisional
100 Trying response is sent to the caller (2). The SDP information from the initial
INVITE (offer1) is copied to the
INVITE request that is sent to the MRF (3), but in this case, the SIP request URI is set to indicate a resource that lies within the Personal Assistant application. When the
200 OK (4) response is received, the SDP information (answer1) is copied to another
200 OK (5) response that is then sent back to the caller so both parties now have the media session information necessary to communicate directly. An
ACK message is sent from the caller's UA to the AS (6) and then from the AS to the MRF (7), completing the three-way handshake on both call-legs. At this point, the
AssistedCall is informed (via the
callStarted() method on the
ITPCCEventListener interface) that the call-leg pair encapsulated by
CallerToVMS has been successfully established.
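Steps (3) through (5) amount to the AS copying SDP bodies between the two call-legs rather than terminating media itself. The following toy sketch models just that copying in plain Java; SdpRelay, Leg, and the method names are illustrative stand-ins, not part of the SIP servlet API the real application uses:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the B2BUA SDP relay in Figure 7. Real code would use the
// javax.servlet.sip API; these types exist only to make the copying visible.
public class SdpRelay {

    /** One SIP call-leg, remembering the last SDP seen from its peer. */
    static class Leg {
        final String peer;
        String remoteSdp;
        Leg(String peer) { this.peer = peer; }
    }

    final Leg callerLeg = new Leg("caller");
    final Leg mrfLeg = new Leg("mrf");
    final List<String> trace = new ArrayList<>();

    /** Step 3: forward the caller's offer to the MRF unchanged. */
    String onInvite(String offer1) {
        callerLeg.remoteSdp = offer1;
        trace.add("INVITE(offer1) -> mrf");
        return offer1; // body of the INVITE sent to the MRF
    }

    /** Steps 4-5: copy the MRF's answer into the 200 OK to the caller. */
    String on200FromMrf(String answer1) {
        mrfLeg.remoteSdp = answer1;
        trace.add("200 OK(answer1) -> caller");
        return answer1; // body of the 200 OK sent to the caller
    }

    public static void main(String[] args) {
        SdpRelay relay = new SdpRelay();
        relay.onInvite("v=0 ... offer1");
        relay.on200FromMrf("v=0 ... answer1");
        System.out.println(relay.trace);
    }
}
```

Because both parties receive each other's real SDP, the media flows directly between the caller's UA and the MRF; only the signalling passes through the AS.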
Figure 8 illustrates the HTTP flow between the application server and MRF.
Figure 8. HTTP flow encapsulated by CallerToVMS
The MRF fetches the VoiceXML from the
AssistantHttpServlet (9, 10), interprets it, and allows the caller to record their name (11). The recorded audio bytes of the caller's name are returned to the
AssistantHttpServlet in an HTTP POST operation (12), where they are stored by the application so the audio can be replayed to the callee later on. The HTTP servlet then returns a VoiceXML document to the MRF (12), which keeps the caller waiting by playing hold music until the
AssistedCall is ready to interrupt the call. At this point the HTTP servlet uses the
AssistedService to find the associated
AssistedCall object, and instructs it that it can now begin contacting the callee, because the caller is successfully waiting. This is done by invoking the appropriate method on that object.
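The VoiceXML returned in step (9) might look roughly like the following sketch. The assistant?Action=nameRecorded URL, the prompt wording, and the field name are illustrative assumptions, not taken from the actual application:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch of the dialog fetched in step 9: record the
     caller's name and POST the audio back to the HTTP servlet. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="recordName">
    <record name="callerName" beep="true" maxtime="5s">
      <prompt>Please say your name after the tone.</prompt>
      <filled>
        <!-- Step 12: multipart POST of the recorded audio bytes. -->
        <submit next="assistant?Action=nameRecorded"
                method="post" enctype="multipart/form-data"
                namelist="callerName"/>
      </filled>
    </record>
  </form>
</vxml>
```

The document returned in response to that POST would then simply loop an `<audio>` hold-music prompt until the AssistedCall interrupts the call.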
Figure 9 illustrates the flow encapsulated by the CalleeToVMS object.
Figure 9. The flow encapsulated by CalleeToVMS
This flow uses a classic third-party call control pattern to place an outbound call to the callee. Figure 10 illustrates the HTTP flow between the application server and MRF.
Figure 10. HTTP flow encapsulated by CalleeToVMS
The MRF fetches the VoiceXML (
assistant?Action=greetCallee) from the
AssistantHttpServlet (9, 10), interprets it, and also fetches the audio bytes for the recording of the caller's name (11, 12). This recording is played to the callee, and a DTMF menu allows the callee to choose whether or not to accept the call (13). The choice is then submitted back to the
AssistantHttpServlet via HTTP (14), which responds with either a VoiceXML document that plays hold music (16) or an empty dialog that simply disconnects the callee's call. If the callee does not wish to speak with the caller, the
CallerValedictionTransfer object is invoked to play a valediction message to the caller (this flow is the same as Figure 9). Otherwise, the
AssistedCall.transferCallerToCallee() method is invoked (which in turn uses the
CallerCalleeTransfer object described below) to connect the caller to the callee.
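The greetCallee dialog fetched in step (9) of Figure 10 could be sketched along these lines. The callerNameAudio and calleeChoice URLs, the field name, and the prompt wording are illustrative assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch of assistant?Action=greetCallee (step 9): play the
     caller's recorded name and collect a one-digit DTMF choice. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="greetCallee">
    <field name="choice">
      <prompt>
        You have a call from
        <audio src="assistant?Action=callerNameAudio"/>.
        Press 1 to accept, or 2 to decline.
      </prompt>
      <grammar mode="dtmf" version="1.0" root="digit"
               xmlns="http://www.w3.org/2001/06/grammar">
        <rule id="digit">
          <one-of><item>1</item><item>2</item></one-of>
        </rule>
      </grammar>
      <filled>
        <!-- Step 14: submit the callee's decision to the HTTP servlet. -->
        <submit next="assistant?Action=calleeChoice" namelist="choice"/>
      </filled>
    </field>
  </form>
</vxml>
```

Note that the `<audio>` source is just another URL on the AssistantHttpServlet, which is how the stored recording of the caller's name reaches the MRF.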
Figure 11 illustrates the flow encapsulated by the CallerCalleeTransfer object.
Figure 11. The flow encapsulated by CallerCalleeTransfer
When the callee chooses to be connected to the caller, the
AssistedCall terminates the call-legs to the MRF by sending BYE requests to the appropriate SIP sessions (1, 2, 3, 4). An empty
reINVITE (carrying no SDP) is created using the original caller session so that it replaces the existing call and prompts the caller to supply a fresh SDP offer (5). Once the
200 OK response has been received containing the SDP offer (6), the details are copied into a
reINVITE request (7) which is sent to the callee. The callee responds with a
200 OK response (8) with its answer SDP, which is copied into the
ACK message (9) to the caller. An ACK message (10) is returned to the callee. The caller and callee can now directly communicate.
In this article, we introduced the IMS architecture model and, in particular, how to write applications for it based on Java SIP servlet and VoiceXML technologies. We provided a complete application example to demonstrate the general approach. Because VoiceXML readily supports speech recognition and speech synthesis, and can also be used for interactive video applications, it is easy to build on the framework described in this article to create very advanced interactive applications. The Voxpilot MRF provides complete VoiceXML 2.1 functionality, supports all leading speech recognizers and synthesizers, can run interactive video applications, and runs on standard Intel servers under either Windows or Linux. By incorporating a SIP media gateway (from vendors such as Cisco, AudioCodes, Dilithium, and Radvision), one can deploy applications today on the Public Switched Telephone Network (PSTN) as well as on 3G networks.
The combination of Java SIP servlets and VoiceXML's Web programming model makes for an exceptionally powerful service delivery paradigm suitable both for IMS networks of tomorrow and for the networks of today. Happy developing!
David Burke is Chief Technology Officer at Voxpilot, where he is responsible for leading and implementing the overall technical direction and vision. Dave is an editor of and contributor to W3C and IETF speech standards, and a regular speaker at international speech conferences.
Darragh O'Flanagan joined Voxpilot Ltd. in 2006 as a Senior Software Engineer, and has worked over the last six years on a wide range of software products in the telecoms industry, both fixed and mobile.