Accelerating Multimedia Application Development with JSR 309 Media Server Control API

by Marc Brandt, Tomas Ericson, and Alain Comment

Learn how to build a SIP Servlet conferencing application using the JSR 309 Reference Implementation Driver running in Oracle Communications Converged Application Server and accessing the HP OpenCall Media Platform

Published April 2010

Downloads:
 Oracle Communications Converged Application Server
 Sample conferencing application

Over recent years, the architecture for deploying rich multimedia communications applications has evolved to allow distribution between the application logic and the media processing. This is well illustrated by the evolution of 3GPP IMS MRF deployment architecture along with the IETF SIP and Media Server Control protocol efforts that support a model of decomposition between Application Servers and Media Servers.

As briefly explained here, emerging open standards enable interoperable deployment of this AS-MS architecture. Both Oracle and HP are committed to driving open standards and their adoption. A key enabler for the success of this deployment model is making it easy to program the applications deployed on top of it.

The JSR 309 API provides the flexible programming model compatible with the richness of the underlying AS-MS decomposition. It supports the delegation of multimedia capabilities to the media server, like using VoiceXML, as well as allowing fine-grained control of the media server operations over protocols such as RFC 5022 MSCML or IETF Media Server Control protocol (see RFC 5167, 5552, 5567), among others.

Figure 1 illustrates how 3GPP has recently evolved the MRF architecture to support deployment of feature-rich media servers through an open standard media server control interface over the Cr reference point inside the IP Multimedia Core Network subsystem (introduced in 3GPP TS 23.002 Network architecture in Release 8).

Figure 1 Excerpt from 3GPP TS 23.218 introducing the new Cr reference point with emerging standards mapped on top


HP has been very active in driving these evolutions in 3GPP, as TR 24.880 rapporteur, and has been contributing to distributed media server control standards as co-editor in IETF MediaCtrl and co-chair in W3C Voice Browser Working Group.

JSR 309 Standardization: Controlling Media Server Resources Through Java

Oracle, spec lead of the JSR 289 SIP Servlet 1.1 API, and HP, recognizing the diversity of Media Server deployment models and control protocols, joined forces in 2007 to drive the JSR 309 API for the Java Platform. The result is a programming model that supports a wide range of protocols and delegation models transparently to the application developer.

Figure 2 illustrates the positioning of this API in the standardization landscape in a typical distributed open standards-based deployment of Application and Media servers.


Figure 2 Positioning the open API and Protocol standards efforts in the Media Server Control landscape


The main benefit of JSR 309 is to provide a Java API independent of underlying media server control protocols. Media server specifics are handled by a JSR 309 Driver, similar to how JDBC abstracts away database specifics, allowing an application developer to program transparently using the JSR 309 API, regardless of media server vendor. Operators and Service Providers can switch between media servers without having to rewrite their applications.

JSR 309 gives flexibility for supporting existing and emerging standard protocols inside driver implementations, while simplifying application programming and guaranteeing portability across AS-MS deployment models.

The JSR 309 Reference Implementation provided by HP is a driver implemented using SIP Servlet technology and the widely deployed IETF MSCML protocol. IETF MediaCtrl is also shown in the figure above, positioned as a future open standard protocol for AS-MS interoperability.

JSR 309 Overview

JSR 309 Media Server Control API 1.0 defines a very flexible object model for controlling Media Server resources and the topology of Media streams through the join operation.

JSR 309 targets applications such as:

  • Interactive Voice Response (IVR) using MediaGroup, Player, Recorder, SignalDetector objects: prompt and record voice or video, prompt and collect DTMF, interaction with VoiceXML servers, VoiceMail servers
  • Conferencing using MediaMixer object: audio, video, layout
  • Combination of those and other JSR 309 core objects capabilities: Multimedia IVR, Ring Back, Video Monitoring, Call Center/Contact Center

Figure 3 illustrates some of the interfaces of the object model, which are used throughout the sample Conferencing application.

Figure 3 Core Objects of JSR 309 typically used in the Conferencing Application


JSR 309 is designed to fit the decomposed Application Server / Media Server deployment model, where:

  • The application connects and handles multimedia legs among end points through SIP signaling
  • The Media Server, controlled by the application through the JSR 309 API, processes the associated media streams described by SDP properties

Figure 4 illustrates the JSR 309 deployment architecture used for the Conferencing Application illustrated in this article.

Figure 4. Typical JSR 309 deployment architecture


The conferencing application is deployed inside Oracle Communications Converged Application Server (OCCAS) and uses the HP OpenCall JSR 309 Reference Implementation and the HP OpenCall Media Platform (OCMP).

Conferencing Application Description and Setup

The Conferencing application illustrates the power and flexibility of the JSR 309 and SIP Servlet APIs in a distributed environment. Although a commercial application would normally make use of many more features of the JSR 309 API, this sample application is a functional illustration of the call control logic and media processing needed to connect and mix multiple participants in a conference.

In order to deploy and test the Conferencing application, see Deploying and Running the Conferencing Application.

Design Overview

The following figure describes the main components used in the application.

Figure 5. High level Conferencing Application architecture


A user typically connects to the conferencing application using a SIP phone. He/she is prompted to enter a conference ID number and invited to record his/her name. The recorded name is played as an announcement and also made available for later retrieval through a web interface. A password is required to join a conference; to keep this sample use case practical, it can be set by the first participant. During a conference, a participant can mute or unmute the line. Application interactions such as muting a participant are done either through DTMF from the SIP phone or through the Web interface. This illustrates how the application can control the same media capabilities from different input sources.

Conferencing Application Code Walk-through

This section illustrates the use of JSR 309 through some highlights of the application flow and code. For detailed information on the SIP Servlet API and the JSR 309 Media Server Control API, please refer to the reference section.

The conferencing application is a so-called converged application, combining HTTP Servlets and SIP Servlets in a single web archive (war file). The application controls media resources by invoking the JSR 309 API. The SIP Servlet code is responsible for handling call control with the end points (conference participants). A simple web interface, implemented using HTTP Servlets and JSPs, enables users to monitor and control ongoing conferences, showing the convergence of SIP, HTTP and Media Control capabilities in the same Application Environment.

The following figure is a high level class diagram illustrating the various classes of the application.

Figure 6. High level class diagram


ConferenceServlet Overview

The SIP Servlet ConferenceServlet handles the SIP signaling between the terminals of conference participants and the application server, answering incoming SIP calls etc.

                               
// Common factory for JSR 309 objects used by all service classes
public static MsControlFactory theMsControlFactory;

Upon ConferenceServlet initialization, a MsControlFactory instance is retrieved from HP’s Reference Implementation Driver named "com.hp.opencall". This MsControlFactory instance is used throughout the application for creating JSR 309 objects.

                               
public void init() throws ServletException {
    try { …
        theMsControlFactory =
            DriverManager.getFactory("com.hp.opencall", info);
    … } …

When the method getFactory is called, the DriverManager looks up the requested driver and retrieves an MsControlFactory from it. The DriverManager class, which is part of the JSR 309 API, loads and registers driver packages automatically (subject to driver packaging).
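The lookup-by-name mechanism described above resembles the JDBC driver pattern. The following is a minimal, self-contained sketch of that idea only. `SketchDriverManager`, `SketchDriver` and `SketchFactory` are hypothetical stand-ins invented for this example; they are not the JSR 309 classes, and a real driver registers itself through its own packaging.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the factory a driver hands out.
interface SketchFactory {
    String describe();
}

// Hypothetical stand-in for a media server driver.
interface SketchDriver {
    String getName();
    SketchFactory getFactory(Properties props);
}

// Hypothetical registry: drivers register under a name, applications look them up.
final class SketchDriverManager {
    private static final Map<String, SketchDriver> DRIVERS = new ConcurrentHashMap<>();

    private SketchDriverManager() { }

    static void registerDriver(SketchDriver driver) {
        DRIVERS.put(driver.getName(), driver);
    }

    // Fails loudly when no driver with that name has been registered.
    static SketchFactory getFactory(String driverName, Properties props) {
        SketchDriver driver = DRIVERS.get(driverName);
        if (driver == null) {
            throw new IllegalArgumentException("No driver registered as " + driverName);
        }
        return driver.getFactory(props);
    }
}
```

The application only ever names the driver; everything specific to a media server stays behind the factory returned by the lookup.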

The ConferenceServlet also instantiates a ConferenceManager (confManager), which manages Participant and ConferenceSession objects.

Upon receiving an initial SIP INVITE, a new Participant is instantiated, triggering the creation of a NetworkConnection that initiates the SDP negotiation. Once the SIP ACK is received, the participant interaction starts by asking the participant to enter the conference ID. Upon receiving a SIP BYE, the Participant is unjoined from the conference.

                               
… doInvite(SipServletRequest req) … {
    // Create a new Participant
    ConferenceManager.getInstance().addParticipant(
        userId, req.getRawContent(),
        req.getSession().getApplicationSession().getId()); …
}

… doAck(SipServletRequest req) … {
    // Launch the media service
    … participant.start(); …
}

… doBye(SipServletRequest req) … {
    // Terminate the service
    … participant.release();
    // Send 200 OK to the UA
    …
}

ConferenceSession Overview

This class models a conference room which holds its own JSR 309 MediaSession, MediaMixer and MediaGroup. The MediaGroup is used to play announcements to the Conference.

                               
// JSR 309 objects
private final MediaSession mMediaSession;
private final MediaMixer mMediaMixer;
private final MediaGroup mMediaGroup;

Each conference has a Conf ID and a password, and also holds its Participants and the Organizer (see Participant Overview).

                               
       // Conference identifier and participants list
private final String confId;
private List<Participant> mParticipants;
private Participant mOrganizer;
private String mPassword;

When a ConferenceSession is instantiated, the JSR 309 objects are created from the MsControlFactory that was obtained through the DriverManager when the ConferenceServlet was initialized.

                               
public ConferenceSession(String confId, Participant aParticipant,
                         ConferenceServlet aServlet) … {
    mMediaSession = ConferenceServlet.theMsControlFactory.createMediaSession();
    mMediaMixer = mMediaSession.createMediaMixer(MediaMixer.AUDIO);
    mMediaGroup = mMediaSession.createMediaGroup(MediaGroup.PLAYER);
    // The MediaGroup sends media into the mixer
    mMediaGroup.join(Direction.SEND, mMediaMixer);
}


The first step creates a Media Session calling createMediaSession() on theMsControlFactory. The MediaSession is a container and factory for media objects that will be used by the Conference Application.

For each ConferenceSession initialized, an AUDIO-only MediaMixer and a PLAYER-only MediaGroup are allocated. The MediaGroup is joined to the MediaMixer in SEND mode so that it can be used to play media into the conference.

For enhancing the application to record a conference, the MediaGroup configuration must be PLAYER_RECORDER_SIGNALDETECTOR, and MediaGroup must be joined in DUPLEX mode. The record() method must then be used with an HTTP URL that enables posting the recorded file.

As Participants join or leave the conference, the ConferenceSession can play announcements. For instance, an announcement is played to the remaining Participants when someone has left the conference:

  • First, build the reference to the files to play:
                               
    public void removeParticipant(Participant participant) {
        …
        // Recorded name of the leaving participant, then the "has left" prompt
        String[] fileURLs = new String[] {
            participant.getRecordingURI().toString() + "&disconnected=true",
            Participant.getPromptDirPath() + "HasLeftTheConference.wav" };
        …
        playAnnouncement(fileURLs);
        …
    }
  • Then, play the files on the MediaGroup connected to the MediaMixer for this Conference.
                               
    public void playAnnouncement(String[] fileURLs) {
        …
        // streamIDs holds the URIs to play (its construction is elided here)
        mMediaGroup.getPlayer().play(streamIDs, null, Parameters.NO_PARAMETER);
        …
    }

The Player sends a PLAY_COMPLETED event at the end of the play. The Application may catch this event by registering a MediaEventListener on the Player’s MediaEventNotifier interface.
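The register-then-notify pattern behind this event can be sketched in plain Java. This is an illustration under stated assumptions, not the JSR 309 API: `SketchPlayer` and `SketchEventListener` are hypothetical types, the event is reduced to a string, and the sketch completes the play immediately, whereas a real Player reports completion asynchronously.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type standing in for a media event listener.
interface SketchEventListener {
    void onEvent(String eventType);
}

// Hypothetical player that notifies its listeners when a play finishes.
final class SketchPlayer {
    static final String PLAY_COMPLETED = "PLAY_COMPLETED";

    private final List<SketchEventListener> listeners = new ArrayList<>();

    void addListener(SketchEventListener listener) {
        listeners.add(listener);
    }

    // A real player works asynchronously; this sketch completes at once.
    void play(String fileUrl) {
        for (SketchEventListener listener : listeners) {
            listener.onEvent(PLAY_COMPLETED);
        }
    }
}
```

An application would register its listener before calling play(), so that no completion event can be missed.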

Participant Overview

This class models a conference participant and implements a state machine handling participant interactions.

Figure 7. Participant State Machine Diagram


Each Participant holds its own JSR 309 MediaSession, NetworkConnection and MediaGroup. The NetworkConnection, with its SdpPortManager, handles the SDP configuration and RTP traffic between the User Agent and the Media Server. The MediaGroup is used for DTMF interactions with the participant like getting the Conf ID or password when a new Participant calls in, as well as playing prompts and recording a participant’s name.

                               
private final NetworkConnection mNetworkConnection;
private final SdpPortManager mSdpPortManager;
private final MediaGroup mMediaGroup;
private final MediaSession mMediaSession;

Initialization of the Participant’s NetworkConnection and MediaGroup

A Participant is initialized by the ConferenceServlet upon reception of an initial SIP INVITE.

                               
public Participant(final SipServletRequest req) … {…}

The following JSR 309 objects are created:

                               
mMediaSession =
    ConferenceServlet.theMsControlFactory.createMediaSession();
mNetworkConnection =
    mMediaSession.createNetworkConnection(NetworkConnection.BASIC);
mSdpPortManager = mNetworkConnection.getSdpPortManager();
mMediaGroup = mMediaSession.createMediaGroup(
    MediaGroup.PLAYER_RECORDER_SIGNALDETECTOR);

A regular NetworkConnection is created using a default configuration. The participant’s MediaGroup that is created will contain Player, Recorder and SignalDetector resources.

In order to be notified of the completion of media operations invoked on the MediaGroup’s resources, event listeners must be registered. They will typically capture completion of play, record or DTMF collection operations:

                               
mMediaGroup.getPlayer().addListener(new ConfListener<PlayerEvent>());
mMediaGroup.getRecorder().addListener(new ConfListener<RecorderEvent>());
mMediaGroup.getSignalDetector().addListener(
    new ConfListener<SignalDetectorEvent>());

The new NetworkConnection is then joined to the MediaGroup. Note that this is performed before the SDP negotiation is completed. The join operation is DUPLEX in order to allow for playing prompts, recording or detecting DTMF.

                               
mNetworkConnection.join(Direction.DUPLEX, mMediaGroup);

The SDP offer received from the user agent is provided to the SdpPortManager:

                               
mSdpPortManager.processSdpOffer(sdpOffer);
Figure 8. Call flows for SDP negotiation


In order to handle the SDP answer from the Media Server, an event listener is registered on the SdpPortManager. Upon receiving a successful event from the SdpPortManager, a 200 OK is sent with the SDP answer obtained from the Media Server.

                               
mSdpPortManager.addListener(new MediaEventListener<SdpPortManagerEvent>() {
    public void onEvent(SdpPortManagerEvent event) {
        if (event.getEventType().equals(
                SdpPortManagerEvent.ANSWER_GENERATED)) {
            // The NetworkConnection has been set up properly.
            // Send a 200 OK, with negotiated SDP from the Media Server attached.
            ConferenceServlet.sendResponse(sipAppSessionId,
                SipServletResponse.SC_OK,
                "application/sdp", event.getMediaServerSdp());
        }
        …
    }
});

Media Interactions When the Participant is Joining

The interactions between a participant and a conference are handled through a state machine pattern implemented in Participant by using a Java enum: private enum State { }. The high-level state machine is shown in Figure 7.
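As a rough illustration of the enum-based state pattern, the sketch below models a handful of transitions. The state names EnterConfId, RecordingName, EnterConfPwd, Welcome, Conferencing and ConferencingMuted are taken from this article; the `ParticipantStateSketch` class, the `Input` values and the exact transition order are assumptions made for the example and are not the sample application's code.

```java
// Hypothetical sketch of an enum-driven participant state machine.
final class ParticipantStateSketch {

    // Illustrative inputs; the sample application reacts to media events instead.
    enum Input { DIGITS_COLLECTED, NAME_RECORDED, PROMPT_PLAYED, KEY_6, KEY_9 }

    enum State {
        EnterConfId {
            @Override State next(Input in) {
                return in == Input.DIGITS_COLLECTED ? RecordingName : this;
            }
        },
        RecordingName {
            @Override State next(Input in) {
                return in == Input.NAME_RECORDED ? EnterConfPwd : this;
            }
        },
        EnterConfPwd {
            @Override State next(Input in) {
                return in == Input.DIGITS_COLLECTED ? Welcome : this;
            }
        },
        Welcome {
            @Override State next(Input in) {
                return in == Input.PROMPT_PLAYED ? Conferencing : this;
            }
        },
        Conferencing {
            @Override State next(Input in) {
                return in == Input.KEY_6 ? ConferencingMuted : this;
            }
        },
        ConferencingMuted {
            @Override State next(Input in) {
                return in == Input.KEY_9 ? Conferencing : this;
            }
        };

        // Each state decides its own successor; unknown inputs keep the state.
        abstract State next(Input in);
    }

    private State state = State.EnterConfId;

    State getState() {
        return state;
    }

    State handle(Input in) {
        state = state.next(in);
        return state;
    }
}
```

Keeping the transition logic inside each enum constant means a new state can be added without touching a central switch statement.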

The following figure illustrates the user interactions when joining a conference.

Figure 9. Call flows for a participant joining a conference


Once a participant is connected, the first interaction is to obtain a Conf ID through a prompt-and-collect operation. This is achieved by using the receiveSignals() operation and the SignalDetector.PROMPT capability. Initial and inter-digit timeouts are also set, and a maximum of 4 digits is collected with barge-in enabled, allowing the prompt to be stopped when the user starts entering digits. This is achieved by the following entry action in state EnterConfId:

                               
{
    Parameters params =
        ConferenceServlet.theMsControlFactory.createParameters();
    params.put(SignalDetector.INITIAL_TIMEOUT, 5000);
    params.put(SignalDetector.INTER_SIG_TIMEOUT, 5000);
    URI prompt = URI.create(getPromptDirPath() +
        "PleaseEnterYour4DigitConferenceID.wav");
    params.put(SignalDetector.PROMPT, prompt);
    part.mMediaGroup.getSignalDetector().receiveSignals(
        4, null,
        new RTC[] { // barge-in
            new RTC(SignalDetector.DETECTION_OF_ONE_SIGNAL, Player.STOP) },
        params);
}
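To make the collect-with-barge-in behaviour concrete, here is a small plain-Java model of what the media server does on the application's behalf. It is only a sketch: `DigitCollectorSketch` is a hypothetical class, the timeouts are not modelled, and in the real application this logic runs inside the media server rather than in application code.

```java
// Hypothetical model of prompt-and-collect with barge-in.
final class DigitCollectorSketch {
    private final int maxDigits;
    private final StringBuilder digits = new StringBuilder();
    private boolean promptPlaying = true;

    DigitCollectorSketch(int maxDigits) {
        this.maxDigits = maxDigits;
    }

    // Feeds one DTMF digit; returns true once enough digits were collected.
    boolean onDigit(char digit) {
        promptPlaying = false; // barge-in: the first digit stops the prompt
        if (digits.length() < maxDigits) {
            digits.append(digit);
        }
        return digits.length() == maxDigits;
    }

    boolean isPromptPlaying() {
        return promptPlaying;
    }

    String collected() {
        return digits.toString();
    }
}
```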

The state EnterConfId will receive the corresponding SignalDetectorEvent and process the collected conf ID in order to retrieve an existing ConferenceSession or create a new one.

                               
if (event.getEventType() ==
        SignalDetectorEvent.RECEIVE_SIGNALS_COMPLETED) { … }
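The "retrieve an existing ConferenceSession or create a new one" step can be sketched as a lookup-or-create on a map keyed by Conf ID. The names below (`ConferenceRegistrySketch`, `Conference`, `findOrCreate`) are hypothetical, and treating the first caller as the organizer is an assumption made for the example; the sample application's ConferenceManager may be organized differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry mapping conference IDs to conference objects.
final class ConferenceRegistrySketch {

    static final class Conference {
        final String confId;
        final String organizerId; // assumed: the first caller becomes the organizer

        Conference(String confId, String organizerId) {
            this.confId = confId;
            this.organizerId = organizerId;
        }
    }

    private final Map<String, Conference> conferences = new ConcurrentHashMap<>();

    // Returns the existing conference for confId, or creates it atomically.
    Conference findOrCreate(String confId, String callerId) {
        return conferences.computeIfAbsent(confId, id -> new Conference(id, callerId));
    }

    int size() {
        return conferences.size();
    }
}
```

Using computeIfAbsent keeps the check and the creation atomic, which matters when two participants dial the same Conf ID at nearly the same time.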

Note that the SignalDetector is similarly used to request the Participant to enter a conference password in state EnterConfPwd.

In another interaction, the Recorder is used to record the participant’s name. This is achieved in the state RecordingName. The record() operation is used with a PROMPT parameter, barge-in is enabled, and a Run Time Control (RTC) stops the recording when the user presses ‘#’.

                               
{
    Parameters params =
        ConferenceServlet.theMsControlFactory.createParameters();
    URI prompt = URI.create(getPromptDirPath() +
        "PleaseSayYourNameEndWithHash.wav");
    params.put(Recorder.PROMPT, prompt);
    params.put(SignalDetector.PATTERN[0], "#");
    part.mMediaGroup.getRecorder().record(
        part.getRecordingURI(),
        new RTC[] {
            // barge-in
            new RTC(SignalDetector.DETECTION_OF_ONE_SIGNAL, Player.STOP),
            // stop recording key
            new RTC(SignalDetector.PATTERN_MATCH[0], Recorder.STOP) },
        params);
}

The state RecordingName will receive the corresponding RecorderEvent and recognize whether the participant has recorded something based on the event type and qualifier.

                               
… if (event.getEventType() == RecorderEvent.RECORD_COMPLETED) {
    if (event.getQualifier() == RecorderEvent.SILENCE) {
        …
    } … else if (event.getQualifier() == ResourceEvent.RTC_TRIGGERED
            && event.getRTCTrigger() == SignalDetector.PATTERN_MATCH[0]) { …

Once a participant has entered a conference id, recorded his/her name and entered the conf password, he/she will be joined to the conference. This is achieved in two steps. First a welcome prompt is played to the participant. Then the participant is joined to the conference and a welcome prompt announcing the new participant is played to the conference overall. This is achieved in state Welcome:

                               
{ …
    // use the participant's MediaGroup
    part.mMediaGroup.getPlayer().play(
        URI.create(getPromptDirPath() + "PutIntoSession.wav"),
        null, Parameters.NO_PARAMETER);

    part.mNetworkConnection.join(Direction.DUPLEX,
        confSession.getMediaMixer());

    // play an announcement to the conference using the conference's MediaGroup
    confSession.playAnnouncement(fileURLs);
… }

Media Interactions with the Participant during the conference

Once participants have joined a conference, they can still control some functions like mute / unmute by pressing a key, triggering DTMF. Note that for doing this each participant is both joined to the conference and to their own MediaGroup:

                               
// upon Participant initialization
mNetworkConnection.join(Direction.DUPLEX, mMediaGroup);

// upon Participant joining the Conference
part.mNetworkConnection.join(Direction.DUPLEX,
    confSession.getMediaMixer());

The result after the two join operations is that the Participant’s media flows duplex with the Conference, but the input coming from the participant is still sent to the participant’s MediaGroup, which allows detection of DTMF from this single participant. While in Conferencing or ConferencingMuted state, a receiveSignals() operation is always active on the participant’s MediaGroup, enabling reception of every DTMF triggered by the participant pressing a key.

                               
part.mMediaGroup.getSignalDetector().receiveSignals(1, null, null, null);

When a participant presses the key ‘6’, he/she is muted. The first join below connects the participant’s NetworkConnection to its MediaGroup, thus unjoining it from the MediaMixer. The MediaGroup is then used to play the mute announcement (only to the participant). Once the announcement is played, the participant is joined back to the MediaMixer in RECV mode only and continues to listen to the conference.

                               

private void mute(Participant part) … {
    // Join to the participant's own MediaGroup to play the announcement privately
    part.mNetworkConnection.join(Direction.DUPLEX, part.mMediaGroup);
    part.mMediaGroup.getPlayer().play(
        URI.create(getPromptDirPath() + "MuteOn.wav"), null,
        Parameters.NO_PARAMETER);

    // Once the announcement has played: listen-only join to the conference
    part.mNetworkConnection.join(Direction.RECV,
        confSession.getMediaMixer());
}

Note the participant is still connected to its MediaGroup so that DTMF can still be detected by the active receiveSignals() operation.

While in ConferencingMuted state, the participant can unmute using the same SignalDetector of its MediaGroup by pressing the key ‘9’. Upon receiving this event, the Participant is joined to the MediaGroup for playing the unmute announcement and is then joined back to the MediaMixer in DUPLEX mode at the end of the play() operation. The participant is now back in state Conferencing.

                               

part.mNetworkConnection.join(Direction.DUPLEX, part.mMediaGroup);
part.mMediaGroup.getPlayer().play(
    URI.create(getPromptDirPath() + "MuteOff.wav"),
    null, Parameters.NO_PARAMETER);

// Once the announcement has played: talk-and-listen join to the conference
part.mNetworkConnection.join(Direction.DUPLEX,
    part.myConferenceSession.getMediaMixer());
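The mapping from DTMF key to join direction described in this section can be summarized in a few lines of plain Java. This is a simplified model under stated assumptions: `MuteControlSketch` and its local `Direction` enum are hypothetical stand-ins that mirror the direction names used in this article, and the intermediate join to the MediaGroup for the announcement is left out.

```java
// Hypothetical model of how mute/unmute keys change the join to the mixer.
final class MuteControlSketch {

    // Mirrors the three join directions named in this article.
    enum Direction { DUPLEX, SEND, RECV }

    private MuteControlSketch() { }

    // Direction of the participant's join to the mixer after a key press.
    static Direction directionAfterKey(Direction current, char key) {
        switch (key) {
            case '6': return Direction.RECV;   // mute: listen only
            case '9': return Direction.DUPLEX; // unmute: talk and listen
            default:  return current;          // other keys leave the join unchanged
        }
    }

    // True when the participant's audio reaches the conference.
    static boolean participantIsHeard(Direction d) {
        return d == Direction.DUPLEX || d == Direction.SEND;
    }

    // True when the participant receives the conference mix.
    static boolean participantHearsConference(Direction d) {
        return d == Direction.DUPLEX || d == Direction.RECV;
    }
}
```

In both the muted and unmuted cases the participant keeps hearing the conference; only the direction towards the mixer changes.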

Media Interactions with the Participant as an Organizer

These interactions are shown in the State Diagram (Figure 7). Readers are encouraged to look at the source code for the Participant’s states pertaining to the Organizer role, and at the HTTP Servlet, for other examples of using JSR 309:

  • Participants put on hold, listening to music while waiting for the organizer
  • Organizer sets the conference password
  • Mute / unmute / unjoin participant from the Web interface 

Evolution of Media Server Control API

Although the deployment is based on commercially-ready application server and media server, several aspects are not handled by this sample application with regard to the scalability and availability of a fully distributed solution when deployed in a real network. Most of the capabilities for Application Server or Media Server selection, load balancing, resource brokering, resource reservation and availability management are usually handled through the deployment architecture.

As version 1.0 of the Media Server Control API, JSR 309 does not provide additional features to handle these aspects, which should be left to implementation differentiation and remain as transparent to the programmer as possible. These operational features were nevertheless identified by the JSR 309 Expert Group and, depending on deployments and on the evolution of Media Server features, may be studied in future versions of this API.

The goal of the API is to support a portable programming model for each specified media capability available on different media servers. However, it also takes into account that a given media server may not support all the capabilities defined in JSR 309. The application can use the SupportedFeatures interface to detect dynamically which features a specific JSR 309 driver implementation supports. This is expected to evolve with new protocols, and real application programming experience will help improve this feature.
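The general idea of checking capabilities at run time and falling back gracefully can be sketched as follows. This does not use the SupportedFeatures interface itself: `FeatureCheckSketch`, the string feature name "VIDEO_LAYOUT" and the mode labels are all invented for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical capability check; feature names are invented for illustration.
final class FeatureCheckSketch {
    private final Set<String> supported;

    FeatureCheckSketch(Set<String> supportedFeatures) {
        // Defensive copy so later changes by the caller have no effect.
        this.supported = Collections.unmodifiableSet(new HashSet<>(supportedFeatures));
    }

    boolean supports(String feature) {
        return supported.contains(feature);
    }

    // Falls back to audio-only when video layout support is not reported.
    String chooseConferenceMode() {
        return supports("VIDEO_LAYOUT") ? "audio+video" : "audio-only";
    }
}
```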

JSR 309 1.0 supports a set of capabilities for handling video conferencing with rich layout rendering. As more video-rich and interactive applications are designed, new features are likely to be identified that call for a common, standard API evolved from the current interface. One example is the support of further capabilities using the media stream type message (e.g. RFC 4975 MSRP messaging sessions) in addition to audio and video.

The JSR 309 1.0 Specification focuses on a set of very flexible core objects for dealing with media server capabilities. When the JSR 309 Expert Group was formed, the need for composite operations had already been identified. Composite operations such as Prompt and Record, or Prompt and Collect, are already included in the operations of the MediaGroup. Possible additions include muting all participants in a conference at once and handling side-bar conferences. Although these composite operations can be developed on top of the core objects, there may be value in standardizing further widely used composite operations.

Conclusion

In this article, we introduced a Conferencing Application sample based on the new JSR 309 Media Server Control API and Java SIP Servlet technology. We provided a complete application example illustrating the end-to-end flows between User Agent, Application Server and Media Server. It demonstrates the value and ease of writing applications without needing to know the underlying media server control protocol, whether MSCML, other protocols available on the market, or those emerging as new IETF standards.

With about 1500 lines of Java code all of the conferencing application business logic, SIP call control and web interface were implemented. This shows that the JSR 309 API enables application developers to focus on business logic and user interface design, leaving it to the JSR 309 driver to handle Media Server interactions efficiently.

The Conferencing Application can easily be further enhanced by using JSR 309. For example:

  • Using VoiceXML would be an option for dealing with participant interactions. JSR 309 supports control of VoiceXML resources through the VxmlDialog interface.
  • Although this conferencing application only deals with audio, JSR 309 supports video, including features for setting a VideoLayout, so video support could easily be added to the application.
  • Active talker detection could be added, with the Web interface displaying which participant is talking.

The Conferencing application sample demonstrates the value and feasibility of developing and deploying multimedia applications within a distributed Application Server – Media Server architecture using open standard Java APIs such as SIP Servlet 1.1 and Media Server Control 1.0.

References


Marc Brandt is a master technologist at HP Software & Solutions CMS (Communications and Media Solutions). He has been responsible for coordinating industry standards activities across the wide range of organizations and forums of which HP CMS is a member. He has been co-leading JSR 309 since its inception in January 2007.

Tomas Ericson is a Principal Member of Technical Staff at Oracle. He has long experience in Java server programming in the telecom/communications domain in general, and specifically in the area of IMS/SIP/VoIP/Presence. He is co-spec lead of JSR 309, and is a member of the JSR 289 (SIP Servlet API) expert group.

Alain Comment is a senior developer at HP Software & Solutions CMS (Communications and Media Solutions). He leads HP OpenCall Media Platform interfaces and internals and has been a main contributor to the JSR 309 Specification.