An Introduction to SIP, Part 1: Meet SIP

by Emmanuel Proulx
02/07/2006

Abstract

The Session Initiation Protocol (SIP) is a signaling protocol with great significance to the telecommunications industry. This article provides a general and technical introduction to SIP, and shows how SIP is an important enabler of telecommunication solutions.

Introduction

I once had a great idea for a piece of software that would "float" on top of an application, providing assistance. No, this would not be a dumb "help" system. It would be a live technical support agent, conferenced in over the Internet. At the time I was told "there are no tools, libraries, protocols, or bandwidth for doing that!"

Times have changed.

Many people have broadband at home, over DSL, cable, and other technologies. Good quality tools and libraries abound, both commercial and open source. Standards enable applications. Now is the time for the cool ideas to be executed.

Meet SIP

Let me introduce you to SIP, the Session Initiation Protocol. SIP is a lightweight, extensible, request/response protocol for starting communication sessions between two end-points. Does this sound familiar? SIP was conceptually inspired by HTTP and SMTP, although its intent is different. You can compare SIP messages to CB radio lingo, such as 10-codes and Q-signals.

Figure 1. Lingo used to manage a CB call

In this example, the real message is surrounded with special call negotiation messages.

SIP was first published by the IETF in 1999 (as RFC 2543) and then revised in 2002. The current version is described in RFC 3261. Much of the information about SIP in this article was distilled from that RFC. Many extensions to SIP exist, and many of these extensions can be found in this list of SIP-related RFCs and drafts.

What are the benefits of SIP? Generally it is used by two end-points to negotiate a "call." By negotiate I mean agreeing on the medium (text, voice, other), the transport (usually RTP, the Real-time Transport Protocol), and the encoding (codec). Once the negotiation is successful, the two end-points use the selected method for talking to each other, independently of SIP. Once the "call" is over, SIP is used to indicate a disconnection. Therefore, SIP is best used as a signaling mechanism. SIP and its extensions also provide related functions such as instant messaging, registration, and presence.

An end-point in the SIP jargon is called a user agent. This could be a "soft phone," an instant messenger, an IP phone, or even a cellular phone. Centralized services are provided by server user agents such as registrars, proxies, or application servers.

SIP sounds very simple, and it is. But while this simplicity is important for the protocol to be stable, it doesn't limit the usefulness of the protocol, which has found a rich set of application areas.

Think of HTTP for example. The protocol definition by itself is tiny. But the ways to use it are unlimited. SIP is also extensible. Dozens of extensions already exist for SIP that cover a wide range of applications. Let's now take a more in-depth look at SIP and why it's important.

Is SIP Significant?

It's been said that what HTTP did for the Web, SIP will do for telecommunications.

SIP has colossal repercussions in the telecommunications industry. Cellular-technology companies have decided to standardize on SIP for all future applications. VoIP (Voice over IP) vendors, Internet telephony, and instant messaging applications (for example, Microsoft MSN Messenger) are all standardizing on SIP.

Some signaling protocols as well as peer-to-peer technologies already exist. This raises the question: what advantage does SIP have over them? SIP offers some definite advantages:

  • Stability: The protocol has been used for years now and is rock-solid.
  • Speed: This UDP-based tiny protocol is extremely efficient.
  • Flexibility: This text-based protocol is easily extensible.
  • Security: Features like encryption (SSL, S/MIME) and authentication are available. Extensions to SIP offer other security features.
  • Standardization: With the entire telecommunications industry moving to it, SIP is rapidly becoming the standard. Other technologies may have some advantages over SIP, but they lack global adoption.

This means that if you want your applications to interoperate with other tools, equipment, and servers, SIP is the best way to go. Vendors are serious about interoperability and meet on a regular basis to test their products together. These meetings are called SIPit for SIP Interoperability Tests (previously Bakeoff, which was renamed because Pillsbury sued).

Anatomy of a SIP Call

Let's now look more closely at the technology. SIP is usually transported over UDP although TCP must also be supported by SIP tools. A SIP message contains two parts:

  • An envelope that describes a request or the result of a request (response) in the form of header fields.
  • An optional payload or content that contains data about the request.

The envelope is text, but the content may be text or binary.
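
To make the envelope/payload split concrete, here is a minimal Java sketch that cuts a raw SIP message at the first empty line. The class name SipMessageSplitter is made up for this illustration and does not come from any SIP library. The sketch assumes CRLF line endings and ignores repeated and folded header fields, both of which a real stack has to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: splits a raw SIP message into start line, header fields, and payload. */
final class SipMessageSplitter {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<String, String>();
    final String payload;

    SipMessageSplitter(String raw) {
        // The envelope and the payload are separated by an empty line (CRLF CRLF).
        int split = raw.indexOf("\r\n\r\n");
        String envelope = split < 0 ? raw : raw.substring(0, split);
        payload = split < 0 ? "" : raw.substring(split + 4);

        String[] lines = envelope.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            // Each header field is "Name: value"; only the first colon separates the two.
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        SipMessageSplitter m = new SipMessageSplitter(
            "BYE sip:UAB@example.com SIP/2.0\r\n"
            + "Call-ID: 8204589102@example.com\r\n"
            + "Content-Length: 0\r\n\r\n");
        System.out.println(m.startLine);
        System.out.println(m.headers);
    }
}
```

The payload is kept as an opaque string here because, as noted above, the content may be text or binary; a production parser would work on bytes and honor Content-Length.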

As an example, let's dissect a typical SIP call. In this scenario, User A wants to call User B. Figure 2 illustrates this call:

Figure 2. A typical SIP call

All these messages are explained here:

1. User Agent A sends a SIP request "INVITE" to User Agent B to indicate User A's wish to talk to User B. This request contains the details of the voice streaming protocol. The Session Description Protocol (SDP) is used in the payload for this purpose. The SDP message contains a list of all media codecs supported by User A. (These codecs use RTP for transport.)
    INVITE sip:UAB@example.com SIP/2.0
    Via: SIP/2.0/UDP 10.20.30.40:5060
    From: UserA <sip:UAA@example.com>;tag=589304
    To: UserB <sip:UAB@example.com>
    Call-ID: 8204589102@example.com
    CSeq: 1 INVITE
    Contact: <sip:UserA@10.20.30.40>
    Content-Type: application/sdp
    Content-Length: 141

    v=0
    o=UserA 2890844526 2890844526 IN IP4 10.20.30.40
    s=Session SDP
    c=IN IP4 10.20.30.40
    t=3034423619 0
    m=audio 49170 RTP/AVP 0
    a=rtpmap:0 PCMU/8000
2. User Agent B reads the request and tells User Agent A it has been received.
    SIP/2.0 100 Trying
    From: UserA <sip:UAA@example.com>;tag=589304
    To: UserB <sip:UAB@example.com>
    Call-ID: 8204589102@example.com
    CSeq: 1 INVITE
    Content-Length: 0
3. While the phone rings, User Agent B sends provisional messages (ringing) to User Agent A just so it doesn't time out and give up.
    SIP/2.0 180 Ringing
    From: UserA <sip:UAA@example.com>;tag=589304
    To: UserB <sip:UAB@example.com>;tag=314159
    Call-ID: 8204589102@example.com
    CSeq: 1 INVITE
    Content-Length: 0
4. Eventually User B decides to accept the call. At this point User Agent B sends an OK response to User Agent A. In the payload of the response, there's another SDP message. It contains a set of media codecs that are supported by both user agents. At this point both parties are officially in the call. All types of SIP requests are accepted using 200-type responses.
    SIP/2.0 200 OK
    From: UserA <sip:UAA@example.com>;tag=589304
    To: UserB <sip:UAB@example.com>;tag=314159
    Call-ID: 8204589102@example.com
    CSeq: 1 INVITE
    Contact: <sip:UserB@10.20.30.41>
    Content-Type: application/sdp
    Content-Length: 140

    v=0
    o=UserB 2890844527 2890844527 IN IP4 10.20.30.41
    s=Session SDP
    c=IN IP4 10.20.30.41
    t=3034423619 0
    m=audio 3456 RTP/AVP 0
    a=rtpmap:0 PCMU/8000
5. User Agent A finally confirms with an ACK message. There are no retries and no response messages for this request type, even if the message is lost. ACK is only used in the case of an INVITE message.
    ACK sip:UAB@example.com SIP/2.0
    Via: SIP/2.0/UDP 10.20.30.40:5060
    Route: <sip:UserB@10.20.30.41>
    From: UserA <sip:UAA@example.com>;tag=589304
    To: UserB <sip:UAB@example.com>;tag=314159
    Call-ID: 8204589102@example.com
    CSeq: 1 ACK
    Content-Length: 0
6. Both user agents are now connected using the method selected in the last SDP message. RTP packets of audio data flow in both directions over ports 49170 and 3456, using PCMU/8000 encoding.
7. At the end of the communication session, one of the users hangs up. At this point this user's user agent sends a new request BYE. This message can be sent by any of the parties.
    BYE sip:UAB@example.com SIP/2.0
    Via: SIP/2.0/UDP 10.20.30.40:5060
    To: UserB <sip:UAB@example.com>;tag=314159
    From: UserA <sip:UAA@example.com>;tag=589304
    Call-ID: 8204589102@example.com
    CSeq: 2 BYE
    Content-Length: 0
8. The other user's user agent accepts the request and replies with an OK message. The call is disconnected.
    SIP/2.0 200 OK
    To: UserB <sip:UAB@example.com>;tag=314159
    From: UserA <sip:UAA@example.com>;tag=589304
    Call-ID: 8204589102@example.com
    CSeq: 2 BYE
    Content-Length: 0
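
The codec negotiation carried by the two SDP payloads in this call can be sketched in a few lines of Java. This is only an illustration: the class and method names are invented for this article, the parser looks at the first m=audio line only, and it assumes the RTP/AVP profile, where the formats are numeric payload types (0 is PCMU, as in the example; 8 is PCMA and is used here as a hypothetical second codec).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: extracts audio payload types from SDP and intersects two codec lists. */
final class SdpCodecSketch {

    /** Returns the RTP payload type numbers listed on the first m=audio line, in order. */
    static List<Integer> audioPayloadTypes(String sdp) {
        List<Integer> types = new ArrayList<Integer>();
        for (String line : sdp.split("\r?\n")) {
            if (line.startsWith("m=audio ")) {
                // Layout: m=<media> <port> <proto> <fmt> <fmt> ...
                String[] parts = line.trim().split("\\s+");
                for (int i = 3; i < parts.length; i++) {
                    types.add(Integer.valueOf(parts[i]));
                }
                break;
            }
        }
        return types;
    }

    /** Keeps the offered payload types that the answering side also supports, in offer order. */
    static List<Integer> commonPayloadTypes(List<Integer> offered, List<Integer> supported) {
        List<Integer> common = new ArrayList<Integer>(offered);
        common.retainAll(supported);
        return common;
    }

    public static void main(String[] args) {
        String offer = "v=0\r\nm=audio 49170 RTP/AVP 0 8\r\na=rtpmap:0 PCMU/8000\r\n";
        List<Integer> offered = audioPayloadTypes(offer);
        // Hypothetical answering agent that only supports PCMU (payload type 0).
        System.out.println(commonPayloadTypes(offered, Arrays.asList(0)));
    }
}
```

An answering user agent would place the resulting list on the m= line of the SDP it returns in its 200 OK response.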

The first line of a SIP message contains the type of message and the version of SIP used (2.0). In requests, this line also contains an address called the SIP URI. This represents the destination of the message.
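
As a rough illustration of that first line, the following Java sketch tells requests and responses apart and pulls out the pieces described above. SipStartLine is a hypothetical helper written for this article, not a class from any SIP stack, and it does only minimal validation.

```java
/** Illustrative only: parses the first line of a SIP request or response. */
final class SipStartLine {
    final boolean request;
    final String method;       // requests only, for example INVITE
    final String requestUri;   // requests only, for example sip:UAB@example.com
    final int statusCode;      // responses only, for example 200
    final String reasonPhrase; // responses only, for example OK
    final String version;      // for example SIP/2.0

    private SipStartLine(boolean request, String method, String requestUri,
                         int statusCode, String reasonPhrase, String version) {
        this.request = request;
        this.method = method;
        this.requestUri = requestUri;
        this.statusCode = statusCode;
        this.reasonPhrase = reasonPhrase;
        this.version = version;
    }

    static SipStartLine parse(String line) {
        // Split into at most three parts so a multi-word reason phrase stays intact.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Malformed start line: " + line);
        }
        if (parts[0].startsWith("SIP/")) {
            // Response: version, status code, reason phrase.
            return new SipStartLine(false, null, null,
                                    Integer.parseInt(parts[1]), parts[2], parts[0]);
        }
        // Request: method, SIP URI of the destination, version.
        return new SipStartLine(true, parts[0], parts[1], 0, null, parts[2]);
    }

    public static void main(String[] args) {
        SipStartLine invite = parse("INVITE sip:UAB@example.com SIP/2.0");
        SipStartLine ringing = parse("SIP/2.0 180 Ringing");
        System.out.println(invite.method + " to " + invite.requestUri);
        System.out.println(ringing.statusCode + " " + ringing.reasonPhrase);
    }
}
```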

This example illustrates the use of request messages INVITE, ACK, and BYE, as well as the 200 OK response message. Many other messages exist in SIP. Here are a few requests:

Message | Usage
INVITE | Call a user agent, transfer a call.
ACK | Confirm the call.
BYE | End a call.
CANCEL | End a call that hasn't been OK'd yet.
REGISTER | Provide a registrar service with a contact address and the alias that can be used instead. For example, the address sip:UAA@example.com is an alias for sip:UserA@10.20.30.40 in the previous example. The registrar server example.com can then forward calls for UAA to the address 10.20.30.40.
OPTIONS | Ask a user agent for its "capabilities" (for example, messages and codecs it understands).

Now here are some often-used response messages:

Message | Usage
100 Trying | The message has been received but not processed by the end user agent yet. Please wait.
180 Ringing | The message has been received by the end user agent, which is prompting the user. Please wait.
200 OK | The message was accepted by the end user.
301 Moved Permanently & 302 Moved Temporarily | The address of the user agent has changed; here's the new permanent or temporary address, in the Contact field.
400 Bad Request | A generic error message. The recipient couldn't understand the request (for example, because of malformed syntax).
401 Unauthorized & 407 Proxy Authentication Required | Please try again with credentials.
404 Not Found | The user you're trying to reach doesn't exist or isn't registered.
408 Request Timeout | The other party isn't responding. This means a SIP message was never OK'd. All retries were dropped as well. It doesn't mean that the phone rang for too long (phones can ring forever).
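
As in HTTP, the first digit of a SIP status code gives its class, so a user agent can react sensibly even to a code it doesn't know in detail. The Java sketch below shows that grouping as RFC 3261 defines it; the class and method names are invented for this illustration.

```java
/** Illustrative only: groups SIP status codes by their first digit. */
final class SipStatusClass {

    static String describe(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "provisional";    // for example 100 Trying, 180 Ringing
            case 2: return "success";        // for example 200 OK
            case 3: return "redirection";    // for example 301, 302
            case 4: return "client error";   // for example 400, 401, 404, 407, 408
            case 5: return "server error";
            case 6: return "global failure";
            default:
                throw new IllegalArgumentException("Not a SIP status code: " + statusCode);
        }
    }

    /** A final response ends the request; provisional (1xx) responses do not. */
    static boolean isFinal(int statusCode) {
        return statusCode >= 200 && statusCode <= 699;
    }

    public static void main(String[] args) {
        int[] codes = {100, 180, 200, 302, 404};
        for (int code : codes) {
            System.out.println(code + " is " + describe(code)
                + (isFinal(code) ? " (final)" : " (keep waiting)"));
        }
    }
}
```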

Messages use similar types of header fields. Here are a few of them:

Header field | Usage
From | Sender of the SIP request.
To | Receiver of the SIP request. This is often the same as the SIP URI (can be an "alias" or a real address).
Contact | Real address of the user agent.
Call-ID | No, this isn't the phone number of the caller. It uniquely represents the whole call, or dialog, between the two user agents. All related SIP messages use the same Call-ID. For example, when a user agent receives a BYE message, it knows which call to hang up based on the Call-ID.
CSeq | Sequence number of a message. This is unique inside a single dialog or Call-ID. This is used to differentiate between new messages and "retries." Retries happen when an initial message isn't OK'd in time, and are sent at regular intervals.
Content-Type | The MIME type for the payload inside the message.
Content-Length | Size in bytes of the payload. The envelope and the payload are separated by an empty line.
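
Here is a minimal Java sketch of how Call-ID and CSeq together can tell a new message from a retry. It is a deliberate simplification with an invented class name: real SIP stacks match retransmissions at the transaction level, using the branch parameter of the Via header among other things, so treat this purely as an illustration of the idea.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: flags repeated Call-ID/CSeq pairs as retries. */
final class RetryDetector {
    private final Set<String> seen = new HashSet<String>();

    /** Returns true the first time a Call-ID/CSeq pair is seen, false for a repeat. */
    boolean isNew(String callId, String cseq) {
        // Set.add returns false when the key is already present.
        return seen.add(callId.trim() + "|" + cseq.trim());
    }

    public static void main(String[] args) {
        RetryDetector detector = new RetryDetector();
        String callId = "8204589102@example.com";
        System.out.println(detector.isNew(callId, "1 INVITE")); // first INVITE
        System.out.println(detector.isNew(callId, "1 INVITE")); // same INVITE sent again
        System.out.println(detector.isNew(callId, "1 ACK"));    // different method
    }
}
```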

Additional headers exist for message-routing-related functions, like Via, Route, and Record-Route. Some headers describe capabilities, such as Accept, User-Agent, and Supported. Others provide security, such as Authorization, Privacy, and WWW-Authenticate. Many more headers exist. Also, many of these fields have a short syntax (for example, From = f, To = t, and so on).
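
A parser has to accept both spellings of a header field name. The Java sketch below maps a few of the one-letter forms defined in RFC 3261 back to their long names; the class name is made up for this example and the table is intentionally incomplete.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: expands some compact SIP header names to their long form. */
final class CompactHeaders {
    private static final Map<String, String> LONG_NAMES = new HashMap<String, String>();
    static {
        LONG_NAMES.put("f", "From");
        LONG_NAMES.put("t", "To");
        LONG_NAMES.put("i", "Call-ID");
        LONG_NAMES.put("m", "Contact");
        LONG_NAMES.put("v", "Via");
        LONG_NAMES.put("c", "Content-Type");
        LONG_NAMES.put("l", "Content-Length");
    }

    /** Returns the long name for a known compact form, or the trimmed input unchanged. */
    static String expand(String name) {
        String trimmed = name.trim();
        // Header field names are case-insensitive, so compare in lower case.
        String longName = LONG_NAMES.get(trimmed.toLowerCase(Locale.ROOT));
        return longName != null ? longName : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(expand("f"));
        System.out.println(expand("l"));
        System.out.println(expand("CSeq"));
    }
}
```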

What Else Can SIP Do?

There are many applications that can be implemented with SIP and its extensions:

  • VoIP
  • Videoconferencing
  • Instant messaging for text and data, like MSN Instant Messenger
  • Registration (I'm online!)
  • Presence (Is my buddy available?)
  • Click-to-talk (click here to speak with a technical support agent)
  • Answering machine/Interactive Voice Response (IVR) system ("Enter your password. Record your name. Press 1 for English, press 2 for Spanish...")
  • Networked games such as Quake and some cell phone games (even based on voice and IM)
  • Cell-phone-top applications
  • Mobile e-Business

Basically, if it makes two end-points communicate, SIP can do it.

But what about my super-duper idea of a live over-the-Web technical support agent? Can we do that today with SIP? And can we do that in Java, my favorite language? In short, yes.

SIP in Java

I work with SIP a lot, and I can safely say that Java offers excellent support for it. An assortment of Java technologies helps SIP developers by abstracting away many of the details associated with developing SIP applications. These are mostly in the JAIN (Java APIs for Integrated Networks) work group.

Other related technologies are:

  • JAIN SDP (JSR 141)
  • Java Media Framework for RTP (J2SE optional package, not JAIN)

If you wish to develop a client application, you need a client-side SIP engine, or "stack." A good, open source, Java SIP stack is available here. It also supports SDP. If you don't want to develop your own SIP phone, you can use this one.

Conclusion

This article provides a brief introduction to SIP, the scenarios in which you can use it, and a bit of the SIP syntax. We also glanced at the various Java technologies related to SIP. Although the article isn't detailed, I hope it's enough to kindle your interest and encourage you to start using SIP. SIP's time has arrived, and plenty of cool ideas can finally be realized.

In Part 2 of this article series, I will show you how to write a chat room application using the SIP Servlet API.

Emmanuel Proulx is an expert in J2EE and SIP. He is a certified WebLogic Server engineer.