WebRTC and Signalling: Behind the scenes of ‘Shaadi Meet’

By Indu Soni

15 Jul 2020

In these times of pandemic, video chats and conferencing are playing an important role in coping with social distancing and isolation while allowing us to move towards our goals, especially as a team.

WebRTC plays an important role in enabling audio/video chats between clients (users) connected to the internet.
Let’s try to understand the what and how of WebRTC!

What is WebRTC?

WebRTC (Web Real-Time Communication) is an open-source project that enables real-time media communication, such as voice, video and data transfer, natively between browsers and devices. It lets users communicate from within their web browser without the need for complicated plug-ins or additional hardware.

WebRTC is a powerful tool that’s quickly becoming the technology of choice for real-time streaming. It works in any browser or mobile operating system that supports the WebRTC APIs.

WebRTC has the following three APIs/components:

getUserMedia
RTCPeerConnection
RTCDataChannel

The roles of these components are as follows:

1) Collect audio/video data from your computer and convert it into a format that can be streamed over the network.
Responsible API/Component: getUserMedia

2) Locate both clients on the network in order to establish the peer-to-peer connection (i.e. no middleware between the two clients).
Responsible API/Component: RTCPeerConnection
Note: This component relies on a signalling server to locate clients on the web.

3) Set up a data channel between the clients to allow bi-directional transfer of any type of data directly between them (no server in the middle).
Responsible API/Component: RTCDataChannel

Now for the most important and interesting part of WebRTC: how it establishes the P2P connection between two clients.

The overall flow is outlined below.

Processes and protocols needed to establish a P2P connection:

Locating clients on the network is accomplished through a process known as signalling.
Signalling mainly relies on the following mechanisms to locate both clients on the network:

ICE
STUN/TURN
SDP

ICE (Interactive Connectivity Establishment):

ICE is a framework that allows WebRTC to overcome the complexities of real-world networking. Its job is to find the best path to connect peers.

It has two main roles:

Gathering Candidates
Checking Connectivity
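To make these two roles concrete, here is a minimal sketch in TypeScript. It assumes the candidate priority formula and the recommended type preferences from RFC 8445. The addresses, the `gathered` list and the `isReachable` callback are hypothetical stand-ins, and real ICE checks pairs of local and remote candidates rather than single candidates.

```typescript
// Candidate types, from most direct (host) to least direct (relay).
type CandidateType = "host" | "prflx" | "srflx" | "relay";

interface Candidate {
  type: CandidateType;
  address: string; // hypothetical example addresses
  port: number;
}

// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

const TWO_POW_24 = 16777216;
const TWO_POW_8 = 256;

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(
  type: CandidateType,
  localPreference: number = 65535,
  componentId: number = 1,
): number {
  return TWO_POW_24 * TYPE_PREFERENCE[type] + TWO_POW_8 * localPreference + (256 - componentId);
}

// Role 1 (gathering): order the gathered candidates, best first.
function sortByPriority(candidates: Candidate[]): Candidate[] {
  return candidates
    .slice()
    .sort((a, b) => candidatePriority(b.type) - candidatePriority(a.type));
}

// Role 2 (checking connectivity): pick the best candidate that passes a check.
// `isReachable` is a stand-in for the real STUN-based connectivity checks.
function selectCandidate(
  candidates: Candidate[],
  isReachable: (candidate: Candidate) => boolean,
): Candidate | undefined {
  return sortByPriority(candidates).find(isReachable);
}

// Hypothetical candidates for one peer: a relayed, a private and a public address.
const gathered: Candidate[] = [
  { type: "relay", address: "198.51.100.20", port: 3478 },
  { type: "host", address: "192.168.1.10", port: 5000 },
  { type: "srflx", address: "203.0.113.7", port: 40000 },
];
```

With these numbers a host candidate always outranks a server-reflexive (STUN) candidate, which in turn outranks a relayed (TURN) candidate, so a relay is chosen only when nothing more direct passes the check.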

ICE may be able to do this with a direct connection between the peers, but usually our computers are not directly connected to the internet. To keep its IP address safe from potential hackers, a computer is typically protected by a firewall and sits behind a Network Address Translation (NAT) device.

NAT devices do not reveal a computer’s private IP address. Instead, they provide a public-facing IP address (often shared by several computers) and translate between the public and private addresses used. If you have a wireless router in your home or office, then you are most likely using NAT.
To find a connection path between two clients sitting behind NAT, ICE makes use of other servers known as STUN/TURN.
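The translation a NAT device performs can be pictured with a toy model. The sketch below is only an illustration: the `SimpleNat` class, its port numbering and the example addresses are all hypothetical, and real NAT devices also track protocols, timeouts and remote endpoints.

```typescript
interface Endpoint {
  address: string;
  port: number;
}

// A toy NAT: many private endpoints share one public address.
class SimpleNat {
  private readonly publicAddress: string;
  private nextPublicPort: number;
  // "privateAddress:port" -> public port assigned to it
  private readonly outbound = new Map<string, number>();
  // public port -> private endpoint it maps back to
  private readonly inbound = new Map<number, Endpoint>();

  constructor(publicAddress: string, firstPublicPort: number = 40000) {
    this.publicAddress = publicAddress;
    this.nextPublicPort = firstPublicPort;
  }

  // Rewrite the source of an outgoing packet to the public address,
  // creating a mapping the first time a private endpoint is seen.
  translateOutbound(source: Endpoint): Endpoint {
    const key = `${source.address}:${source.port}`;
    let publicPort = this.outbound.get(key);
    if (publicPort === undefined) {
      publicPort = this.nextPublicPort++;
      this.outbound.set(key, publicPort);
      this.inbound.set(publicPort, source);
    }
    return { address: this.publicAddress, port: publicPort };
  }

  // Find the private endpoint for an incoming packet, if a mapping exists.
  translateInbound(publicPort: number): Endpoint | undefined {
    return this.inbound.get(publicPort);
  }
}

// Hypothetical home router with public address 203.0.113.7.
const homeRouter = new SimpleNat("203.0.113.7");
const seenFromOutside = homeRouter.translateOutbound({ address: "192.168.1.10", port: 5000 });
```

The laptop at 192.168.1.10 only knows its private address. The outside world only ever sees the value in `seenFromOutside`, which is exactly the information a peer needs and the laptop lacks.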

STUN/TURN:

Computers sitting behind NAT don’t know their own public IP addresses, so ICE makes use of a STUN (Session Traversal Utilities for NAT) server.
A STUN server sits on the public network and can see the public IP address and port from which a client’s request arrives.
STUN allows WebRTC clients to find out their own public IP addresses by making a request to a STUN server.
In some cases, when the address learned through STUN is not enough to connect the peers directly (for example, behind a restrictive NAT or firewall), ICE uses a TURN (Traversal Using Relays around NAT) server to relay media data between the clients.
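For the curious, the STUN request and reply are small binary messages. The sketch below builds a Binding Request header and encodes/decodes the XOR-MAPPED-ADDRESS value in which a server reports the client's public address, following the layout in RFC 5389. It covers IPv4 only and does no network I/O. The function names and the example address are assumptions made for illustration.

```typescript
const MAGIC_COOKIE = 0x2112a442; // fixed value defined by the STUN specification
const BINDING_REQUEST = 0x0001; // STUN message type for a Binding Request

// Build the 20-byte header of a Binding Request with no attributes.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be exactly 12 bytes");
  }
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, BINDING_REQUEST); // message type
  view.setUint16(2, 0); // message length: no attributes follow
  view.setUint32(4, MAGIC_COOKIE);
  message.set(transactionId, 8);
  return message;
}

// Encode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute,
// as a server would when replying with the address it observed.
function encodeXorMappedAddress(address: string, port: number): Uint8Array {
  const numeric = address
    .split(".")
    .map(Number)
    .reduce((acc, octet) => acc * 256 + octet, 0);
  const value = new Uint8Array(8);
  const view = new DataView(value.buffer);
  view.setUint8(0, 0); // reserved
  view.setUint8(1, 0x01); // address family: IPv4
  view.setUint16(2, port ^ (MAGIC_COOKIE >>> 16)); // port XOR top 16 bits of cookie
  view.setUint32(4, (numeric ^ MAGIC_COOKIE) >>> 0); // address XOR cookie
  return value;
}

// Decode the attribute value back into the public address and port.
function decodeXorMappedAddress(value: Uint8Array): { address: string; port: number } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const numeric = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const address = [
    numeric >>> 24,
    (numeric >>> 16) & 0xff,
    (numeric >>> 8) & 0xff,
    numeric & 0xff,
  ].join(".");
  return { address, port };
}
```

The address is XOR-ed with the magic cookie so that devices on the path which rewrite anything that looks like an IP address inside packets do not corrupt the reply.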

SDP:

To connect two clients and set up a suitable stream between them, the following information must be exchanged between the clients:

Which IP address is prepared to receive the incoming media stream
Which port number is listening for the incoming media stream
What media type the endpoint is expecting to receive
Which protocol the endpoint is expecting to exchange information in
Which codec the endpoint is capable of decoding

The above information is organized according to the Session Description Protocol (SDP) and exchanged between the clients.
The process of exchanging SDP between the clients is known as the Offer/Answer message flow.
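To show where each of those pieces of information lives in an SDP message, here is a small parsing sketch. The `exampleSdp` text is a hand-written, simplified example with a made-up address. Browser-generated SDP is much longer and usually conveys the usable addresses through ICE candidate lines. The `summarizeSdp` helper is hypothetical and handles only the `m=`, `c=` and `a=rtpmap` lines.

```typescript
interface MediaSummary {
  mediaType: string; // media type the endpoint expects, e.g. "audio"
  port: number; // port listening for the incoming stream
  protocol: string; // transport protocol for the media
  connectionAddress: string | undefined; // IP address prepared to receive media
  codecs: string[]; // codecs the endpoint can decode
}

// Simplified, hand-written SDP for a single audio stream.
const exampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 49170 UDP/TLS/RTP/SAVPF 111 0",
  "c=IN IP4 203.0.113.7",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
].join("\r\n");

function summarizeSdp(sdp: string): MediaSummary[] {
  const sections: MediaSummary[] = [];
  let sessionAddress: string | undefined;

  for (const line of sdp.split(/\r?\n/)) {
    const current = sections.length > 0 ? sections[sections.length - 1] : undefined;

    if (line.startsWith("m=")) {
      // m=<media> <port> <protocol> <payload types...>
      const [mediaType, port, protocol] = line.slice(2).split(" ");
      sections.push({
        mediaType,
        port: Number(port),
        protocol,
        connectionAddress: sessionAddress,
        codecs: [],
      });
    } else if (line.startsWith("c=")) {
      // c=<network type> <address type> <address>
      const address = line.slice(2).split(" ")[2];
      if (current) {
        current.connectionAddress = address; // media-level address
      } else {
        sessionAddress = address; // session-level default
      }
    } else if (line.startsWith("a=rtpmap:") && current) {
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      current.codecs.push(line.slice("a=rtpmap:".length).split(" ")[1]);
    }
  }
  return sections;
}

const summary = summarizeSdp(exampleSdp);
```

For the example above, `summary` holds a single audio entry whose fields line up with the five items in the list: address, port, media type, protocol and codecs.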

A brief overview of the ‘Offer/Answer Message Flow’

The caller first sends an SDP packet (containing the caller-side information) to the callee.
The callee in turn responds with an SDP packet that contains the callee’s information.
The SDP packet sent by the caller is known as the Offer, and the SDP packet returned by the callee is known as the Answer.
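The exchange above can be modelled as a tiny state machine. The sketch below is not the browser API: `PeerModel` is a hypothetical class that borrows three state names from `RTCPeerConnection.signalingState` ("stable", "have-local-offer", "have-remote-offer"), uses placeholder strings instead of real SDP, and passes messages directly instead of through a signalling server.

```typescript
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";

interface Description {
  type: "offer" | "answer";
  sdp: string; // placeholder text, not real SDP
}

class PeerModel {
  state: SignalingState = "stable";
  local?: Description;
  remote?: Description;
  private readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  // Caller side: create the Offer and remember it as the local description.
  createOffer(): Description {
    if (this.state !== "stable") {
      throw new Error(`${this.name}: cannot create an offer in this state`);
    }
    this.local = { type: "offer", sdp: `sdp-from-${this.name}` };
    this.state = "have-local-offer";
    return this.local;
  }

  // Callee side: store the Offer, then produce the Answer.
  receiveOffer(offer: Description): Description {
    if (offer.type !== "offer" || this.state !== "stable") {
      throw new Error(`${this.name}: unexpected offer`);
    }
    this.remote = offer;
    this.state = "have-remote-offer";
    this.local = { type: "answer", sdp: `sdp-from-${this.name}` };
    this.state = "stable"; // both descriptions are now set on the callee
    return this.local;
  }

  // Caller side: store the Answer to complete the exchange.
  receiveAnswer(answer: Description): void {
    if (answer.type !== "answer" || this.state !== "have-local-offer") {
      throw new Error(`${this.name}: unexpected answer`);
    }
    this.remote = answer;
    this.state = "stable"; // both descriptions are now set on the caller
  }
}

// One full exchange; in a real application each message would travel
// through the signalling server.
const caller = new PeerModel("caller");
const callee = new PeerModel("callee");
const offer = caller.createOffer();
const answer = callee.receiveOffer(offer);
caller.receiveAnswer(answer);
```

After the last line, both peers are back in the "stable" state and each holds the other's description, which is the point at which media setup can proceed.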

Let’s do a small recap of how WebRTC works:

To initiate a video chat, the caller creates an OFFER (as an SDP packet) and starts the ICE process. ICE tries to find the best route between the two clients.
If a computer is sitting behind NAT, ICE makes use of a STUN/TURN server to find out the computer’s public IP address.
On receiving the OFFER from the caller, the callee creates an ANSWER (as an SDP packet) and sends it to the caller.
Once the caller and callee have exchanged their SDP information and ICE has established a route between the two computers, WebRTC establishes a secure connection between them.
On completion of the above steps, WebRTC allows the clients to stream data back and forth directly (no middleware!).
