We put this WebRTC Proposal together for a call center client that was looking for an alternative to Twilio.
2. Application Flow
2.1 Agent navigates to Company Web Server to bring up the WebRTC page
The WebRTC page loads either SIP.js or sipML5. Both are open-source JavaScript SIP clients, and we have used SIP.js before.
2.2 Company delivers HTML5 SIP Client to Agent.
When the page loads, the Company HTML5 SIP Client makes an AJAX request to a new component, the Company Media Server Manager, asking which media server to use for the call. We’ve gone back and forth on whether to front the media servers with a SIP proxy such as OpenSIPS or with an App Server. We’re opting for the App Server because it lets us capture utilization statistics on the media servers and determine which machine has the least load.
2.3 HTML5 SIP client sends request to Company Media Server Manager
The Company Media Server Manager is written in Java and runs on Tomcat. We could also write it in C#. It manages agents and media servers, and uses a small MySQL instance to keep track of the media servers, their statistics, and possibly the agents. It would also need a small CRUD user interface for managing the media server and agent entities.
2.4 Company Media Server Manager returns the media server
The Company Media Server Manager returns the IP address of the media server with the lowest utilization. Initially, utilization is defined as the number of agents in a call on that server.
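The selection rule above can be sketched in a few lines of Java. This is a minimal sketch, not the actual implementation: the class name `MediaServerSelector`, the map of IP address to agents-in-call count, and the example addresses are all assumptions made for illustration. In the real component the counts would come from the MySQL instance.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for the Company Media Server Manager; all names are illustrative.
final class MediaServerSelector {

    // Returns the IP of the media server with the fewest agents in a call.
    // Ties are broken by IP string order so the result is deterministic.
    // An empty pool yields Optional.empty().
    static Optional<String> leastUtilized(Map<String, Integer> agentsInCallByIp) {
        Comparator<Map.Entry<String, Integer>> byLoadThenIp =
                Map.Entry.<String, Integer>comparingByValue()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey());
        return agentsInCallByIp.entrySet().stream()
                .min(byLoadThenIp)
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Example pool using documentation-range addresses (assumed values).
        Map<String, Integer> pool = new LinkedHashMap<>();
        pool.put("203.0.113.10", 12);
        pool.put("203.0.113.11", 7);
        pool.put("203.0.113.12", 9);
        System.out.println(leastUtilized(pool).orElse("no media server available"));
    }
}
```

Returning an `Optional` makes the empty-pool case explicit, so the AJAX endpoint can answer with an error instead of a null address.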
2.5 Agent initiates call to media server
We’ve changed the media server from FreeSWITCH to Asterisk. Asterisk has a couple of advantages. The biggest is that we are much more familiar with the Asterisk Manager Interface (AMI), which can report utilization statistics for the Asterisk machines back to the Company Media Server Manager.
When the agent initiates the call to the Asterisk IP address, the browser starts sending SIP signaling over WebSocket and WebRTC media to the Asterisk server.
2.6 Agent is placed in conference to wait for client leg
Asterisk is capable of communicating via SIP over WebSocket, and the Asterisk WebRTC module handles the WebRTC media. When we put the agent into the conference, the conference module transcodes the Opus codec to G.711. The conference takes place on the Asterisk media server. We also have a resource who has implemented this exact solution.
2.7 Asterisk sends call events to Company Media Server Manager
When a new media server is provisioned, it is configured with the IP address of the Company Media Server Manager, or of a pool of these machines. Once a call is underway, call state is sent back to the Company Media Server Manager via the Asterisk Manager Interface, which gives us the data we need to load balance.
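To make the event flow concrete, here is a hedged sketch of how the Company Media Server Manager might turn AMI events into a per-server utilization count. AMI messages are plain text: `Key: Value` lines ended by a blank line. The sketch assumes the conference runs on Asterisk's ConfBridge application, so it reacts to the `ConfbridgeJoin` and `ConfbridgeLeave` events. The class name `AmiUtilizationTracker` and its methods are hypothetical. Note that it counts every conference participant, agent or client leg; counting agents only would need an extra filter, for example on the `Channel` header.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker inside the Company Media Server Manager; names are illustrative.
final class AmiUtilizationTracker {

    private final Map<String, Integer> participantsByIp = new HashMap<>();

    // Parses one AMI message into its headers. Lines without a colon
    // (such as the blank line that terminates the message) are ignored.
    static Map<String, String> parseMessage(String raw) {
        Map<String, String> headers = new HashMap<>();
        for (String line : raw.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    // Applies one raw AMI event received from the media server at serverIp.
    // Assumes ConfBridge is the conference application (an assumption, see lead-in).
    void onEvent(String serverIp, String rawEvent) {
        String event = parseMessage(rawEvent).get("Event");
        if ("ConfbridgeJoin".equals(event)) {
            participantsByIp.merge(serverIp, 1, Integer::sum);
        } else if ("ConfbridgeLeave".equals(event)) {
            // Never drop below zero if a leave arrives without a matching join.
            participantsByIp.computeIfPresent(serverIp, (ip, n) -> Math.max(0, n - 1));
        }
    }

    // Current number of conference participants seen on the given server.
    int participants(String serverIp) {
        return participantsByIp.getOrDefault(serverIp, 0);
    }

    public static void main(String[] args) {
        AmiUtilizationTracker tracker = new AmiUtilizationTracker();
        tracker.onEvent("203.0.113.11", "Event: ConfbridgeJoin\r\nConference: 1001\r\n\r\n");
        System.out.println(tracker.participants("203.0.113.11"));
    }
}
```

In production the counts would be persisted to the MySQL instance rather than held in memory, so that a pool of manager machines sees the same numbers.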
2.8 Either the Acision SIP Proxy sends the client leg to Asterisk, or Asterisk initiates the call
While the agent is sitting in conference, how does the client leg get into the conference? One option is to have the Acision SIP Proxy route the call to the appropriate Asterisk server. Another option is for the Asterisk media server to initiate the client leg itself. That call would run through the Acision SIP Proxy out to the PSTN. This would effectively enable us to “nail up” the agent if needed.
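For the second option, one way Asterisk could be told to dial the client leg is the AMI `Originate` action. The sketch below only builds the action text. It assumes the conference runs on the ConfBridge application and that a trunk toward the Acision SIP Proxy exists on the Asterisk side. The endpoint name `acision-proxy`, the phone number, and the class name `AmiOriginate` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for an AMI Originate action; names are illustrative.
final class AmiOriginate {

    // Builds an Originate action that dials dialString and, once answered,
    // drops the answered channel into the given ConfBridge conference.
    static String clientLegIntoConference(String dialString, String conference, String actionId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Action", "Originate");
        headers.put("ActionID", actionId);
        headers.put("Channel", dialString);
        headers.put("Application", "ConfBridge"); // assumes ConfBridge hosts the conference
        headers.put("Data", conference);
        headers.put("Timeout", "30000");          // milliseconds to wait for an answer
        headers.put("Async", "true");             // do not block the AMI session while dialing

        StringBuilder message = new StringBuilder();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String value = header.getValue();
            // Reject line breaks so a value cannot inject extra AMI headers.
            if (value.contains("\r") || value.contains("\n")) {
                throw new IllegalArgumentException("line break in " + header.getKey());
            }
            message.append(header.getKey()).append(": ").append(value).append("\r\n");
        }
        return message.append("\r\n").toString(); // a blank line terminates the action
    }

    public static void main(String[] args) {
        // "acision-proxy" is an assumed PJSIP endpoint name for the trunk.
        System.out.print(clientLegIntoConference(
                "PJSIP/15555550100@acision-proxy", "1001", "client-leg-1"));
    }
}
```

The Company Media Server Manager would write this text to an authenticated AMI connection on the Asterisk server that hosts the agent's conference.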
3. Questions
3.1 Will the Asterisk media server IP addresses have to be publicly exposed?
In this solution, yes, the Asterisk media server IP addresses would be publicly exposed. We could front the media servers with OpenSIPS, but it would probably be easier to just open a range of 255 IP addresses. If we exceed 255 media servers, we could either look into fronting them with OpenSIPS or split the WebSocket and transcoding work across separate machines.
3.2 How does the NAT Traversal work?
We need to learn more about what Company does today.
3.3 How do we scale? How do we deal with the transcoding?
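We scale by adding Asterisk media servers to the pool. The Company Media Server Manager then routes new calls to the least-utilized server. Because the Opus-to-G.711 transcoding happens in the conference on each Asterisk server, transcoding capacity grows as servers are added.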