Scalability is a vague concept that can apply to different scenarios and requirements. Some of those scenarios are described below.
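Before going into details, let's clarify how mediasoup works internally: a mediasoup worker is a C++ subprocess that runs in a single CPU core. Each worker may host multiple routers, and each router holds the transports, producers and consumers that exchange media with each other. A router lives entirely inside one worker, so the media handled by a single router is bounded by the capacity of a single CPU core.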
A first scenario involves many independent routers. A good example is an application that provides multi-party conference rooms: each room uses a single mediasoup router, so each mediasoup worker (which uses a single CPU core) may host multiple “rooms”.
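Depending on the host CPU capabilities, a single mediasoup worker (C++ subprocess) can typically handle roughly 500 consumers or more in total. If, for example, there are 4 peers in a room, all of them sending audio and video and all of them consuming the audio and video of the other peers, each peer consumes 6 streams (3 audio and 3 video), so the room holds 4 × 6 = 24 consumers, and a single worker could host around 500 / 24 ≈ 20 such rooms.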
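The 4-peer example generalizes: in a room of N peers where every peer sends S streams and consumes every other peer's streams, the router holds N × (N - 1) × S consumers, which grows quadratically with room size. The following is a rough planning sketch; the ~500-consumer budget is the ballpark figure quoted above rather than a hard limit, and `consumersPerRoom` and `roomsPerWorker` are hypothetical helpers, not part of mediasoup.

```typescript
// Rough capacity planning for the "one router per room" scenario.
// Illustrative helpers only; they are not part of the mediasoup API.

/** Consumers in a room where every peer consumes every other peer's streams. */
function consumersPerRoom(peers: number, streamsPerPeer: number): number {
  if (peers < 1) return 0;
  // Each peer consumes (peers - 1) * streamsPerPeer streams.
  return peers * (peers - 1) * streamsPerPeer;
}

/** How many such rooms fit in one worker, given a consumer budget per worker. */
function roomsPerWorker(consumerBudget: number, peers: number, streamsPerPeer: number): number {
  const perRoom = consumersPerRoom(peers, streamsPerPeer);
  return perRoom === 0 ? Infinity : Math.floor(consumerBudget / perRoom);
}

// 4 peers, each sending audio + video (2 streams).
console.log(consumersPerRoom(4, 2));     // 24
console.log(roomsPerWorker(500, 4, 2));  // 20
// 10 peers: 180 consumers per room, so only 2 rooms per worker.
console.log(roomsPerWorker(500, 10, 2)); // 2
```

The quadratic term is why large rooms exhaust a worker much faster than many small ones; the budget itself should come from measurements on the actual host.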
Depending on the required capacity, the server-side application using mediasoup should launch as many workers as needed (no more than the number of CPU cores in the host) and distribute “rooms” (mediasoup routers) across them.
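One simple way to distribute rooms is to keep an estimated load per worker and place each new room on the least-loaded one. The sketch below models only that bookkeeping; `WorkerSlot` and `RoomBalancer` are hypothetical names. In a real backend each slot would wrap a worker obtained from `mediasoup.createWorker()`, the room's router would be created with `worker.createRouter()` on the chosen worker, and the core count would typically come from Node's `os.cpus().length`.

```typescript
// Hypothetical bookkeeping for spreading rooms (one router each) across workers.
// In production code each slot would wrap a real mediasoup Worker.

interface WorkerSlot {
  id: number;
  estimatedConsumers: number;
}

class RoomBalancer {
  private readonly slots: WorkerSlot[] = [];
  private readonly rooms = new Map<string, { slot: WorkerSlot; consumers: number }>();

  constructor(requestedWorkers: number, cpuCores: number) {
    // Never launch more workers than there are CPU cores in the host.
    const count = Math.max(1, Math.min(requestedWorkers, cpuCores));
    for (let id = 0; id < count; id++) {
      this.slots.push({ id, estimatedConsumers: 0 });
    }
  }

  get workerCount(): number {
    return this.slots.length;
  }

  /** Places a new room on the worker with the lowest estimated consumer load. */
  assignRoom(roomId: string, expectedConsumers: number): number {
    const existing = this.rooms.get(roomId);
    if (existing) return existing.slot.id;

    let best = this.slots[0];
    for (const slot of this.slots) {
      if (slot.estimatedConsumers < best.estimatedConsumers) best = slot;
    }
    best.estimatedConsumers += expectedConsumers;
    this.rooms.set(roomId, { slot: best, consumers: expectedConsumers });
    return best.id;
  }

  /** Releases the load of a closed room so its worker can take new rooms. */
  closeRoom(roomId: string): void {
    const entry = this.rooms.get(roomId);
    if (!entry) return;
    entry.slot.estimatedConsumers -= entry.consumers;
    this.rooms.delete(roomId);
  }
}

// Asking for 8 workers on a 4-core host yields 4 workers.
const balancer = new RoomBalancer(8, 4);
console.log(balancer.workerCount);          // 4
console.log(balancer.assignRoom("a", 24));  // 0
console.log(balancer.assignRoom("b", 24));  // 1
console.log(balancer.assignRoom("c", 180)); // 2
console.log(balancer.assignRoom("d", 24));  // 3
console.log(balancer.assignRoom("e", 24));  // 0 (ties go to the lowest id)
```

Expected consumers are an estimate made when the room is created; a production balancer might instead track actual consumer counts as peers join and leave.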
If even more capacity is required, the application backend should run mediasoup on multiple hosts and distribute “rooms” across them.
In a broadcasting scenario, a single broadcaster endpoint (or a few of them) produces audio and video, and the server backend streams the media to hundreds or thousands of viewers in real time (with minimal delay). If there are more than 200-300 viewers (that is, 400-600 consumers, since each viewer consumes both audio and video), the capacity of a single mediasoup router could be exceeded.
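The same arithmetic gives a first estimate of how many routers a broadcast needs. This is a minimal sketch assuming each viewer consumes one audio and one video stream and taking ~500 consumers per router (one router per worker) as the budget; both numbers are assumptions to be replaced by measurements, and `routersForBroadcast` is a hypothetical helper.

```typescript
// Hypothetical helper: how many routers (each in its own worker) a broadcast needs.
function routersForBroadcast(
  viewers: number,
  streamsPerViewer: number,
  consumersPerRouter: number,
): number {
  if (consumersPerRouter <= 0) {
    throw new RangeError("consumersPerRouter must be positive");
  }
  // Total consumers divided by the per-router budget, rounded up.
  return Math.max(1, Math.ceil((viewers * streamsPerViewer) / consumersPerRouter));
}

console.log(routersForBroadcast(250, 2, 500));  // 1
console.log(routersForBroadcast(300, 2, 500));  // 2
console.log(routersForBroadcast(5000, 2, 500)); // 20
```

This ignores the extra pipe consumers created in the origin router for every router the broadcast is piped into, which adds a small overhead per additional router.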
To help with this scenario, mediasoup provides a mechanism to interconnect mediasoup routers in the same host (typically running in different workers) by using the router.pipeToRouter() API.
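The concept is simple: the broadcaster produces its audio and video into a router in one worker, and the application pipes those producers into routers living in other workers. Each piped producer appears in the destination router as a regular producer (with the same id), so viewers connected to that router consume it as usual. This way the consumers are spread across several workers, and thus across several CPU cores.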
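In practice the application decides, per viewer, which router to use, and pipes the broadcaster's producers into that router the first time it is needed. The sketch below models that decision with plain data; `BroadcastPlanner`, `PipeRequest` and the ids are hypothetical. In real code, each recorded pipe request would become a call to `originRouter.pipeToRouter({ producerId, router: edgeRouter })`, which resolves with the created `pipeConsumer` and `pipeProducer`.

```typescript
// Hypothetical planner: assigns viewers to routers and records which broadcaster
// producers must be piped into each edge router (once per router/producer pair).

interface PipeRequest {
  producerId: string;
  routerId: string;
}

class BroadcastPlanner {
  readonly pipeRequests: PipeRequest[] = [];
  private readonly originRouterId: string;
  private readonly routerIds: string[];
  private readonly maxViewersPerRouter: number;
  private readonly producerIds: string[];
  private readonly viewers = new Map<string, number>();
  private readonly piped = new Set<string>();

  constructor(originRouterId: string, edgeRouterIds: string[], maxViewersPerRouter: number, producerIds: string[]) {
    this.originRouterId = originRouterId;
    // Fill the origin router first, then the edge routers in order.
    this.routerIds = [originRouterId, ...edgeRouterIds];
    this.maxViewersPerRouter = maxViewersPerRouter;
    this.producerIds = producerIds;
  }

  /** Returns the router a new viewer should connect to, or undefined if all are full. */
  addViewer(): string | undefined {
    const routerId = this.routerIds.find((id) => (this.viewers.get(id) ?? 0) < this.maxViewersPerRouter);
    if (routerId === undefined) return undefined;
    this.viewers.set(routerId, (this.viewers.get(routerId) ?? 0) + 1);

    // The origin router already holds the producers; edge routers need them piped in.
    if (routerId !== this.originRouterId) {
      for (const producerId of this.producerIds) {
        const key = `${producerId}@${routerId}`;
        if (this.piped.has(key)) continue;
        this.piped.add(key);
        // Real code: await originRouter.pipeToRouter({ producerId, router: edgeRouter })
        this.pipeRequests.push({ producerId, routerId });
      }
    }
    return routerId;
  }
}

const planner = new BroadcastPlanner("r0", ["r1", "r2"], 2, ["p-audio", "p-video"]);
const assigned = [1, 2, 3, 4, 5, 6, 7].map(() => planner.addViewer());
console.log(assigned);                    // r0, r0, r1, r1, r2, r2, undefined
console.log(planner.pipeRequests.length); // 4
```

Piping lazily means routers that never receive a viewer cost nothing; the trade-off is that the first viewer on each edge router waits for the pipe to be set up.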
It's also perfectly possible to interconnect mediasoup routers running in different physical hosts. However, since mediasoup does not provide any signaling protocol, it's up to the application to implement the information exchange required to achieve that. As a reference, in order to pipe a producer into a router in a different host, the application should implement something similar to what the router.pipeToRouter() method already does (see router.ts), but taking into account that in this case both routers are not co-located in the same host, so network signaling is needed.
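As a rough outline, each host creates a PipeTransport with `router.createPipeTransport()`, the hosts exchange their transport's IP and port (from `pipeTransport.tuple`), and each side calls `pipeTransport.connect({ ip, port })` with the remote values. The origin host then calls `pipeTransport.consume({ producerId })` and sends the resulting `kind` and `rtpParameters` to the edge host, which calls `pipeTransport.produce({ id, kind, rtpParameters })`. The message shapes below are entirely hypothetical (mediasoup defines none), and the sketch models only the edge host as a pure function that turns incoming messages into the mediasoup calls it would make, connecting before producing to roughly mirror the order used by pipeToRouter().

```typescript
// Hypothetical signaling messages for piping a producer from an origin host
// to an edge host. Only the edge side is modeled, as a pure reducer.

type SignalMessage =
  | { type: "transport-info"; ip: string; port: number }
  | { type: "pipe-producer"; producerId: string; kind: "audio" | "video"; rtpParameters: unknown };

type PipeProducerMsg = Extract<SignalMessage, { type: "pipe-producer" }>;

// Calls the edge host would perform on its real mediasoup PipeTransport.
type EdgeAction =
  | { call: "pipeTransport.connect"; args: { ip: string; port: number } }
  | { call: "pipeTransport.produce"; args: { id: string; kind: "audio" | "video"; rtpParameters: unknown } };

interface EdgeState {
  connected: boolean;
  pending: PipeProducerMsg[]; // producers announced before the transport was connected
}

function toProduce(msg: PipeProducerMsg): EdgeAction {
  // Reusing the origin producer id keeps ids consistent across hosts,
  // as router.pipeToRouter() does for co-located routers.
  return { call: "pipeTransport.produce", args: { id: msg.producerId, kind: msg.kind, rtpParameters: msg.rtpParameters } };
}

function handleSignal(state: EdgeState, msg: SignalMessage): { state: EdgeState; actions: EdgeAction[] } {
  if (msg.type === "transport-info") {
    if (state.connected) return { state, actions: [] }; // ignore duplicate transport info
    // Connect to the origin's PipeTransport first, then flush producers that arrived early.
    const actions: EdgeAction[] = [
      { call: "pipeTransport.connect", args: { ip: msg.ip, port: msg.port } },
      ...state.pending.map(toProduce),
    ];
    return { state: { connected: true, pending: [] }, actions };
  }
  // Here msg is a "pipe-producer" message.
  if (!state.connected) {
    return { state: { connected: false, pending: [...state.pending, msg] }, actions: [] };
  }
  return { state, actions: [toProduce(msg)] };
}

let state: EdgeState = { connected: false, pending: [] };
const inbox: SignalMessage[] = [
  { type: "pipe-producer", producerId: "p-video", kind: "video", rtpParameters: {} }, // arrives early
  { type: "transport-info", ip: "192.0.2.10", port: 40000 },
  { type: "pipe-producer", producerId: "p-audio", kind: "audio", rtpParameters: {} },
];
for (const msg of inbox) {
  const result = handleSignal(state, msg);
  state = result.state;
  // Logs connect, then produce for p-video, then produce for p-audio.
  for (const action of result.actions) console.log(action.call, JSON.stringify(action.args));
}
```

Keeping the protocol logic pure makes it testable without a network; the real handler would await each mediasoup call and report failures back to the origin host.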
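When broadcasting a video stream to many viewers (hundreds or thousands of consumers), it's important to be aware of how video RTP transmission typically works: when a viewer joins, or loses packets it cannot recover, its client requests a video key frame by sending an RTCP PLI or FIR. mediasoup can retransmit lost packets (NACK) from its own buffer, but it cannot generate key frames, so it asks the producing endpoint for a new one. Key frames are much larger than regular frames, so with hundreds or thousands of viewers the broadcaster may be asked for key frames so often that its uplink bandwidth is overwhelmed, degrading the stream for everyone.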
In those scenarios, a server-side “re-encoder” is required. That is, an endpoint that consumes the streams of the broadcaster endpoint, re-encodes them and re-produces them into a set of mediasoup routers with hundreds or thousands of consumers in total. Since such a “re-encoder” typically runs in the backend network, it's not constrained by the limited bandwidth a broadcaster endpoint usually has.
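Part of a re-encoder's job is to answer viewers' key frame requests itself and coalesce them, so that many PLIs/FIRs result in far fewer key frames. The sketch below shows only that coalescing logic; `KeyFrameThrottle`, the 500 ms interval and the polling approach are assumptions for illustration, and the actual encoder and RTCP handling would come from the re-encoder's own media stack.

```typescript
// Hypothetical key frame request coalescing for a server-side re-encoder.
// Times are passed in explicitly (milliseconds) so the logic is easy to test.

class KeyFrameThrottle {
  private readonly minIntervalMs: number;
  private lastKeyFrameAt = -Infinity;
  private pending = false;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  /** Call for every PLI/FIR from any viewer. Returns true if a key frame should be encoded now. */
  onKeyFrameRequest(nowMs: number): boolean {
    if (nowMs - this.lastKeyFrameAt >= this.minIntervalMs) {
      this.lastKeyFrameAt = nowMs;
      this.pending = false;
      return true;
    }
    // Too soon: remember that at least one viewer is still waiting.
    this.pending = true;
    return false;
  }

  /** Call periodically (e.g. once per encoded frame). Returns true when a deferred request is due. */
  poll(nowMs: number): boolean {
    if (this.pending && nowMs - this.lastKeyFrameAt >= this.minIntervalMs) {
      this.lastKeyFrameAt = nowMs;
      this.pending = false;
      return true;
    }
    return false;
  }
}

// 1000 key frame requests, one per millisecond, with at most one key frame per 500 ms.
const throttle = new KeyFrameThrottle(500);
let keyFrames = 0;
for (let t = 0; t < 1000; t++) {
  if (throttle.onKeyFrameRequest(t)) keyFrames++;
}
if (throttle.poll(1000)) keyFrames++;
console.log(keyFrames); // 3
```

In this simulation, forwarding every request upstream would have meant 1000 key frame requests in one second instead of 3 key frames. The interval trades a little extra wait for newly joined viewers against bandwidth, and the deferred `pending` flag ensures a request arriving inside the window is still served.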
In the end, those scenarios require a proper architecture that distributes viewers across multiple mediasoup routers (in the same or different hosts), plus special “re-encoder” endpoints in the backend that can absorb the PLIs and FIRs generated by the subset of viewers each of them serves.
The mediasoup project also provides libmediasoupclient, a C++ client library which, among other uses, can act as such a re-encoder (wink, wink).