Welcome to another technical post, with my take on an area that causes a lot of confusion: Matrix VOIP and TURN. First, which VOIP standards does Matrix support? And second, what is the deal with that TURN server in livekit? Please skip this post if that is not of interest to you.
Matrix VOIP
Legacy calls: In the beginning, there was direct WebRTC between participants, potentially using a TURN server that was configured in the Synapse configuration. This is now called legacy calls, and while some very obsolete clients still support it (yes, Element classic), newer ones (Element X and others) typically don’t. One problem with this style of call is that media needs to be streamed directly to all participants and received from all participants, which makes n*(n-1) streams in a call with n participants. This is OK for 1:1, but does not scale when more than a handful of people participate in a call.
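To put numbers on the n*(n-1) claim, here is a small sketch that counts streams in a full mesh and compares the per-client upload count with a setup where every client uploads once to a forwarding server (such as the SFU discussed further down). The function names are made up for this post, and the model is simplified: it counts one media stream per sender/receiver pair and ignores things like separate audio and video tracks.

```python
def mesh_load(n: int) -> dict:
    """Stream counts for a direct (legacy style) mesh call with n participants."""
    if n < 2:
        raise ValueError("a call needs at least two participants")
    return {
        "uploads_per_client": n - 1,    # one copy of my media per other participant
        "downloads_per_client": n - 1,  # one stream from every other participant
        "total_streams": n * (n - 1),   # directed sender -> receiver pairs
    }


def relay_load(n: int) -> dict:
    """Per-client stream counts when a central server forwards the media.

    Assumption: every client publishes one stream and subscribes to all others.
    """
    if n < 2:
        raise ValueError("a call needs at least two participants")
    return {
        "uploads_per_client": 1,        # upload once, the server fans it out
        "downloads_per_client": n - 1,  # still one stream per other participant
    }


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, mesh_load(n), relay_load(n))
```

With 10 participants the mesh already needs 90 streams in total, and every client has to upload its media 9 times, which is usually what breaks first on a home uplink.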
Jitsi: Some Element clients (mostly Element Web/Desktop) are able to embed web widgets. This was used to embed external jitsi servers, so that one could organize jitsi calls from within a matrix room. I don’t think other clients ever supported this. It worked, but always felt a bit like a bandaid.
MatrixRTC: The new style of calls, as defined in MSC 4143. It defines signalling and call membership over Matrix communication channels, but leaves the media transport to pluggable backends. Currently, the only usable backend is livekit, as defined in MSC 4195. Alternative backends are planned, e.g. a direct WebRTC mesh (potentially using TURN servers); I refer to MSC 3401 for that. But for now, it is livekit or nothing. Because there is often confusion around that: MatrixRTC calls using livekit can be both group calls and “direct” 1:1 calls (DM).
TURN and Livekit
So…, one of the most frequent annoyances is that people try hard to set up their Matrix VOIP, install livekit and all that (which is tricky enough). And then they either get told in a support room, or come up with the plan on their own, to get the livekit TURN server working or to install a separate coturn TURN server.
I’d just like to interject for a moment. NOOO, YOU MOST LIKELY DO NOT NEED TO GET THE INTERNAL LIVEKIT TURN SERVER RUNNING AND YOU MOST LIKELY DO NOT NEED TO SET UP A SEPARATE COTURN SERVER!!!
In order to argue my case, let me explain briefly what a TURN server is for, and then turn (HAHA) to livekit, the SFU (selective forwarding unit).
If there were no firewalls and NATs, everyone could reach everyone. VOIP would be easy and no TURN servers would be needed. But the fact is, there are many firewalls preventing incoming connections to random ports, and NATs make participants unreachable from the outside. So a TURN server is a bandaid: it relays media traffic so that participants only need outgoing connections, and no incoming connections where they cannot receive any. Traffic with a TURN server kind of looks like:
Alice ---------> TURN relay <----------- Bob
Now, livekit is an SFU: it receives media streams from the VOIP participants and selectively forwards them to the others. It can do so over TCP and UDP connections. Traffic using the livekit SFU kind of looks like (simplified):
Alice ---------> livekit SFU <----------- Bob
Uhh, ohh, that looks pretty similar, right? Right! The livekit SFU effectively acts like a TURN relay and can take over that functionality. So, in normal cases, NO TURN server is needed. So what is the internal livekit TURN server for (or an alternative coturn server that one already had set up)?
There are very restricted network environments (public hotspots, corporate firewalls, censoring governments) where not only incoming connections are blocked, but outgoing connections might be restricted as well. The internal livekit TURN server (at least the TCP part) ONLY EVER announces port 443 via TLS (src code) instead of whatever port you might have configured as turn: TLSPort:, so it is meant for cases where you are only allowed to reach HTTPS web sites (it tries to disguise your media traffic as regular HTTPS website traffic). It helps in no other case! And the UDP part of the TURN server is pretty much useless, as the SFU is already able to use a whole bunch of UDP ports to do its TURN-ish thing anyway.
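The argument so far can be condensed into a toy reachability model. This is only a sketch to illustrate the reasoning, not how livekit or any TURN server is implemented: the class and function names are invented for this post, the port numbers are example values, and the TURN/TLS listener is simply modelled as one more door into the same server.

```python
from dataclasses import dataclass
from enum import Enum


class Net(Enum):
    OPEN = "open"              # publicly reachable, e.g. a server
    NAT = "nat"                # can dial out anywhere, cannot be dialled
    HTTPS_ONLY = "https_only"  # can only dial out to TCP port 443


@dataclass(frozen=True)
class Endpoint:
    name: str
    net: Net


@dataclass(frozen=True)
class Listener:
    host: Endpoint
    proto: str  # "udp" or "tcp"
    port: int


def can_dial(client: Endpoint, listener: Listener) -> bool:
    """True if client can open an OUTGOING connection to listener."""
    if listener.host.net is not Net.OPEN:
        return False  # nobody can dial into a NATed or firewalled host
    if client.net is Net.HTTPS_ONLY:
        return listener.proto == "tcp" and listener.port == 443
    return True


def can_talk_via(a: Endpoint, b: Endpoint, listeners: list) -> bool:
    """Both peers must be able to dial at least one listener of the server."""
    return (any(can_dial(a, l) for l in listeners)
            and any(can_dial(b, l) for l in listeners))


alice = Endpoint("alice", Net.NAT)
bob = Endpoint("bob", Net.NAT)
carol = Endpoint("carol", Net.HTTPS_ONLY)  # e.g. behind a corporate firewall
server = Endpoint("sfu", Net.OPEN)

# Example values only, not a recommendation for your firewall rules.
sfu_listeners = [Listener(server, "udp", 50000), Listener(server, "tcp", 7881)]
turn_tls = Listener(server, "tcp", 443)

if __name__ == "__main__":
    print("alice -> bob directly:", can_dial(alice, Listener(bob, "udp", 50000)))
    print("alice <-> bob via SFU:", can_talk_via(alice, bob, sfu_listeners))
    print("alice <-> carol via SFU:", can_talk_via(alice, carol, sfu_listeners))
    print("alice <-> carol with TURN/TLS:",
          can_talk_via(alice, carol, sfu_listeners + [turn_tls]))
```

In this model, two NATed participants cannot reach each other directly but both can dial out to the SFU, so no TURN server is involved. Only the participant who is limited to outgoing TCP 443 gains anything from the additional TURN/TLS listener.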
As a result: I have disabled the livekit internal TURN server, have not set up a separate coturn server, and am doing my MatrixRTC VOIP calls happily ever after.











