Building a Python WebRTC backend with aiortc
In this article, we share our experience and a few lessons learned working with aiortc, a handy Python package created and open-sourced by Jeremy Laine that makes it possible to build a simple Python WebRTC backend. WebRTC is a widely adopted peer-to-peer media exchange protocol, supported by most browsers and mobile phones, and behind many video conferencing solutions such as Google Meet or Facebook Messenger.
The aiortc package is not only useful for establishing WebRTC servers: it also provides classes and functions to build Python WebRTC clients. However, we will focus on the server side, as this was our own use case and is probably the main use case for most Python developers. This post assumes you are already familiar with WebRTC concepts, as we focus on the specifics of the aiortc package: we invite you to have a look here for an introduction to WebRTC.
We used this package to build a conversational agent leveraging generative pre-trained NLP models (ChatGPT, GPT-3 and GPT-J), in conjunction with Google and AWS text-to-speech and speech-to-text services: feel free to contact us if you want to know more about this exciting project.
Here is a list of key concepts to know before you start using the aiortc package:
There is a server.py demo inside the examples folder of the repository, which is a good starting point for establishing a WebRTC server.
This server is designed to handle parallel sessions: multiple connections mean parallel execution of the functions designed to handle the different events.
The peer-to-peer connection is expected to be initiated by the client. The demo server proposed by the package expects to receive an offer and act upon it.
The package is based on event handlers: one main Python function handles the overall peer connection negotiation, and sub-functions are defined to handle ICE connection state changes, reception of data channels, audio or video tracks, etc.
The package works natively with asyncio, so you should be familiar with the asyncio library and asynchronous functions before using it.
The aiortc package offers a custom class (called MediaStreamTrack) to handle video and audio tracks, built on top of another package, pyav, and its AudioFrame and VideoFrame objects. This custom class can be subclassed to implement what your server should do with incoming or outgoing tracks (see below).
It is possible to generate tracks and add them to the peer connection, in order to send audio/video tracks back to WebRTC clients. The demo server sends the incoming video track back as an outgoing track after performing some operations on it, but of course incoming and outgoing tracks can be completely independent.
A single data channel can be bidirectional, contrary to video or audio tracks.
The aiortc package comes with a logger (based on the standard logging package) and you can adjust the verbosity of the logs: we strongly recommend switching to DEBUG level whenever you face undesired or unexpected behavior, as the logs may be the only source of information available. For instance, the aiortc server may keep running without throwing an exception even when track sending or reception is broken.
If your server emits a track, it is important to establish the outgoing track and add it to the peer connection before the server sends back its ICE response, otherwise the peer connection will open in unilateral mode (recv mode only). Since the package is based on asyncio functions, this sequencing may not be trivial to enforce.
We have encountered robustness issues with multiple, rapid connections/disconnections, where some peer connections entered a ghost state and never fully connected, while no connection loss was detected by the event handlers (@pc.on("connectionstatechange") or @track.on("ended")). In such cases, the associated resources were never freed, which was an issue for us. We had to create a coroutine checking the connection status (pc.connectionState != 'connected') a few seconds after the negotiation to avoid such cases.
Basic handling of audio tracks
Any input track that you wish to receive on your server needs to be consumed, i.e. the recv() function of the incoming track must be called somewhere in your server code to receive each new frame.
The recv() function of the incoming media track can be overridden if you need to process the data from the track:
You can create a new class inheriting from the MediaStreamTrack class (which can be found in mediastreams.py), initialized with the incoming track.
Inside this new class, you can define a new recv() function which calls the original track's recv() function and then does whatever you want with the audio or video frame. This is demonstrated in the server.py demo file with the VideoTransformTrack class.
An easy way to consume the incoming track is to create a MediaRecorder and associate it with an actual media sink (saving to a file, for instance), or to create a MediaBlackhole that will consume the track without doing anything with it.
In both cases, do not forget to start the recorder or the blackhole (blackhole.start()), otherwise the recv() function will never be called and the track won't be consumed.
The only supported audio codecs for incoming tracks in aiortc are: Opus 48 kHz, PCM 8 kHz and PCM 16 kHz.
An output track, once added to a peer connection, will automatically be consumed by the peer once the ICE negotiation is completed and the peer connection is established (at least, this is the default behavior of most peer libraries, such as Chrome's WebRTC implementation).
Pyav and AudioFrames
The pyav library's documentation is scarce, so it is not easy to work with AudioFrames and understand how to build or manipulate them. One of the sensitive features is the timestamp, stored in the frame.pts property: not setting it correctly can impair the way tracks are manipulated, which can lead to interruptions in the reception or emission of tracks.
An example of generating a track with AudioFrames and properly setting timestamps is given in the AudioStreamTrack class inside mediastreams.py.
One needs to be careful with the shape of the array used to build AudioFrames: it should match the properties set on the frame, in particular the number of channels (stereo vs mono).
It is possible to record an incoming or outgoing media stream to a file using the MediaRecorder class, part of the library.
This MediaRecorder calls the recv() function of the audio track, so it consumes the track.
This class does not manage pts gaps, so an incoming track should not be recorded directly with the MediaRecorder provided by the library: network interruptions or connection issues will create gaps in pts between consecutive AudioFrames, which halts the reception of the incoming track (no more calls to the recv() function).
A way around this issue is to modify the MediaRecorder so that it overrides the pts from the audio track and forces them to be contiguous.
Pyav's AudioResampler (through its resample() function) allows you to convert a frame from one shape to another: this can typically be used to move from stereo to mono audio tracks. However, there is no documentation of the cases in which this conversion will succeed or fail, so it should be used with care inside a try/except block (catching the ValueError exception).
This class also does not manage pts gaps, so its usage is fragile under network interruptions or connection issues. Pts should be forced to contiguous values before the resampler is used.
A media relay allows duplicating an incoming or outgoing track so that different processing can be performed on the resulting tracks. As an example, a relay can be used to record a track while emitting this same track via a peer-to-peer connection.
In order to use a MediaRelay, you should first instantiate a relay:
relay = MediaRelay()
and then create MediaStreamTrack relays:
track_relay1 = relay.subscribe(audio_track)
track_relay2 = relay.subscribe(audio_track)
which you can then use for whatever tasks you want.
It's important to note that, once you start using relays, you cannot consume the original track anymore, otherwise the frames consumed from the original track will be lost for the relay (and vice versa).
It is also important to note that, once the relay is instantiated and a first consumption is made from one of its subscribed tracks, the relay will never stop consuming the original track as long as that track produces frames: there is no built-in stop function in the relay. If you are dealing with an incoming track, this is probably not an issue, as the source will stop when the peer connection closes. However, if you are dealing with an outgoing track that generates audio/video content as long as there is a consumer, this is a possible source of trouble: even if all consumers of the output track are closed, the relay will not stop, and your source will continue to produce indefinitely. We faced this issue when recording an output track while pushing it to a P2P connection. We had to add a stop function to the relay, which we could then call when the P2P connection was closed by the client, to stop the source from emitting altogether.
That's all, folks! We hope this overview will help you use this package and shortcut the resolution of issues or questions you may face. Feel free to contact us; you can of course also raise issues directly on the aiortc GitHub repository.
Additional resources:
aiortc GitHub repository: https://github.com/aiortc/aiortc