Fighting Deepfakes - Part I
Deepfakes are a very hot topic right now. There are plenty of reasons for this, and the first is the wow effect: people enjoy seeing themselves in videos in unreal situations, putting their face on a celebrity's body, using somebody else's face as a camera filter, making famous paintings such as the Mona Lisa talk, or speaking in somebody else's voice. Such manipulations were developed years ago, but the technology is now mature enough to reach the general public: you can create fake videos and images even with your phone.
Everything started with the explosive growth of machine learning that followed the ImageNet breakthrough of 2012. First, algorithms caught up with and then surpassed human performance on image classification, and made rapid progress on harder tasks such as object detection and segmentation. There are so many applications for machine learning, and deep learning in particular, that many people call the current era a new industrial revolution. Now, algorithms can not only describe and understand reality but also change it or create a new one.
Probably the first widely known use case to reach the mass market, in 2016, was Neural Style Transfer (NST), popularized by mobile applications such as DeepArt and Prisma. At first, it could only change the style of a photo, for example turning your selfie into a Van Gogh painting. Thanks to developments in academia and industry, it was then extended to video feeds. Nowadays, it is possible to turn a camera feed into a painting in real time on a portable device. Everything is moving fast, and we have to stay one step ahead in terms of possible consequences and security threats.
Today we have applications like Reface, MSQRD, and Face Swap Live that let you swap faces in any video and get extremely realistic results with nothing but your phone and without any technical knowledge. But it is not only about fun. There are plenty of applications of deepfakes that can make our lives better. We can bring historical figures back to life and reconstruct important events of human history in the digital world. We can personalize human interaction: you do not have to turn on your camera to show yourself during a call or while messaging, so people can get more real-life feelings and emotions without any intrusion into their private space.
Deepfake technology makes it possible to create lectures by brilliant people like Niels Bohr or Richard Feynman, and to produce TV news just by writing a text and translating it into any language with a single Python script. In this case, we can translate not only the words themselves but also the facial expressions, emotions, and non-verbal patterns specific to certain cultures. Another application is creative work. One of the latest hot developments is GPT-3, which is already being used to write fake stories by generating several different texts and then assembling a very realistic one from them.
Misuse and ethical issues
At the same time, as fun and promising as deepfakes look, there are plenty of possible misuses that can affect human rights and personal freedom around the world. Unfortunately, this is already happening: deepfakes have become a serious threat to democracy. Fake news and fake content are a huge problem worldwide, and they become more and more sophisticated with each new deep learning technique.
Fakes have existed for thousands of years, but they have never been as effective as they are now. Online hoaxes spread 10 times faster than accurate stories, and debunking articles get much lower coverage than the original fake. We should also consider the so-called liar's dividend: when the information environment is full of lies, people start disbelieving the truth, and anyone caught on a genuine recording can claim it was faked. This is probably the most dangerous side of the technology: you no longer know who or what to believe.
There are already cases of personal abuse in which deepfakes have destroyed people's lives. Probably the first widely known case of a manipulated video targeting a public figure is that of Nancy Pelosi. It was made with a so-called cheap fake: the video was simply slowed down to make her appear drunk. It was that simple, yet it caused a huge scandal. Just imagine what malicious actors can achieve with the latest technologies!
State of the art
We decided to look into the means available to detect and identify visual deepfakes. There are several classes of approaches, many of them relying on heuristic features such as eye-blinking patterns or blood-flow (pulse) signals detected in the skin of the face. It is also possible to detect fakes by analyzing the optical flow of facial landmarks. We studied many papers and the solutions of the Kaggle competition winners, and concluded that, in our view, the most effective approach is still the most straightforward one: analyzing a video frame by frame.
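To make the blinking heuristic concrete: one widely used way to track blinks is the eye aspect ratio (EAR), computed from six landmarks around each eye. It stays roughly constant while the eye is open and drops toward zero during a blink, so a face that almost never blinks over a long clip is suspicious. The sketch below is a minimal illustration, not the method of any particular detector mentioned here; the landmark detector that supplies the points is assumed and not shown, and the threshold of 0.2 and the two-frame minimum are illustrative values that would need tuning.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR for six eye landmarks p1..p6 (order as in the 68-point iBUG scheme):
    p1/p4 are the eye corners, p2/p3 lie on the upper lid, p6/p5 on the lower lid."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)  # lid-to-lid distances
    horizontal = math.dist(p1, p4)                    # corner-to-corner width
    return vertical / (2.0 * horizontal)


def count_blinks(ear_per_frame: List[float], threshold: float = 0.2, min_frames: int = 2) -> int:
    """Count runs of at least `min_frames` consecutive frames with EAR below `threshold`.
    The threshold and run length are illustrative assumptions, not tuned values."""
    blinks = 0
    run = 0
    for ear in ear_per_frame:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the last frame
        blinks += 1
    return blinks
```

Feeding `count_blinks` the EAR of every frame in a clip gives a blink count that can then be compared with typical human blink rates.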
The Kaggle winner uses a deep convolutional neural network (an ensemble of EfficientNet-B7-based models) to classify whether each face extracted from a frame is fake or not. At the same time, some papers introduce algorithms that did not take part in the competition and seem to outperform the winner. The first-place model achieved 65.18% average precision on the black box test dataset, a corpus of 10,000 videos. This is still far from a reliable detector, so the problem still requires a strong solution. That is why we chose FaceForensics++ as the baseline for our research: it is one of the most popular benchmarks for deepfake detection, and it is available online: https://arxiv.org/pdf/1901.08971.pdf.
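Average precision is easy to misread without a baseline. The sketch below computes the standard non-interpolated definition: the mean of precision@k taken at each rank k where a fake (label 1) appears. We do not know the exact evaluation code used on the black box set, so treat this as the textbook metric rather than the competition's implementation. The hypothetical helper `random_baseline_ap` shows that a random ranking scores roughly the fraction of fakes in the test set, which is the number a detector should be compared against.

```python
import random
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k where a positive appears.
    Tied scores keep their input order (Python's sort is stable)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive examples")
    hits = 0
    precision_sum = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / k  # precision at this rank
    return precision_sum / n_pos


def random_baseline_ap(labels: Sequence[int], seed: int = 0) -> float:
    """AP obtained by scoring every example with a uniform random number."""
    rng = random.Random(seed)
    return average_precision(labels, [rng.random() for _ in labels])
```

For example, `average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])` gives (1/1 + 2/3) / 2 ≈ 0.833, while `random_baseline_ap` on a balanced set lands close to 0.5.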
We chose three large datasets: FaceForensics++, the Google & Jigsaw DeepFake Detection Dataset, and the Kaggle Deepfake Detection Challenge dataset. The FaceForensics++ dataset contains fakes created with well-known methods: Face2Face (https://arxiv.org/abs/2007.14808), FaceSwap (https://github.com/MarekKowalski/FaceSwap/), DeepFakes (https://github.com/deepfakes/faceswap), and NeuralTextures (https://arxiv.org/abs/1904.12356). The Kaggle dataset (https://www.kaggle.com/c/deepfake-detection-challenge/data) contains almost 500 GB of data. Recently, Google and Jigsaw released a new dataset containing over 3,000 manipulated videos of 28 actors in various scenes (https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html).
We chose to use segmentation rather than plain classification to detect deepfakes. Segmentation has an advantage over classification for two reasons. First, a deepfake is by nature a mask: some pixels are fake and some are real (taken from the original image). Either way, the ultimate goal is to find the likely faked pixels, analyze them, and conclude whether the whole image, and thus the video, is fake. Second, we need to understand how the fake was generated, and segmentation helps a lot here: we can see the fake artifacts on the image itself and get much more information about the technology behind the faked video.
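The step from faked pixels to a verdict on a whole video can be sketched as follows, assuming the segmentation model outputs one per-pixel fake-probability map per frame. The three thresholds and the majority vote over frames are hypothetical choices made for illustration, not part of the approach described above: a frame is flagged when enough of its pixels look fake, and a video is flagged when enough of its frames are flagged.

```python
from typing import Sequence

import numpy as np


def frame_fake_score(prob_mask: np.ndarray, pixel_threshold: float = 0.5) -> float:
    """Fraction of pixels whose predicted fake probability is at least `pixel_threshold`."""
    return float((prob_mask >= pixel_threshold).mean())


def video_is_fake(
    prob_masks: Sequence[np.ndarray],
    pixel_threshold: float = 0.5,   # hypothetical: pixel counts as fake above this
    frame_threshold: float = 0.05,  # hypothetical: share of fake pixels to flag a frame
    video_threshold: float = 0.5,   # hypothetical: share of flagged frames to flag the video
) -> bool:
    """Flag a frame when more than `frame_threshold` of its pixels look fake,
    and the video when more than `video_threshold` of its frames are flagged."""
    if len(prob_masks) == 0:
        raise ValueError("need at least one frame")
    flagged = [frame_fake_score(m, pixel_threshold) > frame_threshold for m in prob_masks]
    return sum(flagged) / len(flagged) > video_threshold
```

Keeping the per-frame scores around, rather than only the final boolean, also preserves the localization information that makes segmentation attractive in the first place.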
Concretely, our idea is to use a state-of-the-art segmentation model, NVIDIA's UNet Industrial (https://ngc.nvidia.com/catalog/resources/nvidia:unet_industrial_for_tensorflow), and to train it on the differences between the original and fake versions of each video in the dataset: this trick gives us masks of the faked pixels.
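A minimal sketch of the difference trick, assuming each original frame and its manipulated counterpart are already decoded, temporally aligned, and of equal resolution. The helper name `fake_pixel_mask` and the threshold of 10 intensity levels are illustrative and not taken from the UNet Industrial code. Because video compression changes untouched pixels slightly, a small threshold is needed to keep that noise out of the mask.

```python
import numpy as np


def fake_pixel_mask(original: np.ndarray, fake: np.ndarray, threshold: int = 10) -> np.ndarray:
    """Binary mask (1 = manipulated) from two aligned uint8 frames of shape (H, W, 3)."""
    if original.shape != fake.shape:
        raise ValueError("frames must be aligned and have the same shape")
    # Cast to a signed type so the subtraction cannot wrap around like uint8 would.
    diff = np.abs(original.astype(np.int16) - fake.astype(np.int16))
    # A pixel counts as manipulated if any colour channel changed by more than `threshold`.
    return (diff.max(axis=-1) > threshold).astype(np.uint8)
```

The raw mask may still need some cleanup, for example removing small isolated specks, before it is used as a training target.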
In our next article, we will share the outcome of this approach, how it compares to other publicly available ones, and further ideas for improving our results in this strategic fight that is fast approaching: being able to identify and characterize deepfakes in our day-to-day lives. Stay tuned!