Deepfakes and their threats (part 1)
Updated: Dec 10, 2020
Deepfakes are one of AI's most popular topics. People enjoy the wow effect of surreal videos: putting their face on a celebrity's body, making paintings talk (ask the Mona Lisa!) or speaking with somebody else's voice.
These tricks emerged several years ago, but the technology available to the public today is much more mature, and you can create fake videos or images with nothing but a smartphone.
Deepfakes rose alongside the deep learning boom that followed the 2012 ImageNet competition. Algorithms began to match or exceed human performance on some visual tasks, such as image classification, and improved rapidly on harder ones like object detection and image segmentation. The technology soon reached the mass market, and 2016 saw the first popular deepfake-style applications based on Neural Style Transfer (NST), used by many mobile apps such as DeepArt or Prisma. Images, then videos, became ground for manipulation. Today it is possible to turn a live camera feed into a painting in real time.
Deepfake for fun?
Beyond entertainment apps like Reface, MSQRD or Face Swap Live, which let you swap faces in any video with extremely realistic results and no technical knowledge, some deepfake applications serve purposes other than leisure.
With the help of deepfakes, one can bring historical figures back to life for educational purposes. One can generate new lectures delivered by geniuses like Niels Bohr or Richard Feynman, as if they were still alive. One can generate TV news just by writing a text, and translate it into any language with a single Python script.
One can also recreate human interactions, feelings and emotions where they would otherwise be missing, for UX or even medical purposes. Facial expressions, non-verbal cues or patterns specific to various cultures can be generated from one common source.
Creative applications are another exciting trend. One of the latest developments is GPT-3, already used to write new stories, generate a variety of texts and assemble them into one realistic output.
All these very exciting developments have a major downside though: as the technology evolves rapidly, so do the security threats it generates.
Misuse and ethical issues
Even if deepfakes look fun and promising, there are plenty of potential misuses that can affect human rights and personal freedom worldwide. Unfortunately, this is already happening. Deepfakes have become a serious threat to democracy. Fake news and fake content are a huge problem worldwide, and they are becoming ever more sophisticated with new deep learning technologies.
Fakes have existed for thousands of years, but they have never been as effective as they are now. Online hoaxes reportedly spread 10 times faster than accurate stories, and debunking articles get far less coverage than the original fake. We should also consider the so-called liar's dividend: when the information environment is full of lies, people start disbelieving the truth. This is probably the most dangerous side of the technology: you no longer know who or what to believe.
We already have cases of personal abuse in which deepfakes have ruined people's lives. One of the first widely known cases of manipulated video involved Nancy Pelosi. It was made with a so-called cheapfake technique: the video was simply slowed down to make her appear drunk. It was that simple, yet it caused a huge scandal. Just imagine what malicious actors can achieve with the latest technologies!
State of the art
We looked into the methods available to detect and identify visual deepfakes. There are several classes of approaches, many of them relying on heuristic features such as eye-blinking patterns or pulse (blood-flow) signals detected on the face. Fakes can also be detected by analyzing the optical flow of facial landmarks. We reviewed many papers and the solutions of the Kaggle competition winners, and concluded that, in our view, the most effective approach is still the most straightforward one: analyzing a video frame by frame.
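To make "frame by frame" concrete, here is a minimal sketch of how per-frame predictions can be turned into a single verdict for a video. It assumes a hypothetical classifier has already produced one fake probability per frame; averaging them and applying a 0.5 threshold is our illustrative choice, not the method of any particular competition entry.

```python
from statistics import mean
from typing import Sequence


def video_fake_score(frame_probs: Sequence[float]) -> float:
    """Average the per-frame fake probabilities into one video-level score."""
    if not frame_probs:
        raise ValueError("no frames were scored")
    return mean(frame_probs)


def is_video_fake(frame_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the video as fake when its average frame score reaches the threshold."""
    return video_fake_score(frame_probs) >= threshold


# Hypothetical outputs of a frame-level classifier for a five-frame clip
probs = [0.2, 0.9, 0.85, 0.7, 0.95]
print(round(video_fake_score(probs), 2))  # 0.72
print(is_video_fake(probs))  # True
```

Averaging smooths out occasional misclassified frames; other aggregations (median, or the fraction of frames above a threshold) are equally plausible and trade robustness against sensitivity to short manipulated segments.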
The Kaggle winner uses a deep neural network (one of the strongest architectures available at the time) to classify whether the face in each frame is fake. At the same time, some papers introduce algorithms that did not take part in the competition and seem to outperform the winner. The first-place model achieved 65.18% average precision on the black-box test dataset, a corpus of 10,000 videos. That is only moderately better than chance (a random classifier's average precision equals the share of fakes in the data, e.g. 50% on a balanced set), so the problem still calls for a strong solution. That is why we chose FaceForensics as the baseline for our research: it is the most popular tool for deepfake detection, and it is available online: https://arxiv.org/pdf/1901.08971.pdf.
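To see why 65% average precision is a modest score, the sketch below computes non-interpolated average precision (the mean of the precision measured at the rank of each true fake) for a random scorer and for a perfect one. The labels and scores are synthetic: 10,000 hypothetical videos, half of them fake (label 1). With distinct scores this definition matches scikit-learn's `average_precision_score`.

```python
import random


def average_precision(labels, scores):
    """Non-interpolated AP: mean of the precision at the rank of each positive."""
    # Rank items from most to least suspicious
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


rng = random.Random(0)
labels = [1] * 5000 + [0] * 5000  # balanced: half fake, half real
random_scores = [rng.random() for _ in labels]

print(average_precision(labels, random_scores))  # close to 0.5
print(average_precision(labels, labels))  # 1.0 for a perfect ranking
```

A random ranking lands near the fraction of fakes in the data, which is the floor any real detector has to beat.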
We chose three large datasets: FaceForensics++, the Google & Jigsaw Deep Fake Detection Dataset, and the Kaggle Deepfake Dataset. FaceForensics++ contains fakes created with well-known methods: Face2Face (https://arxiv.org/abs/2007.14808), FaceSwap (https://github.com/MarekKowalski/FaceSwap/), DeepFakes (https://github.com/deepfakes/faceswap), and NeuralTextures (https://arxiv.org/abs/1904.12356). The Kaggle dataset (https://www.kaggle.com/c/deepfake-detection-challenge/data) consists of almost 500 GB of data. Finally, Google & Jigsaw recently released a new dataset containing over 3,000 manipulated videos of 28 actors in various scenes [https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html].
We chose to use segmentation rather than plain classification to detect deepfakes, for several reasons. First, a deepfake is by nature a mask: some pixels are fake and some are real (taken from the original image). Either way, the ultimate goal is to find the likely faked pixels, analyze them, and conclude whether the whole image, and thus the video, is fake. Second, we need to understand how the fake was generated, and segmentation helps a great deal here: we can see the fake artifacts on the image itself, and we get much more information about the technology behind the faked video.
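The step from "faked pixels" to a verdict on the image and then the video can be sketched as follows. This assumes a segmentation model outputs a per-pixel fake probability map for each frame; the three thresholds (`pixel_thr`, `area_thr`, `min_fake_ratio`) are hypothetical values chosen for illustration, not tuned parameters from our experiments.

```python
import numpy as np


def fake_pixel_fraction(prob_mask, pixel_thr=0.5):
    """Share of pixels the segmentation model considers manipulated."""
    return float((np.asarray(prob_mask) >= pixel_thr).mean())


def frame_is_fake(prob_mask, pixel_thr=0.5, area_thr=0.01):
    """Flag a frame when the manipulated area covers at least area_thr of it."""
    return fake_pixel_fraction(prob_mask, pixel_thr) >= area_thr


def video_is_fake(prob_masks, min_fake_ratio=0.5):
    """Flag a video when enough of its frames are flagged as fake."""
    if not prob_masks:
        raise ValueError("no frames were segmented")
    flags = [frame_is_fake(m) for m in prob_masks]
    return sum(flags) / len(flags) >= min_fake_ratio


# Synthetic 64x64 probability maps: one clean frame, one with a tampered patch
clean = np.zeros((64, 64))
tampered = clean.copy()
tampered[20:36, 20:36] = 0.9  # 16x16 region predicted as fake

print(fake_pixel_fraction(tampered))  # 0.0625
print(video_is_fake([tampered, tampered, clean]))  # True
```

Unlike a bare classifier score, the intermediate mask can also be inspected to see where and how the manipulation was applied.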
Our idea is therefore to use a state-of-the-art segmentation model, NVIDIA's industrial UNet (https://ngc.nvidia.com/catalog/resources/nvidia:unet_industrial_for_tensorflow), and to train it on the differences between the original and fake versions of the same videos in the datasets; this trick gives us masks of the faked pixels.
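The mask-generation trick can be sketched in a few lines of NumPy. This assumes the original and fake frames are already aligned (same video, same timestamp, same resolution) and stored as `H x W x 3` uint8 arrays; the `threshold` of 15 intensity levels is a hypothetical value meant to ignore compression noise, not a setting from our pipeline.

```python
import numpy as np


def difference_mask(original: np.ndarray, fake: np.ndarray, threshold: int = 15) -> np.ndarray:
    """Binary mask of pixels that differ noticeably between aligned frames."""
    if original.shape != fake.shape:
        raise ValueError("frames must have identical shapes")
    # Cast to a signed type so the subtraction cannot wrap around in uint8
    diff = np.abs(original.astype(np.int16) - fake.astype(np.int16))
    # A pixel counts as manipulated if any colour channel changed beyond the threshold
    return (diff.max(axis=-1) > threshold).astype(np.uint8)


# Synthetic example: faint noise everywhere plus a 3x3 manipulated patch
original = np.zeros((8, 8, 3), dtype=np.uint8)
fake = original + 5  # small, compression-like difference
fake[2:5, 2:5] = 200  # strongly altered region

mask = difference_mask(original, fake)
print(mask.sum())  # 9
```

In practice, real frames would likely also need some cleanup of the raw mask (for example, removing isolated pixels) before it is used as a training target.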
In our next article, we will share the outcome of this approach, how it compares to other publicly available ones, and other ideas we have to push further in this strategic fight looming on the near horizon: identifying and characterizing deepfakes in our day-to-day lives. Stay tuned!