Oleksandr Lysenko

Deepfakes... and their threats (part 1)

Deepfake is one of AI's most popular topics. People enjoy the WOW effect of showing off in surrealistic videos, putting their face on a celebrity's body, making paintings talk (ask Mona Lisa!) or speaking with somebody else's voice.

These tricks emerged several years ago, but the technology available to the public today is much more mature, and you can create fake videos or images with nothing but your smartphone.

Deepfakes rose together with the deep learning boom that followed the 2012 ImageNet competition. Algorithms came to rival, and on some benchmarks surpass, human performance on visual tasks such as image classification, while object detection and image segmentation improved rapidly. The technology soon spread to the mass market: in 2016 the first popular deepfake applications appeared, based on Neural Style Transfer (NST) and used by many mobile apps such as DeepArt or Prisma. Images, and then videos, became grounds for manipulation. Today, it is possible to turn a video into a painting in real time.

Deepfake for fun?

Beyond entertainment apps like Reface, MSQRD or Face Swap Live, which let you swap faces in any video with extremely realistic results and no technical knowledge, other deepfake applications go well beyond leisure.

With the help of deepfakes, one can bring historical figures back to life for educational purposes. One can generate new lectures delivered by geniuses like Niels Bohr or Richard Feynman, as if they were still alive. One can generate TV news just by writing a text, and translate it into any language with a single Python script.

One can also recreate human interactions, feelings and emotions where they would otherwise be lacking, for UX or even medical purposes. Facial expressions, non-verbal emotions or patterns specific to various cultures can be generated from one common source.

Creativity is also an exciting trend. One of the latest developments is GPT-3, already used for writing new stories, generating different texts and assembling them into one realistic output.

All these very exciting developments have a major downside though: as the technology evolves rapidly, so do the security threats it generates.

Misuse and ethical issues

Deepfake misuse already affects a wide range of domains, from authentication to media content. Fake content, which fuels fake news, is becoming increasingly sophisticated thanks to deep learning.

Of course, fakes are almost as old as humankind, but they have never been as widespread as they are now. Because of their emotional impact, online hoaxes spread 10 times faster than accurate stories, and debunking attempts reach far fewer people than the fakes they try to counter. Beyond their direct impact, fake news also has an indirect, yet potentially more dangerous, effect: the so-called liar's dividend. In a media environment where any news can be challenged, people start disbelieving the truth itself, and those caught on genuine footage can simply claim it was faked.

How to detect deepfakes?

After this general presentation, let us now dive together into more practical (and technical) questions: how can one efficiently detect deepfakes to prevent potentially harmful uses?

For the purpose of this article, we at Preste decided to focus on video deepfakes involving humans as a first approach to the topic.

Detecting video deepfakes can involve different approaches, usually relying on the characterisation of standard human patterns and on their absence or abnormalities in deepfakes. Examples of such patterns are heuristic features such as eye-blinking patterns, blood-flow detection, or the optical flow of facial landmarks. These methods characterise fakes through a sequence of events over a given period. In our own work, however, we explore an approach that treats each video frame independently.
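To make the blinking heuristic more concrete, here is a minimal sketch of one well-known variant, the eye aspect ratio (EAR): a ratio of distances between six eye landmarks that drops towards zero when the eye closes. Counting dips below a threshold gives a blink rate, and an abnormally low blink rate was one of the early cues used against deepfakes. This is not the method we use. The landmark ordering, the `threshold` and `min_frames` defaults, and the helper names are assumptions for illustration, and the facial landmark detector that would supply the points is not shown.

```python
import numpy as np

# Sketch of the eye-aspect-ratio (EAR) blink heuristic.
# Assumption: `eye` holds six 2-D landmarks p1..p6 around one eye, ordered as in the
# common 68-point annotation (p1/p4 = eye corners, p2/p3 = upper lid, p5/p6 = lower lid).
# The landmark detector that produces these points is not shown.


def eye_aspect_ratio(eye) -> float:
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|); close to 0 when the eye is shut."""
    eye = np.asarray(eye, dtype=float)
    vertical_1 = np.linalg.norm(eye[1] - eye[5])  # p2 - p6
    vertical_2 = np.linalg.norm(eye[2] - eye[4])  # p3 - p5
    horizontal = np.linalg.norm(eye[0] - eye[3])  # p1 - p4
    return float((vertical_1 + vertical_2) / (2.0 * horizontal))


def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least `min_frames` consecutive frames with EAR below `threshold`.

    Both defaults are hypothetical values that would need tuning on real footage.
    """
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress when the clip ends
        blinks += 1
    return blinks


def blinks_per_minute(ear_series, fps, **kwargs):
    """Blink rate over the whole clip; an unusually low rate is only a weak fake cue."""
    minutes = len(ear_series) / fps / 60.0
    return count_blinks(ear_series, **kwargs) / minutes
```

Because the cue only appears across many frames, a heuristic like this belongs to the "sequence of events" family of methods, as opposed to the frame-by-frame approach we explore below.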

Indeed, after a careful review of the state of the art, we concluded that the most efficient method might still be the most straightforward one. In a recent Kaggle competition on video deepfake detection, the winner used a deep neural network to classify the face in each video frame as fake or real. This detector achieved 65.18% average precision on the black-box test dataset (10,000 videos), which is better, but not much better, than a random classifier. That is why, as a baseline for our research, we decided to use FaceForensics++, one of the most popular benchmarks for deepfake detection, described in a paper available online: https://arxiv.org/pdf/1901.08971.pdf.
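The frame-by-frame pipeline and the metric quoted above can be sketched in a few lines. This is a hedged illustration, not the competition winner's code: we assume a detector (not shown) that returns a "fake" probability per face crop, average those probabilities into a video score (one common but hypothetical choice), and compute average precision as the mean precision at the rank of each fake video. The function names are ours.

```python
import numpy as np

# Sketch: per-frame scores -> video score -> average precision (AP).
# Assumptions: a detector (not shown) gives each face crop a probability of being fake;
# the video score is the plain mean of its frame scores (a hypothetical aggregation).


def video_score(frame_probs) -> float:
    """Aggregate per-frame 'fake' probabilities into a single video-level score."""
    return float(np.mean(frame_probs))


def average_precision(labels, scores) -> float:
    """AP = mean, over all positive items, of the precision at that item's rank.

    `labels` are 1 for fake and 0 for real. Tied scores are ranked by input order
    rather than interpolated.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    ranked = labels[order]
    n_pos = ranked.sum()
    if n_pos == 0:
        raise ValueError("AP is undefined without positive examples")
    precision_at_k = np.cumsum(ranked) / np.arange(1, len(ranked) + 1)
    return float(precision_at_k[ranked].sum() / n_pos)


if __name__ == "__main__":
    frames = {"a.mp4": [0.9, 0.8, 0.95], "b.mp4": [0.1, 0.3, 0.2]}
    print({name: video_score(p) for name, p in frames.items()})

    # Random baseline: with random scores, AP lands near the share of fakes.
    rng = np.random.default_rng(0)
    random_labels = rng.integers(0, 2, size=10_000)
    print(average_precision(random_labels, rng.random(10_000)), random_labels.mean())
```

The random baseline matters for reading the 65.18% figure: a classifier that guesses scores at random obtains an AP roughly equal to the proportion of fake videos in the test set, so AP is only meaningful relative to that class balance.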

Datasets

Now let us play with data!

For our own investigation we have chosen three significant datasets: FaceForensics++, the Kaggle Deepfake Dataset and the Google & Jigsaw Deep Fake Detection Dataset.

The FaceForensics++ dataset contains fakes created with four established manipulation methods: Face2Face (https://arxiv.org/abs/2007.14808), FaceSwap (https://github.com/MarekKowalski/FaceSwap/), DeepFakes (https://github.com/deepfakes/faceswap) and NeuralTextures (https://arxiv.org/abs/1904.12356). The Kaggle dataset, available at https://www.kaggle.com/c/deepfake-detection-challenge/data, contains almost 500 GB of data. Recently, Google & Jigsaw released a new dataset with over 3,000 manipulated videos of 28 actors in various scenes: https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html.

Methodology

We chose to detect deepfakes with segmentation rather than with classification alone.

Segmentation has several advantages over classification. First, a deepfake is by nature a mask: some pixels are fake, while others are real and taken from the original image. The ultimate goal is to find the faked pixels, analyse them, and conclude whether a whole image, and ultimately the whole video, is fake. Segmentation helps us achieve this goal. Second, it can be valuable to understand how a fake was generated. Thanks to segmentation, we can see the fake artifacts in the image and retrieve much more information about the underlying deepfake technology.
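The step from "faked pixels" to a verdict on a frame and then on a video could look like the sketch below. It assumes the segmentation model (not shown) outputs an (H, W) map of per-pixel fake probabilities for each frame. The three thresholds and the majority-of-frames rule are hypothetical choices for illustration, not the decision rule from our experiments.

```python
import numpy as np

# Sketch: from predicted per-pixel masks to frame- and video-level decisions.
# Assumptions: the segmentation model (not shown) returns, for each frame, an (H, W)
# array of probabilities that each pixel is fake. All thresholds are hypothetical.


def fake_area(mask_probs: np.ndarray, pixel_threshold: float = 0.5) -> float:
    """Fraction of the frame whose pixels are predicted as fake."""
    return float((mask_probs > pixel_threshold).mean())


def frame_is_fake(mask_probs, pixel_threshold=0.5, area_threshold=0.01) -> bool:
    """Flag a frame when the predicted fake region covers more than `area_threshold`."""
    return fake_area(mask_probs, pixel_threshold) > area_threshold


def video_is_fake(masks, frame_ratio=0.5, **frame_kwargs) -> bool:
    """Flag a video when more than `frame_ratio` of its frames are flagged as fake."""
    flags = [frame_is_fake(m, **frame_kwargs) for m in masks]
    return sum(flags) / len(flags) > frame_ratio


if __name__ == "__main__":
    real_frame = np.zeros((64, 64))
    fake_frame = real_frame.copy()
    fake_frame[20:36, 20:36] = 0.9  # a 16x16 region predicted as fake
    print(fake_area(fake_frame))  # 0.0625 (256 of 4096 pixels)
    print(video_is_fake([fake_frame, fake_frame, real_frame]))  # True
```

Unlike a single classification score, the intermediate mask also tells us where in the frame the manipulation was found, which is what makes the artifacts inspectable.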

Our idea is therefore to use a state-of-the-art segmentation model, NVIDIA's UNet Industrial (https://ngc.nvidia.com/catalog/resources/nvidia:unet_industrial_for_tensorflow), and to train it on the differences between videos from the original dataset and the corresponding videos from the faked dataset. This trick gives us masks of the faked pixels.
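A minimal sketch of the difference trick for a single frame follows. It assumes that the original and manipulated frames are decoded at the same resolution and are pixel-aligned, since the fake was rendered from that very original. The `threshold` value, meant to ignore small compression differences, and the function name are hypothetical.

```python
import numpy as np

# Sketch: deriving a training mask from an original frame and its manipulated copy.
# Assumptions: both frames are (H, W, 3) uint8 arrays, same resolution, pixel-aligned;
# `threshold` is a hypothetical value chosen to ignore compression noise.


def difference_mask(original: np.ndarray, fake: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Return an (H, W) uint8 mask: 1 where the fake differs noticeably from the original."""
    if original.shape != fake.shape:
        raise ValueError("frames must have identical shapes")
    # Cast before subtracting: uint8 arithmetic would wrap around instead of going negative.
    diff = np.abs(original.astype(np.float32) - fake.astype(np.float32)) / 255.0
    per_pixel = diff.max(axis=-1)  # strongest change across the RGB channels
    return (per_pixel > threshold).astype(np.uint8)


if __name__ == "__main__":
    original = np.zeros((8, 8, 3), dtype=np.uint8)
    fake = original.copy()
    fake[2:4, 2:4] = 200  # manipulated region
    fake[0, 0] = 10  # small compression-like noise, below the threshold
    print(int(difference_mask(original, fake).sum()))  # 4
```

In practice such masks may need some cleaning (for example, removing isolated speckles) before serving as per-pixel targets for the segmentation model.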

Next Steps

In the second part of this article we will share the outcome of this approach, how it compares to other publicly available ones, and the other ideas we had to keep improving in this crucial, looming fight: being able to identify and characterise deepfakes in our day-to-day lives. Stay tuned!