In my previous blog post we looked at deepfakes and how machine learning can be used to forge or construct digital impressions such as images or video. Many of the examples provided there use a technique introduced in a 2014 paper by Ian Goodfellow et al. named “Generative Adversarial Networks”, GAN for short.
In this blog post, I will describe at a very high level how a GAN is composed and trained. In the next post we will run an example. If you wonder what this topic is doing on the SAP Community Blog, you are absolutely entitled to ask. Go back to part one to read my reasoning.
A GAN is a game of faking and detecting the fakes. (Photo: Wikipedia)
A GAN consists of two neural networks competing against each other. The first network, called the “generator”, creates samples. The second, called the “discriminator”, tries to detect whether a sample was created by the generator or is a real sample from an existing sample library.
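To make the two roles concrete, here is a minimal sketch in plain Python. It is only an illustration under simplifying assumptions: a "sample" is a single number instead of an image, each "network" is a single linear unit, and the function and parameter names are made up for this example. A real GAN would use multi-layer neural networks from a deep learning library.

```python
import math
import random


def generator(noise, weight=1.0, bias=0.0):
    """Turn a random noise value into a sample (here: just one number)."""
    return weight * noise + bias


def discriminator(sample, weight=1.0, bias=0.0):
    """Return the estimated probability (between 0 and 1) that the sample is real."""
    return 1.0 / (1.0 + math.exp(-(weight * sample + bias)))


rng = random.Random(0)
noise = rng.gauss(0.0, 1.0)    # the generator always starts from random noise
fake = generator(noise)        # the forger produces a sample
score = discriminator(fake)    # the expert gives a verdict
print(f"fake sample: {fake:.3f}, estimated probability of being real: {score:.3f}")
```

The important point is the interface: the generator maps noise to a sample, and the discriminator maps a sample to a probability of it being real.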
A common analogy is with an art expert (the discriminator) and an art forger (the generator). The forger’s job is to do its best to create art that the expert will believe is real. Starting with just some random noise, it doesn’t do a very good job. But with training it becomes better at fooling the expert into believing that the fake art is the real deal.
The expert will do its best to tell whether a piece of art is authentic or a fake. Half of the time, it is presented with authentic samples from an existing sample pool, and the other half with fakes created by the forger.
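The half-and-half mix described above can be sketched as follows. This is a simplified illustration, not the code used in the next post: the helper name `make_discriminator_batch`, the toy "real" pool of numbers and the stand-in forger are all assumptions made for this example.

```python
import random


def make_discriminator_batch(real_pool, make_fake, batch_size, rng):
    """Build a shuffled batch of (sample, label) pairs: label 1 = real, 0 = fake."""
    half = batch_size // 2
    real = rng.sample(real_pool, half)                      # authentic samples
    fake = [make_fake() for _ in range(batch_size - half)]  # forgeries
    batch = [(x, 1) for x in real] + [(x, 0) for x in fake]
    rng.shuffle(batch)  # the expert must not learn anything from the ordering
    return batch


rng = random.Random(42)
# Hypothetical "sample library": numbers clustered around 4.0
real_pool = [rng.gauss(4.0, 0.5) for _ in range(100)]


def forger():
    """Stand-in for an untrained generator: plain random noise."""
    return rng.gauss(0.0, 1.0)


batch = make_discriminator_batch(real_pool, forger, 8, rng)
for sample, label in batch:
    print(f"{sample:7.3f}  {'real' if label else 'fake'}")
```

The labels are known during training because we control where each sample comes from. That is what lets the expert learn from its mistakes.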
The discriminator at work (Photo: via Lebanon Daily Star)
The art expert will do a pretty bad job at the beginning of its career but will learn from the mistakes it makes. After substantial training, the expert will know all about the type of artwork in question.
If we were to draw a conceptual drawing of the architecture, it would be something like this:
Training of the discriminator (expert) and generator (forger) is optimal when the discriminator gets it right only half of the time, meaning it generally has no clue whether the presented sample is a masterpiece or a fake.
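For readers who want the formal version of this "half of the time" statement, the 2014 paper by Goodfellow et al. expresses the game as a single value function that the discriminator D tries to maximize and the generator G tries to minimize. The notation follows that paper: p_data is the distribution of real samples, p_z the noise distribution, and p_g the distribution of generated samples.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator, the best possible discriminator is
D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% so when the fakes are indistinguishable from real data (p_g = p_{\text{data}}):
D^*_G(x) = \frac{1}{2}
```

In other words, once the forger matches the real distribution, even the best possible expert can do no better than a coin flip.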
Training alternates between the two networks. The reason for going back and forth is to prevent overfitting and an imbalance between the discriminator and the generator. Once training is complete, the discriminator model can be used separately as a detector for the type of data it was trained on, while the generator can be used separately to generate believable samples.
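The back-and-forth can be sketched as a toy training loop in plain Python. Everything here is an assumption made for illustration: samples are single numbers, the "real" data is drawn around 4.0, both networks are single linear units with hand-derived gradients, and the generator uses the common variant of maximizing log D(G(z)). A real GAN would rely on a deep learning library and on images rather than numbers, and a toy setup like this is not guaranteed to settle down nicely.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=200, lr=0.05, seed=0):
    """Alternate one discriminator update and one generator update per step."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters:     G(z) = a * z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        # 1) Discriminator step (generator frozen): one real and one fake sample.
        x_real = rng.gauss(4.0, 0.5)
        x_fake = a * rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # Gradient of the loss -[log D(x_real) + log(1 - D(x_fake))]
        grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # 2) Generator step (discriminator frozen): try to fool the updated expert.
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (a * z + b) + c)
        # Gradient of the loss -log D(G(z)) with respect to a and b
        grad_a = (d_fake - 1.0) * w * z
        grad_b = (d_fake - 1.0) * w
        a -= lr * grad_a
        b -= lr * grad_b

    return (a, b), (w, c)


generator_params, discriminator_params = train_toy_gan()
print("generator (a, b):", generator_params)
print("discriminator (w, c):", discriminator_params)
```

Note that each network is only updated while the other one is held fixed. If one of them were trained far ahead of the other, the weaker one would receive very little useful feedback.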
Faceless Portrait #3, 2019, Digital Print on Canvas, by GAN/Ahmed Elgammal, exhibited at HG Contemporary, NYC, 2019
Finding a network setup and parameters that yield good quality is not an easy task. The models can face a range of challenges and yield an output quite different from what one would expect:
Generated images of animals, demonstrating challenges with global structure and counting. (Photo: Goodfellow et al 2016)
Even though a GAN isn’t easy to tune, and there are challenges with training, the trouble seems worthwhile when looking at the potential of this technology (some examples of which are mentioned in the previous post).
In the next blog post, we will see how you can run a GAN to generate handwritten digits. Stay tuned!
- Goodfellow et al., 2014, “Generative Adversarial Networks”, URL: https://arxiv.org/abs/1406.2661
- Chanchana Sornsoontorn, “How do GANs intuitively work?”, URL: https://hackernoon.com/how-do-gans-intuitively-work-2dda07f247a1
- Ian Bogost, “The AI-Art Gold Rush Is Here”, URL: https://www.theatlantic.com/technology/archive/2019/03/ai-created-art-invades-chelsea-gallery-scene/584134/