Using the latest trend in artificial intelligence – adversarial learning – Samsung’s AI Center in Moscow has demonstrated that it can take a single image of a person and turn it into a talking head. And if watching the Mona Lisa come to life doesn’t send chills down your spine, you need to check your pulse.
The system takes a number of images of a person – and that number could be just one, or more for better results – and runs them through an off-the-shelf “face landmark tracker” to work out where the eyes, eyebrows, nose, lips and jawline are. It does the same for a separate “driving” video, going frame by frame to track the motion of these face landmarks.
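To picture what tracking landmark motion frame by frame gives you, here is a minimal Python sketch. It is an illustration only, not Samsung's pipeline: the function names `landmark_motion` and `retarget`, the two-point "face" and all coordinates are made up, and it assumes a landmark tracker has already produced (x, y) points for each frame.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Landmarks = List[Point]  # one (x, y) per tracked facial point


def landmark_motion(driving: List[Landmarks]) -> List[List[Point]]:
    """Per-frame offset of every landmark relative to the first driving frame."""
    reference = driving[0]
    return [
        [(x - rx, y - ry) for (x, y), (rx, ry) in zip(frame, reference)]
        for frame in driving
    ]


def retarget(source: Landmarks, motion: List[List[Point]]) -> List[Landmarks]:
    """Apply the driving offsets to the source face's landmarks, frame by frame."""
    return [
        [(sx + dx, sy + dy) for (sx, sy), (dx, dy) in zip(source, offsets)]
        for offsets in motion
    ]


# Hypothetical data: two landmarks (say, the mouth corners) over three frames.
driving_frames = [
    [(10.0, 20.0), (30.0, 20.0)],
    [(10.0, 22.0), (30.0, 22.0)],
    [(9.0, 25.0), (31.0, 25.0)],
]
source_face = [(100.0, 200.0), (140.0, 200.0)]

animated = retarget(source_face, landmark_motion(driving_frames))
```

Each entry in `animated` is the source face's landmark set for one output frame, displaced the same way the driving face moved. Copying raw offsets like this ignores differences in face size and shape, which is part of why a learned model is used in practice.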
There’s a separate meta-learning stage, in which different AI networks are trained to do different jobs, using an enormous video dataset of talking heads. An Embedder network takes source frames and their landmark tracking data to create vectors, while a Generator network learns to take vectors and images, and generate short videos in which the still faces are animated to move according to the motion encoded in those vectors.
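The division of labour between the Embedder and the Generator can be sketched as plain data flow. The Python below is a toy stand-in under loud assumptions: the real networks are learned from video, whereas these hand-written functions only mimic the inputs and outputs described above. Averaging per-frame vectors and feeding the Generator target landmarks are assumptions made for the sketch, and every name and number is hypothetical.

```python
from typing import List, Tuple

Landmarks = List[Tuple[int, int]]  # (row, col) pixel positions
Image = List[List[float]]          # grayscale pixels in [0, 1]


def embedder(frames: List[Image], landmarks: List[Landmarks]) -> List[float]:
    """Stand-in Embedder: summarise each (frame, landmarks) pair, then average."""
    per_frame = []
    for image, points in zip(frames, landmarks):
        pixels = [p for row in image for p in row]
        brightness = sum(pixels) / len(pixels)
        mean_row = sum(r for r, _ in points) / len(points)
        mean_col = sum(c for _, c in points) / len(points)
        per_frame.append([brightness, mean_row, mean_col])
    n = len(per_frame)
    return [sum(v[i] for v in per_frame) / n for i in range(3)]


def generator(vector: List[float], target: Landmarks, size: int) -> Image:
    """Stand-in Generator: paint one frame from the vector and target landmarks."""
    frame = [[vector[0]] * size for _ in range(size)]  # "appearance" from vector
    for r, c in target:
        frame[r][c] = 1.0                              # "pose" from landmarks
    return frame


# Hypothetical input: one 4x4 source image with two landmarks.
source_frames = [[[0.5] * 4 for _ in range(4)]]
source_landmarks = [[(1, 1), (1, 2)]]
# Hypothetical driving landmarks for two output frames.
driving_landmarks = [[(2, 1), (2, 2)], [(3, 1), (3, 2)]]

identity = embedder(source_frames, source_landmarks)
video = [generator(identity, lm, size=4) for lm in driving_landmarks]
```

The point of the split is that `identity` is computed once from the still image or images, while the per-frame landmarks change, so the same face can be driven by any motion.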