THE USE OF CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS FOR THE SYNTHESIS OF FACE IMAGES FROM SPEECH
Abstract
This research proposes the use of conditional Generative Adversarial Networks
(GANs) to create an artificial portrait of a person's face from audio alone. Human
beings share information primarily through visual and auditory means, and some
data-intensive applications require large quantities of audio to be translated
automatically into a comprehensible visual format, with no human intervention. This
work presents a comprehensive methodology for generating intelligible images from
audio signals: the model's generative adversarial network (GAN) architecture
synthesizes images from audio waveforms. The model was trained to generate a
synthesized picture of a speaker's face from audio recordings of that speaker's
voice. Using excitation signals to produce images of labelled individuals, the
approach achieved an accuracy rate of 96.88% on ungrouped data and 93.91% on
grouped data.
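The abstract does not specify the network architecture, so the following is only an
illustrative sketch of the conditioning idea behind a conditional GAN generator: an
audio embedding is concatenated with a noise vector and mapped to an image. All
dimensions, layer choices, and names here are hypothetical, not taken from the paper.

```python
import numpy as np

# Toy conditional-GAN generator sketch (hypothetical dimensions, untrained
# weights standing in for learned parameters).

rng = np.random.default_rng(0)

AUDIO_DIM = 128   # hypothetical size of a speech embedding
NOISE_DIM = 64    # latent noise vector
IMG_SIDE = 32     # generated face image is IMG_SIDE x IMG_SIDE grayscale

W1 = rng.normal(0, 0.02, (AUDIO_DIM + NOISE_DIM, 256))
b1 = np.zeros(256)
W2 = rng.normal(0, 0.02, (256, IMG_SIDE * IMG_SIDE))
b2 = np.zeros(IMG_SIDE * IMG_SIDE)

def generator(audio_embedding, noise):
    """Map (audio embedding, noise) to an image with values in [-1, 1]."""
    x = np.concatenate([audio_embedding, noise])  # conditioning by concatenation
    h = np.maximum(0.0, x @ W1 + b1)              # ReLU hidden layer
    img = np.tanh(h @ W2 + b2)                    # tanh output, as in many GANs
    return img.reshape(IMG_SIDE, IMG_SIDE)

audio = rng.normal(size=AUDIO_DIM)  # stand-in for a real speech embedding
z = rng.normal(size=NOISE_DIM)
face = generator(audio, z)
print(face.shape)
```

In a trained conditional GAN, the weights above would be learned adversarially
against a discriminator that receives both the image and the same audio condition,
which is what ties the generated face to the speaker's voice.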
