LAMBDA Seminar: "Image to Image Translations Using Generative Adversarial Networks"

The event has ended

Mr. Nand Kumar Yadav (invited speaker) will present his work "Image to Image Translations Using Generative Adversarial Networks" at the regular LAMBDA laboratory seminar on 20 March at 14:40.

Image-to-image translation is a widespread computer vision task that covers problems such as image generation, image colorization, and semantic-map-to-scene generation. Synthesizing visible images from complex input modalities, such as sketch face to visual face transformation, thermal face to visual face transformation, and near-infrared (NIR) to visual optical image generation, is a crucial deep learning task in computer vision. Complex input modalities usually lack the rich visual information, such as texture, color, and fine details, present in the ground-truth images of the target domain, so there is a large domain gap between the input images and the corresponding target images. Traditional image-to-image translation methods are unable to learn an accurate mapping between input and target images. In recent years, however, deep-learning-based Generative Adversarial Networks (GANs) have shown very promising results for image generation as well as image-to-image translation. GAN-based models are trained in an adversarial manner, which is advantageous for high-quality image synthesis. A GAN consists of a generator and a discriminator network during training; at inference time, only the trained generator is required to synthesize images. Attention-based methods have been proposed for better generalization in fewer training epochs: they focus on the important regions during learning, which leads to better output. Existing image-to-image translation methods, such as Pix2pix, CycleGAN, DualGAN, CSGAN, and PCSGAN, lack an attention mechanism. These GAN models also require intense training and suffer heavily from artifacts when synthesizing complex scenes.
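The generator/discriminator setup described above can be illustrated with a minimal sketch (not part of the talk; all names and the toy 1-D setting are illustrative). The discriminator is trained to tell real samples from generated ones, the generator is trained to fool it, and only the generator is needed afterwards:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 1-D setting: real data comes from N(3, 1); the generator maps
# noise z ~ N(0, 1) through a linear map G(z) = w*z + b.
w, b = 1.0, 0.0   # generator parameters
v, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(v*x + c)
lr = 0.05

for step in range(2000):
    x = rng.normal(3.0, 1.0)   # one real sample
    z = rng.normal()           # noise
    g = w * z + b              # one fake sample

    # Discriminator: gradient ascent on log D(x) + log(1 - D(g))
    dx, dg = sigmoid(v * x + c), sigmoid(v * g + c)
    v += lr * ((1 - dx) * x - dg * g)
    c += lr * ((1 - dx) - dg)

    # Generator: gradient ascent on log D(G(z)) (non-saturating loss)
    dg = sigmoid(v * (w * z + b) + c)
    w += lr * (1 - dg) * v * z
    b += lr * (1 - dg) * v

# Inference: only the trained generator is used to synthesize samples.
samples = w * rng.normal(size=1000) + b
# The sample mean typically drifts toward the real mean of 3,
# though GAN training is noisy and convergence is not guaranteed.
```

Real image-to-image models replace the linear maps with convolutional (or attention-based) networks and condition the generator on the input image, but the adversarial training loop has the same shape.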
The attention-based AGGAN and AttentionGAN achieve better unpaired image-to-image translation, but fail to handle complex-image-to-real-image translation effectively. We propose several GAN methods that exploit the attention mechanism to handle complex image modalities in the context of image-to-image translation. CSA-GAN introduces an attention mechanism, without any extra network, to handle sketch face to real face translation as well as thermal face to visible face translation; thanks to the attention mechanism, it converges faster than non-attention-based methods and reports better results. We also propose a novel and efficient MobileAR-GAN model that uses attention gates with MobileNet for complex scene translation, such as near-infrared to visible scene translation; MobileAR-GAN is suitable for edge devices such as the Jetson board. We further propose an Attention-Guided Thermal to Visible Face Translation Network using GAN (TVA-GAN), which is capable of generating realistic samples on diverse face datasets, including people of different races and poses. With the emergence of self-attention mechanisms, image generation tasks have reported promising results. In the proposed Inception-based Self-Attention GAN model (ISA-GAN), a parallel self-attention block is used together with an inception block to improve performance on thermal face to visible face transformation. Finally, a full-fledged self-attention-driven Transformer-based Transcoder-GAN model is proposed for translating thermal face images into realistic face images; its generator stacks multi-head self-attention blocks in an encoder-decoder fashion.
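The self-attention building block that the ISA-GAN and Transcoder-GAN models rely on can be sketched as scaled dot-product attention over a sequence of feature vectors (a generic sketch, not the talk's implementation; shapes and weights are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention.

    x: (n, d) sequence of feature vectors (e.g. flattened spatial positions);
    Wq, Wk, Wv: (d, d_k) query/key/value projection matrices.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n, n) pairwise affinities
    weights = softmax(scores, axis=-1)       # each row is a distribution
    return weights @ v, weights              # values mixed across positions

rng = np.random.default_rng(0)
n, d, dk = 6, 8, 4
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (0.5 * rng.standard_normal((d, dk)) for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)
```

A multi-head variant runs several such blocks in parallel with independent projections and concatenates the outputs; stacking them in an encoder-decoder arrangement gives the Transformer-style generator mentioned above.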


Time: 14:40-16:00 (UTC+3)
Location: Moscow, Pokrovsky Boulevard, 11, R608, or via Zoom
Working language: English