Artificial intelligence system for fashion design image generation

Saverio Pulizzi
9 min read · Dec 27, 2020

Can fashion designers rely on artificial intelligence to generate new designs of fashion garments?

Read this article to learn more about my journey using generative adversarial networks to build a tool that generates fashion design images.

Some examples of design images generated by the system: https://schedio.pythonanywhere.com

This article summarises my master's degree research in computer engineering at Dublin City University.

This article is divided into five sections:

  • Introduction to the research
  • Generative adversarial networks applied to fashion design
  • AI-powered web application (GitHub: message me for access)
  • Results
  • Conclusions

Introduction to the research

Generating a fashion design image from a fabric pattern and a sketch as digital inputs is an image-to-image translation problem. There is plenty of existing research on conditional generative adversarial networks applied to the fashion industry, but only a few practical implementations can be found.

From a research point of view, FashionGAN and TextureGAN are among the existing texture-conditioned generative models for fashion image generation that can produce relatively good-quality output given a sketch and a texture image as input. TextureGAN is the model used in this project to perform real-time fashion design image generation in a live web environment.

Besides offline fashion design image generation, there have been very few attempts at building practical implementations of real-time AI-assisted fashion design applications.

One of the most popular was Project Muze, a joint effort between Google and Zalando. Project Muze was a browser-based web application where an AI system generated the full outfit for a displayed 3D avatar, conditioned on multiple user inputs representing general interests such as favourite lifestyle, music, colour, etc. This application presented many output limitations, such as low variety, limited user control over the output, low quality and a high level of abstraction.

The solution proposed here attempts to overcome these weaknesses through an AI-driven web application that gives users a high level of control over the output and fosters output quality through ad-hoc UI input guidelines.

Generative adversarial networks applied to fashion design

A generative adversarial network (GAN) is a deep neural network architecture made up of a generator and a discriminator network. These two networks interact and learn by trying to outwit each other during training.

On the one hand, the generator network uses existing data drawn from a provided input distribution (or a randomly generated vector of numbers, also known as the latent space) to generate new data. In particular, the generator network G captures the input data distribution. For instance, when applied to computer vision tasks, the generator can produce new images or videos from existing ones.

On the other hand, the discriminator network tries to differentiate between real data (e.g. an input image) and the data produced by the generator network. In particular, the discriminator network D estimates the probability that a sample came from the input data rather than from G. During training, both networks receive feedback on the successful changes applied to their own processes (generation and discrimination), up to a point at which the discriminator is no longer able to tell newly generated data apart from real data, reaching an optimal state also known as a Nash equilibrium.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Generative adversarial networks equation.
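To make the adversarial training dynamic concrete, here is a minimal PyTorch sketch (not the code used in this project) of one training step for the objective above. The networks G and D and their optimisers are assumed to be defined elsewhere, with D ending in a sigmoid:

```python
import torch
import torch.nn.functional as F

def gan_training_step(G, D, real_images, opt_G, opt_D, latent_dim=100):
    """One adversarial training step for a vanilla GAN (illustrative)."""
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: maximise log D(x) + log(1 - D(G(z))).
    z = torch.randn(batch_size, latent_dim)
    fake_images = G(z).detach()  # block gradients from reaching G
    d_loss = (F.binary_cross_entropy(D(real_images), real_labels)
              + F.binary_cross_entropy(D(fake_images), fake_labels))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: in practice G maximises log D(G(z)), the
    # non-saturating variant of minimising log(1 - D(G(z))).
    z = torch.randn(batch_size, latent_dim)
    g_loss = F.binary_cross_entropy(D(G(z)), real_labels)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()

    return d_loss.item(), g_loss.item()
```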

Conditional generative adversarial networks

One of the major challenges of traditional GANs is that the generation process cannot be conditioned: newly generated data can be of any type and shape, determined solely by the underlying training data and excluding any influence from external sources. Conditional generative adversarial networks, or more simply cGANs, represent a class of deep generative models where the generation process can be conditioned by feeding additional information to the generator and discriminator networks during training. In cGANs, G and D are conditioned on some extra information y, which is fed to the network as an additional input layer. In particular, the input noise $p_z(z)$ and y are combined in a joint hidden representation, and the minimax game is represented by the following equation:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))]$$

Conditional generative adversarial networks equation.
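In code, this conditioning often amounts to concatenating y with the generator's noise and with the discriminator's input before the first layer. A minimal sketch (illustrative only, with a one-hot label standing in for y):

```python
import torch

def conditioned_inputs(z, x, y_onehot):
    """Build cGAN inputs by concatenating the condition y (illustrative).

    z:        latent noise, shape (batch, latent_dim)
    x:        real or generated samples, flattened to (batch, data_dim)
    y_onehot: the condition, e.g. one-hot labels, shape (batch, num_classes)
    """
    g_input = torch.cat([z, y_onehot], dim=1)  # the generator sees (z, y)
    d_input = torch.cat([x, y_onehot], dim=1)  # the discriminator sees (x, y)
    return g_input, d_input
```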

The following subsections present four of the most recent cGAN techniques that can be applied to image generation.

BicycleGAN

The vast majority of conditional image generation models have focused on a single image output, while BicycleGAN focuses on producing multiple realistic and diverse outputs. The major task performed by BicycleGAN is multi-modal image-to-image translation. This is one of the major strengths of this model, as most available cGANs follow a one-to-one image-to-image approach.

The methodology used to produce both realistic and diverse results consists of modelling a distribution of potential outputs in the target domain (corresponding to multiple images), given an input image from one domain. An example of the output that this model can achieve is shown in the following image.

Output example of BicycleGAN in action.
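Given a trained multimodal generator, the diversity comes from resampling the latent code while keeping the input image fixed. The sketch below illustrates the idea; the generator's call signature here is an assumption, not the actual BicycleGAN API:

```python
import torch

def sample_diverse_outputs(G, input_image, num_samples=5, latent_dim=8):
    """Generate several plausible translations of one input image by
    drawing a fresh latent code per sample (illustrative sketch)."""
    outputs = []
    with torch.no_grad():
        for _ in range(num_samples):
            z = torch.randn(1, latent_dim)     # new style code each time
            outputs.append(G(input_image, z))  # same content, new style
    return outputs
```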

StyleGAN

The team of researchers who developed StyleGAN focused on testing different architectures and losses with the goal of building a generative model that could generate fashion images that are both original and realistic, deviating from the ones already present in the training set. In the StyleGAN model, a generator is trained to compute realistic images based on a mask input representing the shape of the fashion design image and a noise variable representing its style. One peculiarity of StyleGAN is its ability to sample different textures for the same shape, achieved by avoiding a deterministic mapping during generator training and by introducing an additional L1 loss on the generator. An example of the output that this model can achieve is shown in the following figure.

StyleGAN in action.
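As a hedged illustration of that extra L1 term, a generator loss commonly combines the adversarial objective with a weighted L1 component, along these lines (the weighting and the exact targets in the actual paper may differ):

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake_logits, fake, target, lambda_l1=10.0):
    """Adversarial generator loss plus a weighted L1 term (illustrative)."""
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))  # fool the discriminator
    l1 = torch.abs(fake - target).mean()                # pixel-level L1 term
    return adv + lambda_l1 * l1
```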

FashionGAN

FashionGAN is a generative architecture that takes inspiration from BicycleGAN, but instead of producing multiple results from a single input, it takes two inputs (e.g. texture and shape) to generate a single output (a process also called two-to-one mapping) across three image domains. One of the major architectural differences between FashionGAN and BicycleGAN is its encoding strategy. To ensure that different types of textures (even ones not contained in the training dataset) are correctly applied to the shape, texture images are encoded into a latent vector, and the network is trained on a fabric pattern image together with the ground truth and the contour image. The following figure shows how the input texture and image shape are combined to generate a final complete image.

FashionGAN in action.

TextureGAN

TextureGAN is the conditional generative model selected for this project. This model is able to generate realistic images from input sketches with overlaid textures, with relatively higher accuracy across different types of textures. The following figure shows the different results obtained by TextureGAN on a single handbag sketch.

Results of TextureGAN on handbags. The far left shows the "ground truth" image from which the sketch was synthesized.

AI-powered fashion design web application

Here comes the juicy part!

The web application developed in this project works in two phases:

  1. The acquisition of the user inputs: the sketch and texture images.
  2. The generation and presentation of the final output: a completed 2D fashion design image.

We saved long working hours on dataset collection, model training and tuning, since we used a pre-trained TextureGAN model that we downloaded from GitHub.
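For context, loading and running such a pre-trained model typically looks like the sketch below; the checkpoint name and the model's forward signature are assumptions, since they depend on the specific repository:

```python
import torch

def load_and_run(checkpoint_path, sketch, texture):
    """Load a pre-trained generator and run one inference pass.

    The checkpoint path and the forward signature are assumptions; they
    depend on the specific repository. `sketch` and `texture` are
    preprocessed image tensors of shape (1, C, H, W).
    """
    model = torch.load(checkpoint_path, map_location="cpu")
    model.eval()
    with torch.no_grad():
        return model(sketch, texture)
```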

Web technologies: In order to deploy a pre-trained model into a real-time web application, this project explored two major technologies: a web framework and a cloud environment. Flask is the web application framework used to build the web application, and PythonAnywhere is the cloud hosting environment. The figure below summarises how the two technologies are integrated to perform the fashion design image generation task.

Real-time, ML-driven web application hosted on PythonAnywhere and built with Flask.

Model deployment: We deployed the model on PythonAnywhere using the project structure summarised in the figure below. The folders are explained as follows:

  • Templates: Folder where all the HTML files are stored
  • Static: Folder for all the images used within the UI
  • Functions and Classes: Web app functions and classes performing the image transformations and generation are stored here
  • Env: This is our Python virtual environment
  • Img: The main folder, containing subfolders where the input images provided by the user are saved
  • App.py: The core file of this project. It contains the script to run the web application, including the views and the main TextureGAN function
  • Models: The project folder containing the pre-trained models for generating fashion design images of garments and handbags

Project structure.

App.py is the engine of this web application. It contains the functions called through the user interface based on user actions. The logic of this file is organised according to the following structure:

  • Sketch upload: A POST method saves the image files (.jpg or .png) uploaded by users into a temporary folder
  • Texture upload: The same logic as for the sketch upload, with the file stored in a different folder
  • Image generation: This function is triggered by a UI button which, when clicked, calls the TextureGAN model conditioned on the two user inputs and returns the output image as an attachment (see the sketch below)
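The following minimal Flask sketch illustrates that structure. It is illustrative only: the route names, folder paths, and the helper run_texturegan are assumptions, not the exact contents of App.py:

```python
import os
from flask import Flask, request, send_file, render_template

app = Flask(__name__)

# Assumed folder names, mirroring the project structure above.
SKETCH_DIR = "img/sketches"
TEXTURE_DIR = "img/textures"
OUTPUT_DIR = "img/outputs"
ALLOWED_EXTENSIONS = {".jpg", ".png"}

for folder in (SKETCH_DIR, TEXTURE_DIR, OUTPUT_DIR):
    os.makedirs(folder, exist_ok=True)

def save_upload(uploaded_file, folder):
    """Save an uploaded .jpg/.png file into the given folder."""
    ext = os.path.splitext(uploaded_file.filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None
    path = os.path.join(folder, "input" + ext)
    uploaded_file.save(path)
    return path

@app.route("/upload-sketch", methods=["POST"])
def upload_sketch():
    save_upload(request.files["sketch"], SKETCH_DIR)
    return render_template("texture.html")  # next step: texture upload

@app.route("/upload-texture", methods=["POST"])
def upload_texture():
    save_upload(request.files["texture"], TEXTURE_DIR)
    return render_template("generate.html")

@app.route("/generate", methods=["POST"])
def generate():
    # run_texturegan is a hypothetical helper wrapping the pre-trained
    # model; it reads the saved inputs and writes the generated image.
    output_path = run_texturegan(SKETCH_DIR, TEXTURE_DIR, OUTPUT_DIR)
    return send_file(output_path, as_attachment=True)
```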

Finally, this project has been deployed on PythonAnywhere, where all the required libraries have been installed in a virtual environment.

The final result of this project is a real-time AI-assisted fashion design web application that can be accessed at: https://schedio.pythonanywhere.com/

Results

The web application generates finished fashion design images of shirts and handbags.

Web app screens accessible at: https://schedio.pythonanywhere.com. From top left to bottom right: homepage, sketch upload page, texture upload page, design creation page.

We tested the web application on a MacBook Pro and an iPhone X, using the Safari browser. The main functionalities, as well as design image generation, were tested using 6 sketches (3 shirts and 3 handbags) and 3 different textures (stripes, a regular pattern and an irregular pattern), previously unseen during training and downloaded from Google Images.

We tested the web app along five dimensions:

  • Portability. How well the app performs when accessed from different devices.
  • Reality of the output. How close our output is to a real fashion design image. To test this dimension, multiple trials were run using a test set of different combinations of sketches and textures, and the combinations were compared to each other.
  • Quality of the output. The quality of the output image was assessed through a user survey linked to the web application.
  • Speed. How fast the image generation process is, measured from the moment the button to generate a new fashion design image is pressed until the image is generated and provided as an attachment to the end user.
  • Ease of use. The user experience in the eyes of end users.

Examples of generated images based on sketches and textures never seen by the model during training are provided in the figure below.

Testing of the web application with shirts and handbags fashion design image generation using 3 different types of fashion sketches and texture images.

Overall, our tests showed that the quality of the generated images was insufficient.

Conclusions

This article has shown how TextureGAN can be embedded into a real-time web application (https://schedio.pythonanywhere.com/) to produce new fashion design images based on multiple user inputs. Despite a proven fast generation process, the output image, especially for irregular texture patterns, lacks design quality. Because of this limitation, further research could focus on finding a model that also performs well on irregular texture images, and/or on limiting textures to regular patterns only, selectable directly from the web app UI.

Thanks for reading this article; I hope you liked it!

Feel free to get in touch through one of these channels:

FOLLOW US: @AI.ARTS


Saverio Pulizzi

Data Scientist exploring the intersection between AI and Creativity.