Modeling new shoes

Using machine learning to design a new pair of shoes

By Doug Cook · 26 Sep 2024

At thirteen23, we’re constantly pushing the boundaries of design and technology to create exceptional digital experiences for our clients. With the advent of generative AI, we’ve begun to explore how these tools will change our own approach to creating products and services.

Generative AI (GenAI) refers to unsupervised and semi-supervised machine learning algorithms that can create a wide variety of data, such as images, video, audio, text, and even code. It does this by learning patterns from existing data and then using them to generate new and unique outputs.

Recent breakthroughs have greatly expanded the capabilities of GenAI, enabling the creation of highly realistic and detailed content. Some of the most recent examples of this technology include text-to-image models, which combine a large language model with a generative image model to transform text into images based on natural language.

The most popular of these models have been trained on massive amounts of images collected from the web. These advances have opened up new opportunities for GenAI to play a significant role in both the ideation and creation of new experiences.

Creating our own image model

Inspired by our conversations with the local enthusiast site NiceKicks and all the joyscrolling we did during the Olympics, we wondered whether we could use machine learning to generate our own pair of nice kicks. But we didn’t want to simply jump into Midjourney, Gemini, or DALL-E. Instead, we wanted to try our hand at creating our own image model, one that would let us generate a near-infinite number of designs.

Our first step was to assemble a collection of training data. We chose to combine existing shoe designs with a set of synthetic designs created using some of the more popular image models. The hope was that the additional synthetic data would help us build a more versatile model capable of yielding more unique designs.
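As a rough sketch of that first step, here’s how a combined set of real and synthetic images might be loaded into a training pipeline with TensorFlow. The folder layout, resolution, and batch size below are illustrative, not our actual values:

```python
import tensorflow as tf

# Hypothetical layout: data/shoes/ holds both the scraped and synthetic images.
dataset = tf.keras.utils.image_dataset_from_directory(
    "data/shoes",
    labels=None,             # no class labels; we only need the images themselves
    image_size=(256, 256),   # illustrative resolution
    batch_size=32,
)

# Rescale pixels from [0, 255] to [-1, 1], the range a tanh-output generator uses.
dataset = dataset.map(lambda images: (images - 127.5) / 127.5)
```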

[Image: a sampling of the training data, showing many shoes]

Training our model

Dataset in hand, we set out to create our own Generative Adversarial Network, or GAN, using the TensorFlow implementation of StyleGAN. That’s a mouthful, but bear with us. GANs let us train a generative model by framing the problem as a supervised learning task between two submodels: a generator (“the creator”) that learns to produce images that look real, and a discriminator (“the critic”) that learns to distinguish real images from fakes.


By pitting these two models against each other, the generator creates new images while the discriminator classifies each one as real or fake. The two are trained simultaneously until the discriminator can no longer reliably tell the generator’s output from the real thing (i.e., the generator is producing realistic-looking images).
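In code, that adversarial loop is surprisingly compact. Here’s a minimal sketch of a single training step in TensorFlow, assuming `generator` and `discriminator` are already-built Keras models (their architectures are omitted) and `latent_dim` matches the generator’s input size:

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images, generator, discriminator, latent_dim=128):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)

        # The critic learns to label real images 1 and generated images 0...
        d_loss = (cross_entropy(tf.ones_like(real_logits), real_logits) +
                  cross_entropy(tf.zeros_like(fake_logits), fake_logits))
        # ...while the creator learns to make the critic call its fakes real.
        g_loss = cross_entropy(tf.ones_like(fake_logits), fake_logits)

    g_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, generator.trainable_variables),
        generator.trainable_variables))
    d_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    return g_loss, d_loss
```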

Fine-tuning without reinventing the wheel

While building GANs was a great way to get started, we quickly realized we needed more power to achieve higher-resolution images. As a result, we switched to refining an open-source diffusion model using LoRA (Low-Rank Adaptation). Unlike GANs, diffusion models generate images by gradually denoising a random input rather than pitting two networks against each other.

LoRA is a lightweight training technique for fine-tuning large pre-trained models, including diffusion models such as Stable Diffusion and Flux. With this approach, we were able to refine a high-quality, pre-trained model that produced a much stronger set of renders. It also supported more complex text prompts, such as “sneaker made from green leaves and moss” or “shoe inspired by Japanese gardens, clear acrylic.”
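For illustration, here’s roughly what applying a trained LoRA adapter looks like at inference time with Hugging Face’s diffusers library. The base model ID and adapter path are placeholders, not our production setup:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder model ID and LoRA weights path; substitute your own.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Attach the fine-tuned low-rank adapter on top of the frozen base weights.
pipe.load_lora_weights("path/to/shoe-lora")

image = pipe(
    "sneaker made from green leaves and moss",
    num_inference_steps=30,
).images[0]
image.save("leafy-sneaker.png")
```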

This added versatility not only improved the visual quality of our generated images but also allowed for more nuanced and detailed outputs.

Exploring latent space

Once our model was trained, we were able to generate new images either by prompting or by randomly sampling points in the model’s "latent space." In GANs, a latent space is a multidimensional representation of compressed data. However, in diffusion models, it’s more accurately described as a high-dimensional noise space, from which the model gradually refines an image through a step-by-step denoising process.

This noise space contains all the important information and features needed to represent and ultimately generate images similar to our original data set. By navigating this space, we can create a variety of new images that exhibit the characteristics of the training data while introducing new combinations and variations.
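To make that concrete, here’s a sketch of sampling a specific point in the noise space with diffusers, reusing the `pipe` loaded in the earlier sketch; the 64×64×4 latent shape assumes Stable Diffusion’s default 512×512 output:

```python
import torch

# `pipe` is the StableDiffusionPipeline loaded in the previous sketch.
# Seed a specific point in the noise space so the result is reproducible.
generator = torch.Generator("cuda").manual_seed(42)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),  # (batch, channels, height, width)
    generator=generator, device="cuda", dtype=torch.float16,
)

# The same latent point always denoises to the same shoe for a given prompt.
image = pipe("studio photo of a sneaker", latents=latents).images[0]
```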

In our example, we tend to think of this space as representing the quality of "shoeness"—all the patterns, rules, and conditions needed to create a realistic-looking shoe. It’s this representation that allows us to generate our own unique kicks.

[Image: a collage of four generated shoes]

Similarly, a “latent walk” involves smoothly interpolating between two points in the latent space, generating intermediate states that blend characteristics of both endpoints. So, in addition to generating single images, we can also create sequences that combine features from different points in the latent space, resulting in a smooth animation that morphs one design into another.
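A common way to implement this (again a sketch, reusing the earlier `pipe`) is spherical interpolation between two latent noise tensors, which keeps each intermediate point at a plausible distance from the center of the Gaussian noise space:

```python
import torch

def slerp(t, a, b):
    """Spherical interpolation between two latent tensors a and b, t in [0, 1]."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    omega = torch.acos(torch.clamp(
        torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

shape = (1, pipe.unet.config.in_channels, 64, 64)
z0 = torch.randn(shape, device="cuda", dtype=torch.float16)  # endpoint shoe A
z1 = torch.randn(shape, device="cuda", dtype=torch.float16)  # endpoint shoe B

# Render intermediate frames that blend the two endpoint designs.
for i, t in enumerate(torch.linspace(0, 1, steps=24)):
    frame = pipe("studio photo of a sneaker",
                 latents=slerp(t.item(), z0, z1)).images[0]
    frame.save(f"walk_{i:03d}.png")
```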

What does this have to do with UX?

While this is just an image model, imagine if we instead used generative AI to generate UI patterns that infer their context from a defined set of interactions and behaviors.

Such a model would allow us to generate actual UI screens and even running code. That may sound more involved, but with the latest language models, it’s entirely possible. Stay tuned!

Have an idea or want to learn more? Subscribe to our newsletter and follow us on LinkedIn!

Special thanks to Natalie Vanderveen and Morgan Gerber.

Doug Cook

FOUNDER AND PRINCIPAL

Doug is the founder of thirteen23. When he’s not providing strategic creative leadership on our client engagements, he can be found practicing the time-honored art of getting out of the way.
