Artistic Style Transfer With C# And A Neural Network

Mark Farragher
9 min read · Dec 12, 2019

Style transfer is a process where we recompose an image in the style of another image by transferring the artistic style from one picture to another using a convolutional neural network.

It looks like this:

Visiting Picasso with dynamic style transfer

This is a photo of Pablo Picasso painting a bull on a sheet of glass, but the image has been repainted by a neural network using the artistic style of another painting.

You can watch the full video here: https://www.youtube.com/watch?v=FzvTLEB_3KY

Pretty cool, right?

This is a short fragment from Visite à Picasso, a 1950 film by Belgian filmmaker Paul Haesaerts, in which Picasso demonstrates his skills.

Every frame of the video has been processed by an Artistic Style Transfer neural network. The result is a hypnotic and dreamlike sequence where we see Picasso painting his bull, but the visual style keeps changing as the software switches seamlessly between different artistic styles.

So at its core, style transfer is a process that takes an image which we call the Content Image, and combines it with the artistic style of a second image called the Style Image. This produces the Mixed Image:

The artistic style of the middle image gets applied to the content image

You can see that the original landscape is still visible in the mixed image, but the colors, textures, and visual style are all taken from the style image.

To perform artistic style transfer, the first thing we’re going to need is a fully trained image classifier. A popular choice is the VGG19 convolutional neural network. We can download this network from the Internet and load it in an app.

So here’s how the style transfer process works.

We start by showing the content image to the neural network and measuring the feature activations in the deepest convolution layer of the neural network. We’ll treat this level of activation as a baseline. When we show other images to the network, we’ll get a different feature activation in the deepest layer. The difference between the baseline and the actual activation level is called the Content Loss. It measures how much of the content image’s features are still visible.

We can also measure the artistic style by showing the style image to the network and then calculating the Gram matrix of the feature activations in the deepest convolution layer. This creates a baseline for the style.

When we show the network a random image, the difference between the style baseline and the Gram matrix of the actual activation level is called the Style Loss. It measures how much of the artistic style from the style image is visible.
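For reference, this is the standard formulation of both losses from Gatys et al.’s original style transfer paper, written here for a single layer. F and P are the feature activations for the generated and content images, G and A are the Gram matrices for the generated and style images, N is the number of feature maps, and M is the size of each map:

\mathcal{L}_{content} = \frac{1}{2} \sum_{i,j} (F_{ij} - P_{ij})^2

G_{ij} = \sum_{k} F_{ik} F_{jk}

\mathcal{L}_{style} = \frac{1}{4 N^2 M^2} \sum_{i,j} (G_{ij} - A_{ij})^2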

The complete training process is now very simple:

  1. Show the content and style images to the neural network and calculate the content and style baselines.
  2. Show a random image to the neural network and calculate the content loss and style loss.
  3. Tweak the pixels in the random image to reduce the content loss and style loss. Repeat until the loss is acceptable.

We can show random images to the neural network by adding a Dreaming Layer. This is simply a dense input layer with three network nodes per image pixel (one for each color channel). During training, these nodes are tweaked, which effectively creates an image that minimizes the content and style loss.

The process looks like this:

Keep tweaking the target image until the content loss and style loss are minimal

Let’s see if I can build a style transfer app in C# with Microsoft’s Cognitive Toolkit library.

I’ll start by building a new console application from scratch with .NET Core:

$ dotnet new console -o StyleTransferDemo

Now I’ll install the CNTK package:

$ dotnet add package CNTK.Gpu

The CNTK.Gpu package is Microsoft’s Cognitive Toolkit, a library for training and running deep neural networks on your GPU. You’ll need an NVIDIA GPU and CUDA graphics drivers for this to work.

If you don’t have an NVidia GPU or suitable drivers, the library will fall back and use the CPU instead. This will work but training neural networks will take significantly longer.

CNTK is a low-level tensor library for building, training, and running deep neural networks. The code to build deep neural networks can get a bit verbose, so I’ve developed a little wrapper called CNTKUtil that will help me write code faster.

You can download the CNTKUtil files here and save them in a new CNTKUtil folder at the same level as the project folder.

From the console project folder I can now create a project reference like this:

$ dotnet add reference ../CNTKUtil/CNTKUtil.csproj

Now I’m ready to start writing code. I’ll edit the Program.cs file with Visual Studio Code and add the following code:
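Here’s a minimal sketch of that code. The image paths and dimensions are placeholders, and the exact LoadImage signature is an assumption:

using System;
using CNTKUtil;

class Program
{
    // placeholder dimensions and paths -- substitute your own images
    const int imageWidth = 400;
    const int imageHeight = 300;

    static void Main(string[] args)
    {
        // show which compute device CNTK will use for training (GPU or CPU)
        Console.WriteLine($"Using: {NetUtil.CurrentDevice.AsString()}");

        // load the content and style images at the training resolution
        var contentImage = StyleTransfer.LoadImage("content.jpg", imageWidth, imageHeight);
        var styleImage = StyleTransfer.LoadImage("style.jpg", imageWidth, imageHeight);

        // ...the rest of the walkthrough continues here...
    }
}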

The code calls NetUtil.CurrentDevice to display the compute device that will be used to train the neural network.

Then I use the StyleTransfer helper class and call LoadImage twice to load the content image and the style image.

Now I need to tell CNTK what shape the input data has that I’ll train the neural network on:
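In raw CNTK that’s a single input variable. A sketch:

// one trainable float per color channel of every pixel
var features = CNTK.Variable.InputVariable(
    new int[] { imageWidth, imageHeight, 3 },  // width x height x RGB
    CNTK.DataType.Float);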

I am training the neural network with a dreaming layer which has the exact same width and height as the content and style images. So my input tensor is imageWidth times imageHeight times 3 color channels in size, and each pixel channel is a float that can be individually trained.

My next step is to design the neural network. I’m going to use the VGG19 network but only keep the convolutional layers for detecting content and style loss:
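With the CNTKUtil extension methods this is just a chain of two calls. A sketch, assuming VGG19 takes a freeze flag:

// load the frozen VGG19 convolutional base and strip off the classifier
var model = features
    .VGG19(freeze: true)
    .StyleTransferBase();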

Note how I’m first calling VGG19 to load the complete VGG19 network and freeze all layers. I then call StyleTransferBase which will remove the classifier and only keep the convolutional base for style transfer.

Next I need to set up the labels to train the neural network on. These labels are the feature activation and Gram matrix values in the content and style layers of the neural network when I show it the content and style images, respectively:
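A sketch, assuming CalculateLabels takes the model plus both images and returns one array of target values per content and style layer:

// compute the targets: feature activations for the content image,
// Gram matrix values for the style image
var labels = StyleTransfer.CalculateLabels(model, contentImage, styleImage);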

Calculating the labels from the model and the content and style images is a complex operation, but fortunately there’s a handy method called CalculateLabels that does it all automatically. The result is a float[][] array with the target activation levels for the content and style layers; when the network reaches these levels, style transfer has been achieved.

The neural network is almost done. All I need to add is a dreaming layer to generate the mixed image:
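A sketch; the helper name here is hypothetical:

// attach a dreaming layer: an input layer where every pixel channel
// is an individually trainable parameter (helper name is hypothetical)
model = model.DreamLayer(imageWidth, imageHeight);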

The dreaming layer is an input layer for the neural network that represents an image where every pixel is an individually trainable parameter. During the training process, the pixel colors in the dreaming layer will change in order to produce the mixed image.

Next I need to tell CNTK what shape the output tensor of the neural network will have. This shape is a bit complex because I am looking at feature activation and Gram matrix values in the content and style layers of the neural network. But I can programmatically calculate the shape like this:
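A sketch of that calculation, assuming GetContentAndStyleLayers returns the layers in the same order as the labels array:

// create one label variable per content/style layer, shaped to match
// the target values computed earlier
var contentAndStyleLayers = model.GetContentAndStyleLayers();
var labelVariable = new CNTK.Variable[labels.Length];
for (int i = 0; i < labels.Length; i++)
{
    labelVariable[i] = CNTK.Variable.InputVariable(
        new int[] { labels[i].Length },  // flattened activation size
        CNTK.DataType.Float);
}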

This code calls GetContentAndStyleLayers to access the content and style layers in the VGG19 network, loops over all labels in the labels array, and constructs an array of CNTK variables with the correct Shape value.

Now I need to set up the loss function to use to train the neural network. This loss function needs to measure the feature activation and Gram matrix values in the content and style layers of the neural network, and compare them to the baseline activation and Gram matrix values when the network is looking at the content and the style images:
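A sketch, assuming CreateLossFunction takes the model, the content and style layers, and the label variables:

// assemble the combined content + style loss in a single call
var lossFunction = StyleTransfer.CreateLossFunction(
    model, contentAndStyleLayers, labelVariable);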

The loss function for style transfer is quite complex, but fortunately I can set it up with a single call to CreateLossFunction by providing the model, the content and style layers, and the CNTK label variable.

Next I need to decide which algorithm to use to train the neural network. There are many possible algorithms derived from Gradient Descent that I can use here.

I’m going to use the AdamLearner. You can learn more about the Adam algorithm here: https://machinelearningmastery.com/adam...
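Setting it up with the stock CNTK API looks something like this; the learning rate and momentum values are illustrative, not tuned:

// collect all trainable parameters -- just the dreaming layer's pixels,
// since the VGG19 base is frozen
var parameters = new CNTK.ParameterVector();
foreach (var p in model.Parameters())
    parameters.Add(p);

var learner = CNTK.CNTKLib.AdamLearner(
    parameters,
    new CNTK.TrainingParameterScheduleDouble(8.0),    // learning rate (illustrative)
    new CNTK.TrainingParameterScheduleDouble(0.95));  // momentum (illustrative)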

I am almost ready to train. My final step is to set up a trainer for calculating the loss during each training epoch:
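A sketch, assuming GetTrainer takes the learner, the loss, and an evaluation metric:

// a trainer that tracks the loss while the dreaming layer is optimized
var trainer = model.GetTrainer(learner, lossFunction, lossFunction);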

The GetTrainer method sets up a trainer which will track the loss during the style transfer process.

Now I’m finally ready to start training the neural network!

I’ll add the following code:
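A sketch of the loop, with the CreateBatch and TrainMiniBatch signatures assumed:

// train for 300 epochs, reporting the loss every 50 epochs
for (int epoch = 0; epoch < 300; epoch++)
{
    // assemble a batch holding the content and style targets
    var batch = StyleTransfer.CreateBatch(lossFunction, labels);

    // run one training pass, tweaking the dreaming layer's pixels
    trainer.TrainMiniBatch(batch, NetUtil.CurrentDevice);

    if ((epoch + 1) % 50 == 0)
        Console.WriteLine($"Epoch {epoch + 1}: loss = {trainer.PreviousMinibatchLossAverage()}");
}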

I am training the network for 300 epochs using a training batch set up by the CreateBatch method. The TrainMiniBatch method trains the neural network for a single epoch. And every 50 epochs I display the loss by calling the PreviousMinibatchLossAverage method.

The neural network is now fully trained and the style and content loss is minimal. I now need to extract the image from the neural network:
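A sketch, with the InferImage signature assumed and OpenCvSharp standing in for the OpenCV calls:

// pull the finished mixed image out of the dreaming layer
var evaluationBatch = StyleTransfer.CreateBatch(model, labels);
var img = model.InferImage(evaluationBatch);

// project the float pixel buffer into an 8-bit, 3-channel color image
// (Mat, ImShow, and WaitKey are OpenCV features, here via OpenCvSharp)
var mat = new OpenCvSharp.Mat(imageHeight, imageWidth, OpenCvSharp.MatType.CV_32FC3, img);
mat.ConvertTo(mat, OpenCvSharp.MatType.CV_8UC3);
OpenCvSharp.Cv2.ImShow("Style Transfer", mat);
OpenCvSharp.Cv2.WaitKey();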

This code sets up an evaluation batch with CreateBatch. Normally I would evaluate the neural network on this batch and create predictions for the labels. But since the image I’m interested in is actually stored in the dreaming layer, I can extract it directly from the batch with a call to InferImage.

I now have the value for each pixel in a float[] array, so I call the Mat constructor to project these values to an 8-bit 3-channel color image and call the ImShow method to render the image on screen.

Note that Mat and ImShow are OpenCV features. OpenCV is a flexible image library used by CNTKUtil to implement style transfer.

Finally I call WaitKey so the image remains on screen when the app completes, and I have time to admire the style transfer results.

I’m now ready to run the app. I’ll navigate to the CNTKUtil folder first and compile the project:

$ dotnet build -o bin/Debug/netcoreapp3.0 -p:Platform=x64

Note how I’m specifying the x64 platform because the CNTK library requires a 64-bit build.

Now I can do the same in the StyleTransferDemo folder:

$ dotnet build -o bin/Debug/netcoreapp3.0 -p:Platform=x64

This will build my app. Note how I’m again specifying the x64 platform.

Now I can run the app:

$ dotnet run

The app will load the VGG19 neural network, prep it for style transfer, add a dreaming layer, load the content and style images, train the network for 300 epochs, and continuously tweak the dreaming layer until the content and style losses are minimal.

Check this out, here’s my first run:

My profile picture in cubist style

The content image is my profile picture, visible at the top right. The style image is a famous cubist painting by Lyubov Popova which is visible at the bottom right. The generated mixed image is on the left.

This looks great! The neural network has created a nice mix of my profile picture and the cubist style of Popova’s painting.

Let’s try that again:

My profile picture in Van Gogh style

Do you recognize the style image? It’s Vincent van Gogh’s famous Starry Night painting. The neural network has repainted my profile picture using his famous trippy blue swirls and yellow highlights.

Okay, one more:

My profile picture in Edvard Munch’s style

Now I’m using Edvard Munch’s famous painting The Scream as the style image. What do you think of the result?

So these are my results. Feel free to use the code and generate your own style images with C# and CNTK.

This article is based on a homework assignment from my machine learning course: Deep Learning with C# and CNTK.
