Detect Movie Review Sentiment With C# And A 1D-ConvNet

Mark Farragher
9 min read · Dec 2, 2019


In this article I’m going to build an app that can automatically detect the sentiment of IMDB movie reviews.

The first thing I’ll need is a dataset with thousands of movie reviews, correctly labelled as having positive or negative sentiment.

The Kaggle IMDB dataset has exactly what I need. It’s a collection of 50,000 highly polarized movie reviews, split exactly 50/50 between positive and negative reviews. My job is to build an app that reads the dataset and correctly predicts the sentiment of each review.

I’ll download the IMDB Movie Dataset and save the ZIP file in the project folder that I’m going to create in a few minutes.

The movie reviews look like this:

Sweet sweet movie data…

You may have noticed that the data files in the zip archive are not text files but binary files. This is because the movie reviews have already been preprocessed: each word in the reviews has been converted to an index number in a dictionary, and the words have been sorted in reverse order and padded with zeroes so each review is exactly 500 numbers long.

I’m going to build a 1-dimensional convolutional network that reads in these 500-word sequences and then predicts for each review whether it is positive or negative.

Let’s get started. I’ll build a new application from scratch by opening a terminal and creating a new .NET Core console project:

$ dotnet new console -o MovieSentiment
$ cd MovieSentiment

I also need to make sure to copy the dataset ZIP file into this folder, because the code I’m going to type next will expect it here.

Now I’ll install the following packages:

$ dotnet add package CNTK.GPU
$ dotnet add package XPlot.Plotly
$ dotnet add package FSharp.Core

The CNTK.GPU library is Microsoft’s Cognitive Toolkit, which can train and run deep neural networks. And XPlot.Plotly is an awesome plotting library based on Plotly. The library is designed for F#, so I also need to pull in the FSharp.Core library.

The CNTK.GPU package will train and run deep neural networks using my GPU. I’ll need an NVidia GPU and CUDA graphics drivers for this to work.

If you don’t have an NVidia GPU or suitable drivers, the library will fall back and use the CPU instead. This will work but training neural networks will take significantly longer.

CNTK is a low-level tensor library for building, training, and running deep neural networks. The code to build a deep neural network can get a bit verbose, so I’ve developed a little wrapper called CNTKUtil that will help you write code faster.

I’ll download the CNTKUtil files into a new CNTKUtil folder at the same level as my project folder.

Then I need to make sure I’m in the console project folder and create a project reference like this:

$ dotnet add reference ../CNTKUtil/CNTKUtil.csproj

Now I am ready to start writing code. I’ll edit the Program.cs file with Visual Studio Code and add the following code:
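Here’s a minimal sketch of that code. The archive and binary file names and the DataUtil.LoadBinary parameters are my assumptions, so adjust them to match the files in your download. You’ll also need using directives for System, System.IO, System.IO.Compression, System.Linq, CNTKUtil, and XPlot.Plotly:

// the movie reviews have been padded to 500 words each
var sequenceLength = 500;

// write the active compute device to the console
Console.WriteLine($"Compute device: {NetUtil.CurrentDevice.AsString()}");

// extract the dataset files from the zip archive if needed
// (file names are placeholders; use the actual names from your download)
if (!File.Exists("x_train.bin"))
    ZipFile.ExtractToDirectory("imdb_data.zip", ".");

// load the training and testing partitions into memory
var training_data   = DataUtil.LoadBinary<float>("x_train.bin", 25000, sequenceLength);
var training_labels = DataUtil.LoadBinary<float>("y_train.bin", 25000);
var testing_data    = DataUtil.LoadBinary<float>("x_test.bin", 25000, sequenceLength);
var testing_labels  = DataUtil.LoadBinary<float>("y_test.bin", 25000);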

The code first checks the active compute device in NetUtil.CurrentDevice and writes it to the console, so I can make sure that CNTK is using my GPU. Then the code calls File.Exists and ZipFile.ExtractToDirectory to extract the dataset files from the zipfile if that hasn’t been done yet. Finally, the code calls DataUtil.LoadBinary to load the training and testing data into memory. Note the sequenceLength variable that indicates that we’re working with movie reviews that have been padded to a length of 500 words.

We now have 25,000 movie reviews ready for training and 25,000 movie reviews ready for testing. Each review has been encoded with each word converted into a numerical dictionary index, and the reviews have been padded with zeroes so that they’re all 500 floats long.

Now I need to tell CNTK what shape the input data has that we’ll train the neural network on, and what shape the output data of the neural network will have:
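With the CNTKUtil Var helper, that can look like this minimal sketch (I’m assuming Var takes a shape array and a data type):

// the input is a padded 500-word sequence of dictionary indices
var features = NetUtil.Var(new int[] { sequenceLength }, CNTK.DataType.Float);

// the output is a single probability that the review is positive
var labels = NetUtil.Var(new int[] { 1 }, CNTK.DataType.Float);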

The input to the neural network is the entire 500-word sequence of a movie review. So the first Var method tells CNTK that our neural network will use a 1-dimensional tensor of sequenceLength float values as input.

And the second Var method tells CNTK that we want our neural network to output a single float value which is the probability that the movie review is positive.

My next step is to design the neural network. We’re going to build the following network:

A simple 1D-ConvNet for sentiment detection

This network uses two 1-dimensional convolutional layers, each followed by a pooling layer to reduce the size of the output tensor. Each convolutional layer uses a filter with a depth of 7 to process seven consecutive words in a movie review.

So with this setup, we are working with a dictionary of 10,000 unique words and a 1D-convolutional neural network that can process groups of 7 words to detect sentiment.

I will use a single dense layer as the classifier with Sigmoid activation.

Here’s how to build this neural network:
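The following sketch uses the CNTKUtil fluent helpers. The method names match the description below, but the exact parameter lists are assumptions on my part, so check the CNTKUtil source if something doesn’t compile:

var network = features
    .OneHotOp(10000, true)                               // one-hot encode each word index
    .Embedding(128)                                      // embed each word in a 128-dimensional space
    .TransposeAxes(new CNTK.Axis(0), new CNTK.Axis(1))   // stack the words in the depth direction
    .Convolution1D(32, 7, activation: CNTK.CNTKLib.ReLU) // first conv layer: 32 filters, depth 7
    .Pooling(CNTK.PoolingType.Max, new[] { 2 }, new[] { 2 }) // pooling layer (assumed helper)
    .Convolution1D(32, 7, activation: CNTK.CNTKLib.ReLU) // second conv layer: 32 filters, depth 7
    .Pooling(CNTK.PoolingType.Max, new[] { 2 }, new[] { 2 }) // pooling layer (assumed helper)
    .Dense(1, CNTK.CNTKLib.Sigmoid)                      // classifier: a single sigmoid node
    .ToNetwork();

// output a description of the architecture to the console
Console.WriteLine(network.ToSummary());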

Note how I’m first calling OneHotOp to convert each word into a one-hot encoded vector with 10,000 elements. I then call Embedding to embed these values in a 128-dimensional space. The final call to TransposeAxes rotates the tensor so that the words, which are originally stacked in the width direction, are now stacked in the depth direction. This allows the 1D convolution kernels to process groups of words.

Each Convolution1D call adds a new 1-dimensional convolution layer to the network. Each layer uses 32 filters with a kernel depth of 7.

I’m stacking two of these layers, both using ReLU activation, and then adding a final layer with a single node using Sigmoid activation.

Then I use the ToSummary method to output a description of the architecture of the neural network to the console.

Now we need to decide which loss function to use to train the neural network, and how we are going to track the prediction error of the network during each training epoch.

For this assignment I’ll use BinaryCrossEntropy as the loss function because it’s the standard metric for measuring binary classification loss.

I will track the error with the BinaryClassificationError metric. This is the number of times (expressed as a percentage) that the model predictions are wrong. An error of 0 means the predictions are correct all the time, and an error of 1 means the predictions are wrong all the time.
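In code, that can look like this. BinaryCrossEntropy comes from the CNTK library itself; I’m assuming BinaryClassificationError is exposed by CNTKUtil:

// binary cross-entropy measures the classification loss...
var lossFunc = CNTK.CNTKLib.BinaryCrossEntropy(network.Output, labels);

// ...and the error metric is the fraction of wrong predictions
var errorFunc = NetUtil.BinaryClassificationError(network.Output, labels);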

Next we need to decide which algorithm to use to train the neural network. There are many possible algorithms derived from Gradient Descent that I can use here.

For this assignment I am going to use the AdamLearner. You can learn more about the Adam algorithm here: https://machinelearningmastery.com/adam...
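Setting up the learner can look like this sketch. The learning rate of 0.001 and momentum of 0.9 are placeholder values I picked, and the GetAdamLearner signature is an assumption:

// set up an Adam learner for the network
var learner = network.GetAdamLearner(
    learningRateSchedule: (0.001, 1),  // learning rate
    momentumSchedule: (0.9, 1),        // momentum
    unitGain: true);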

These configuration values are a good starting point for many machine learning scenarios, but you can tweak them if you like to try and improve the quality of your predictions.

I’m almost ready to train. My final step is to set up a trainer and an evaluator for calculating the loss and the error during each training epoch:
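With CNTKUtil, that can look like this (exact signatures again assumed):

// the trainer tracks the loss and the error on the training partition
var trainer = network.GetTrainer(learner, lossFunc, errorFunc);

// the evaluator tracks the error on the test partition
var evaluator = network.GetEvaluator(errorFunc);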

The GetTrainer method sets up a trainer which will track the loss and the error for the training partition. And GetEvaluator will set up an evaluator that tracks the error in the test partition.

Now we’re finally ready to start training the neural network!

I will add the following code:
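Here’s a sketch of the training skeleton; the batch code from the next two sections goes inside the loop:

// train the network for 10 epochs with a batch size of 16
var maxEpochs = 10;
var batchSize = 16;
var loss = new double[maxEpochs];
var trainingError = new double[maxEpochs];
var testingError = new double[maxEpochs];
for (int epoch = 0; epoch < maxEpochs; epoch++)
{
    // training and testing code goes here (see below)
}

// show the final error and accuracy on the testing partition
Console.WriteLine($"Final test error: {testingError.Last():0.00}");
Console.WriteLine($"Final test accuracy: {1 - testingError.Last():0.00}");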

I’m training the network for 10 epochs using a batch size of 16. During training I’ll track the loss and errors in the loss, trainingError and testingError arrays.

Once training is done, I show the final testing error on the console. This is the percentage of mistakes the network makes when predicting review sentiment.

Note that the error and the accuracy are related: accuracy = 1 - error. So I am also reporting the final accuracy of the neural network.

Here’s the code to train the neural network. I’ll put this inside the for loop:
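A sketch of that training code, assuming the CNTKUtil batching helpers work as described below:

// loop through all batches in the training partition
loss[epoch] = 0.0;
trainingError[epoch] = 0.0;
var batchCount = 0;
training_data.Index().Shuffle().Batch(batchSize, (indices, begin, end) =>
{
    // get a feature batch and the corresponding label batch
    var featureBatch = features.GetBatch(training_data, indices, begin, end);
    var labelBatch = labels.GetBatch(training_labels, indices, begin, end);

    // train the network on the batch and accumulate loss and error
    var result = trainer.TrainBatch(
        new[] { (features, featureBatch), (labels, labelBatch) }, false);
    loss[epoch] += result.Loss;
    trainingError[epoch] += result.Evaluation;
    batchCount++;
});

// average the loss and error over all batches and report them
loss[epoch] /= batchCount;
trainingError[epoch] /= batchCount;
Console.Write($"Epoch {epoch}: loss={loss[epoch]:0.0000}, train error={trainingError[epoch]:0.0000}, ");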

The Index().Shuffle().Batch() sequence randomizes the data and splits it up into a collection of 16-record batches. The second argument to Batch() is a function that will be called for every batch.

Inside the batch function I call GetBatch twice to get a feature batch and a corresponding label batch. Then I call TrainBatch to train the neural network on these two batches of training data.

The TrainBatch method returns the loss and error, but only for training on the 16-record batch. So I simply add up all these values and divide them by the number of batches in the dataset. That gives me the average loss and error for the predictions on the training partition during the current epoch, and I report this to the console.

So now we know the training loss and error for one single training epoch. The next step is to test the network by making predictions about the data in the testing partition and calculate the testing error.

I’ll put this code inside the epoch loop and right below the training code:
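And a matching sketch for the test pass, reusing the batchCount variable from the training code:

// loop through all batches in the testing partition (no shuffling needed)
testingError[epoch] = 0.0;
batchCount = 0;
testing_data.Batch(batchSize, (data, begin, end) =>
{
    // get a feature batch and the corresponding label batch from the test set
    var featureBatch = features.GetBatch(testing_data, begin, end);
    var labelBatch = labels.GetBatch(testing_labels, begin, end);

    // test the network on the batch and accumulate the error
    testingError[epoch] += evaluator.TestBatch(
        new[] { (features, featureBatch), (labels, labelBatch) });
    batchCount++;
});

// average the error over all batches and report it
testingError[epoch] /= batchCount;
Console.WriteLine($"test error={testingError[epoch]:0.0000}");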

I don’t need to shuffle the data for testing, so now I can call Batch directly. Again I’m calling GetBatch to get feature and label batches, but note that I am now providing the testing_data and testing_labels arrays.

I call TestBatch to test the neural network on the 16-record test batch. The method returns the error for the batch, and I again add up the errors for each batch and divide by the number of batches.

That gives me the average error in the neural network predictions on the test partition for this epoch.

After training completes, the training and testing errors for each epoch will be available in the trainingError and testingError arrays. Let’s use XPlot to create a nice plot of the two error curves so we can check for overfitting:
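XPlot’s Chart.Plot and Graph.Scatter can do this; here’s a sketch that plots both accuracy curves and saves them to chart.html:

// build one scatter trace per accuracy curve
var chart = Chart.Plot(new[]
{
    new Graph.Scatter
    {
        x = Enumerable.Range(0, maxEpochs).ToArray(),
        y = trainingError.Select(e => 1 - e).ToArray(),
        name = "training accuracy",
        mode = "lines+markers"
    },
    new Graph.Scatter
    {
        x = Enumerable.Range(0, maxEpochs).ToArray(),
        y = testingError.Select(e => 1 - e).ToArray(),
        name = "testing accuracy",
        mode = "lines+markers"
    }
});
chart.WithXTitle("Epoch");
chart.WithYTitle("Accuracy");
chart.WithTitle("Detecting movie review sentiment");

// write the plot to disk as an HTML file
File.WriteAllText("chart.html", chart.GetHtml());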

This code creates a Plot with two Scatter graphs. The first one plots 1 - trainingError, which is the training accuracy, and the second one plots 1 - testingError, which is the testing accuracy.

Finally, I use File.WriteAllText to write the plot to disk as an HTML file.

I’m now ready to build the app. I’ll navigate to the CNTKUtil folder and type the following:

$ dotnet build -o bin/Debug/netcoreapp3.0 -p:Platform=x64

This will build the CNTKUtil project. Note how I’m specifying the x64 platform because the CNTK library requires a 64-bit build.

Now I’ll navigate back to the MovieSentiment folder and type:

$ dotnet build -o bin/Debug/netcoreapp3.0 -p:Platform=x64

This will build the app. Note how I’m again specifying the x64 platform.

Now I can run the app:

$ dotnet run

The app will create the neural network, load the dataset, train the network on the data, and create a plot of the training and testing errors for each epoch.

Here’s what the app looks like running on my laptop:

Training the neural network in PowerShell

Note the compute device: CNTK has correctly detected my NVidia GeForce GTX 1060 graphics adapter and is using it to train the neural network.

Also note that the convolutional neural network is quite large with over 1.3 million trainable parameters!

The app will write the training plot to disk in a new file called chart.html and it looks like this:

Training and testing accuracy per epoch. Note the overfitting…

Check it out, that’s definitely overfitting. The training accuracy creeps towards 1 while the testing accuracy actually decreases a little.

My final accuracy is 0.99 on training and 0.86 on testing. This means that the network can correctly identify the sentiment in 86 out of 100 movie reviews.

That’s a pretty great result, but since the network is overfitting, I can’t simply train it longer to improve the accuracy. I get my best testing accuracy at epoch 2, and it’s all downhill from there.

Can you improve this app to remove the overfitting and get a better final accuracy?

This article is based on a homework assignment from my machine learning course: Deep Learning with C# and CNTK.
