In this article I’m going to build an app that can automatically detect the sentiment of IMDB movie reviews.
The first thing I’ll need is a dataset with thousands of movie reviews, correctly labelled as having positive or negative sentiment.
The Kaggle IMDB dataset has exactly what I need. It’s a collection of 50,000 highly polarized movie reviews with exactly 50% positive and 50% negative reviews. My job is to build an app that reads the dataset and correctly predicts the sentiment of each review.
I’ll download the IMDB Movie Dataset and save the ZIP file in the project folder that I’m going to create in a few minutes.
The movie reviews look like this:
You may have noticed that the datafiles in the zip archive are not text files but binary files. This is because the movie reviews have already been preprocessed. Each word in a review has been converted to an index number in a dictionary, and the words have been sorted in reverse order and padded with zeroes so that each review is exactly 500 numbers long.
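To make the preprocessing concrete, here is a minimal Python sketch of that kind of encoding. It is an illustration only, not the code that produced the dataset. The helper names `build_vocabulary` and `encode` are hypothetical, and I'm assuming that "reverse order" means reversed word order, that the zeroes are appended at the end, and that index 0 is reserved for padding.

```python
SEQUENCE_LENGTH = 500  # from the article: each review is exactly 500 numbers long
PAD = 0                # assumption: 0 is reserved for padding, real words start at 1


def build_vocabulary(reviews):
    """Assign each distinct word an index, in order of first appearance."""
    vocabulary = {}
    for review in reviews:
        for word in review.lower().split():
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary) + 1
    return vocabulary


def encode(review, vocabulary, length=SEQUENCE_LENGTH):
    """Turn a review into a fixed-length list of word indices."""
    indices = [vocabulary[w] for w in review.lower().split() if w in vocabulary]
    indices.reverse()           # assumption: "reverse order" = reversed word order
    indices = indices[:length]  # truncate reviews that are too long
    # assumption: the padding zeroes go at the end of the sequence
    return indices + [PAD] * (length - len(indices))


reviews = ["a great movie", "a terrible movie"]
vocabulary = build_vocabulary(reviews)
encoded = [encode(review, vocabulary) for review in reviews]

print(vocabulary)
print(encoded[0][:5], len(encoded[0]))
```

Every review ends up as a list of exactly 500 integers, which is the fixed-size input a neural network needs.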
I’m going to build a 1-dimensional convolutional network that reads in these 500-word sequences and then predicts whether each review is positive or negative.
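If 1-dimensional convolution is new to you, the core operation is just a small window of weights sliding along a sequence. The Python sketch below shows only that sliding-window step, with made-up numbers. The names `conv1d` and `global_max_pool` are hypothetical helpers, and in a real network the kernel weights are learned during training rather than written by hand.

```python
def conv1d(sequence, kernel):
    """Slide the kernel over the sequence; return one weighted sum per position."""
    width = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


def global_max_pool(features):
    """Keep only the strongest response found anywhere in the sequence."""
    return max(features)


signal = [0.0, 1.0, 3.0, 1.0, 0.0]  # made-up input values
kernel = [1.0, 0.0, -1.0]           # made-up weights

features = conv1d(signal, kernel)
strongest = global_max_pool(features)

print(features)
print(strongest)
```

Because the same kernel is applied at every position, a pattern is detected no matter where in the review it appears.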
Let’s get started. I need to build a new application from scratch, so I’ll open a terminal and create a new .NET Core console project:
$ dotnet new console -o HotdogNotHotdog
$ cd HotdogNotHotdog
I also need to make sure to copy the dataset file IMDB Dataset.csv into this folder because the code I’m going to type next will expect it here.
Now I’ll install the following packages:
$ dotnet add package CNTK.GPU
$ dotnet add package XPlot.Plotly
$ dotnet add package FSharp.Core
The CNTK.GPU library is Microsoft’s Cognitive Toolkit that can train and run deep neural networks. And XPlot.Plotly is an awesome plotting library based on Plotly. The library is designed for F#, so I also need to pull in the FSharp.Core library.