Detect Spam Messages with C# And A CNTK Deep Neural Network
Spam is becoming a huge problem. Last year, 53.5% of all email traffic worldwide was spam, and the most common topics were healthcare and dating.
Imagine how much bandwidth and electricity are being wasted!
So let’s see if we can solve this problem. In this article, I am going to build a C# app with CNTK and .NET Core that can predict whether a message is spam.
CNTK is Microsoft’s Cognitive Toolkit, a tensor library on par with TensorFlow. It can build, train, and run deep neural networks for regression, classification, and many other machine learning tasks.
.NET Core is Microsoft’s cross-platform version of .NET that runs on Windows, macOS, and Linux. It’s the future of cross-platform .NET development.
The first thing I need is a data file with lots of messages, correctly labelled as being spam or not spam. I will use a collection of SMS messages collected by Caroline Tagg in her 2009 PhD thesis. This dataset has 5574 messages.
You can download the dataset here. You’ll need a Kaggle account to download it. Save the file as spam.csv.
The file looks like this:
It’s a TSV file with only 2 columns of information:
- Label: ‘spam’ for a spam message and ‘ham’ for a normal message.
- Message: the full text of the SMS message.
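Microsoft.ML will handle loading later in the article, but the two-column layout is simple enough to illustrate by hand. The sketch below is a plain-.NET comparison, not the author’s loading code. It assumes each line holds a label, a tab, and the message text, and that you are on C# 9 or later (top-level statements). `ParseLine` is a hypothetical helper, and the sample lines are made up rather than taken from the dataset.

```csharp
// Minimal sketch (assumes C# 9+ top-level statements). Plain .NET string
// handling for illustration only; this is not the Microsoft.ML loading code.
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: parse "label<TAB>message" into (IsSpam, Text).
// Returns null when the line has no tab or the label is not spam/ham.
static (bool IsSpam, string Text)? ParseLine(string line)
{
    // Split on the first tab only, so tabs inside the message survive.
    int tab = line.IndexOf('\t');
    if (tab < 0) return null;

    string label = line.Substring(0, tab).Trim();
    string text = line.Substring(tab + 1);

    if (label == "spam") return (true, text);
    if (label == "ham") return (false, text);
    return null; // e.g. a header row
}

// Made-up sample lines; a real program would read the file with File.ReadLines.
string[] sample =
{
    "label\tmessage",
    "ham\tAre we still on for lunch?",
    "spam\tYou have won a prize! Reply WIN to claim."
};

var examples = new List<(bool IsSpam, string Text)>();
foreach (string line in sample)
{
    // Keep only lines that parsed into a labelled example.
    if (ParseLine(line) is { } example) examples.Add(example);
}

int spamCount = examples.Count(e => e.IsSpam);
Console.WriteLine($"{examples.Count} messages, {spamCount} spam");
```

Splitting on the first tab only keeps any tabs inside the message intact, and rows whose label is neither `spam` nor `ham` (such as a header row) are skipped rather than guessed.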
I will build a binary classification network that reads in all messages and then predicts, for each message, whether it is spam or ham.
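A binary classifier like this commonly ends in a single output whose raw score is passed through the logistic (sigmoid) function, giving a value between 0 and 1 that can be read as the probability that a message is spam. The sketch below shows only that final decision step in plain C#. It is not the CNTK network built in this article. It assumes C# 9 or later and a 0.5 threshold. The `Sigmoid` and `Classify` helpers are hypothetical, and the scores are made up.

```csharp
// Minimal sketch (assumes C# 9+ top-level statements). Not CNTK code: it only
// shows how a single output score is commonly turned into a spam/ham decision.
using System;

// Logistic (sigmoid) function: maps any real score into (0, 1), read here
// as the probability that the message is spam.
static double Sigmoid(double score) => 1.0 / (1.0 + Math.Exp(-score));

// Label a message spam when its probability reaches the threshold (0.5 assumed).
static string Classify(double score, double threshold = 0.5) =>
    Sigmoid(score) >= threshold ? "spam" : "ham";

// Made-up raw scores standing in for a trained network's output.
foreach (double score in new[] { -2.0, 0.5, 3.0 })
{
    Console.WriteLine($"score {score:F1} -> p(spam) = {Sigmoid(score):F3} -> {Classify(score)}");
}
```

Raising the threshold makes the classifier more cautious about flagging messages as spam, at the cost of letting more spam through.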
Let’s get started. Here’s how to set up a new console project in .NET Core:
$ dotnet new console -o SpamDetection
$ cd SpamDetection
Next, I need to install required packages:
$ dotnet add package Microsoft.ML
$ dotnet add package CNTK.GPU
$ dotnet add package XPlot.Plotly
$ dotnet add package FSharp.Core
Microsoft.ML is Microsoft’s machine learning package. We will use it to load and process the data from the…