Detect Spam Messages with C# and a CNTK Deep Neural Network
Spam is becoming a huge problem. Last year, 53.5% of all email traffic worldwide was spam, and the most common topics were healthcare and dating.
Imagine how much bandwidth and electricity are being wasted here!
So let’s see if we can solve this problem. In this article, I am going to build a C# app with CNTK and .NET Core that can predict whether a message is spam.
CNTK is Microsoft’s Cognitive Toolkit, a tensor library on par with TensorFlow. It can build, train, and run deep neural networks for regression, classification, and many other machine learning tasks.
And .NET Core is Microsoft’s multi-platform version of .NET that runs on Windows, OS X, and Linux. It’s the future of cross-platform .NET development.
The first thing I need is a data file with lots of messages, each correctly labelled as spam or not spam. I will use a collection of SMS messages gathered by Caroline Tagg for her 2009 PhD thesis. This dataset has 5,574 messages.
You can download the dataset here. You’ll need a Kaggle account to download it. Save the file as spam.csv.
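Before building the network, it can help to confirm that the download is intact. Below is a minimal C# loader sketch, assuming the Kaggle copy of spam.csv has a header row, the label (`ham` or `spam`) in the first column, the message text in the second column (quoted when it contains commas), and possibly some trailing empty columns. The class `SpamCsv` and its `ParseFields`, `Parse` and `Load` methods are hypothetical names for illustration; they are not part of CNTK or the app built in this article. The Latin-1 encoding is also an assumption about the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper for reading spam.csv.
// ASSUMED layout: a header row, then  label,message[,extra empty columns]
// where label is "ham" or "spam" and no quoted message spans multiple lines.
public static class SpamCsv
{
    // Split one CSV line into fields, handling quoted fields and "" escapes.
    public static List<string> ParseFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // ordinary character, commas included
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field separator
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field on the line
        return fields;
    }

    // Convert raw lines (header first) into labelled messages.
    public static List<(bool IsSpam, string Text)> Parse(IEnumerable<string> lines) =>
        lines.Skip(1)                                   // drop the header row
             .Where(line => !string.IsNullOrWhiteSpace(line))
             .Select(line => ParseFields(line))
             .Where(fields => fields.Count >= 2)        // need label + message
             .Select(fields => (IsSpam: fields[0] == "spam", Text: fields[1]))
             .ToList();

    // Read the file from disk; ISO-8859-1 (Latin-1) is an ASSUMED encoding.
    public static List<(bool IsSpam, string Text)> Load(string path) =>
        Parse(File.ReadLines(path, Encoding.GetEncoding("ISO-8859-1")));
}
```

Calling `SpamCsv.Load("spam.csv")` from a `Main` method and counting the entries whose `IsSpam` flag is set gives a quick sanity check of the spam/ham split. The hand-written field splitter keeps the sketch dependency-free; a dedicated CSV library would be the sturdier choice if messages can contain line breaks.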
The file looks like this: