Optical Character Recognition with F# and ML.NET
--
In this article, I’m going to build an app that recognizes handwritten digits from the famous MNIST machine learning dataset:
The MNIST challenge requires machine learning models to read images of handwritten digits and correctly predict which digit is visible in each image.
This may seem like an easy challenge, but look at this:
These are actual digits from the dataset. Are you able to identify each one? It probably won’t surprise you to hear that the human error rate on this exercise is about 2.5%.
There are 70,000 images of 28x28 pixels in the dataset. I’m going to use a truncated set of 5,000 images to speed up the training.
And I’ll build my app in F# with ML.NET and .NET Core.
ML.NET is Microsoft’s new machine learning library. It can run linear regression, logistic regression, clustering, deep learning, and many other machine learning algorithms.
And F# is a perfect language for machine learning. It’s a functional-first programming language based on OCaml and inspired by Python, Haskell, Scala, and Erlang. It has a powerful syntax and lots of built-in types and functions for processing data.
The first thing I’ll need for my app is a data file with images of handwritten digits. I will not use the original MNIST data because it’s stored in a nonstandard binary format.
Instead, I’ll use these excellent CSV files prepared by Dariel Dato-on on Kaggle.
There are 60,000 images in the training file and 10,000 in the test file. Each image is monochrome and resized to 28x28 pixels.
The training file looks like this:
It’s a CSV file with 785 columns:
- The first column contains the label. It tells us which one of the 10 possible digits is visible in the image.
- The next 784 columns…
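
To make the column layout concrete, here is a minimal sketch in plain F# (no ML.NET yet) that splits one CSV line into its label and its 784 remaining values. The names `DigitRow`, `parseRow`, and `pixelAt` are hypothetical helpers made up for this illustration. The sketch also assumes that the 784 values are numeric and stored row by row, which is the usual layout for a flattened 28x28 image but should be checked against the actual file.

```fsharp
/// One parsed line: the digit label plus the 784 values that follow it.
/// (Hypothetical type, for illustration only.)
type DigitRow = { Label : int; Pixels : float32[] }

/// Parse a single CSV line of the form "label,v1,v2,...,v784".
let parseRow (line : string) : DigitRow =
    let fields = line.Split(',')
    if fields.Length <> 785 then
        invalidArg "line" (sprintf "expected 785 columns, got %d" fields.Length)
    { Label  = int fields.[0]                       // column 1: the digit label
      Pixels = fields.[1..] |> Array.map float32 }  // columns 2..785: the image

/// Value at (row, col) of the 28x28 image.
/// Assumption: the values are stored in row-major order.
let pixelAt (digit : DigitRow) (row : int) (col : int) : float32 =
    digit.Pixels.[row * 28 + col]

// Usage: a synthetic line with label 7 and 784 zero-valued pixels.
let sampleLine = String.concat "," (Array.append [| "7" |] (Array.create 784 "0"))
let sample = parseRow sampleLine
printfn "label = %d, pixel count = %d" sample.Label sample.Pixels.Length
```

Later on, ML.NET takes over this kind of loading, so a helper like this is mainly useful for inspecting a few rows by hand.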