Use F# and ML.NET to predict New York taxi fares

Mark Farragher
Jul 21, 2020

Building machine learning apps has never been easier!

That's because we now have ML.NET, Microsoft’s new machine learning library. It can run linear regression, logistic regression, clustering, deep learning, and many other machine learning algorithms.

But did you know that the F# language is the perfect choice for developing machine learning applications with ML.NET?

F# is a great fit for machine learning. It’s a functional-first language derived from OCaml and influenced by Python, Haskell, Scala, and Erlang, and it also supports object-oriented and imperative programming when you need them. It has a concise syntax and a rich set of built-in types and functions for processing data.

Check out the following F# code fragment that trains a machine learning model to predict taxi fares in New York City, and then uses the fully trained model to predict the fare of a single trip for a passenger paying with a credit card.

Look how compact the syntax is. F# machine learning code is elegant, concise, and beautiful:

A nice feature of F# is its powerful type inference. In many cases you can leave out type annotations, including class names and generic type arguments, and the compiler will figure them out on its own. You can see an example of that in the screenshot, where I initialize taxiTripSample without having to specify a class name.
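Here is a small self-contained sketch of that kind of inference. The TaxiTrip record, its fields, and the sample values are hypothetical stand-ins, not the types from the original code; the point is only that the compiler works out the record type from the field labels, so no type name is needed when creating a value.

```fsharp
// Hypothetical record type; field names and values are illustrative only.
type TaxiTrip =
    { VendorId: string
      PassengerCount: float32
      TripDistance: float32
      PaymentType: string }

// No type name on the right-hand side: the compiler infers TaxiTrip
// from the set of field labels used in the record expression.
let taxiTripSample =
    { VendorId = "VTS"
      PassengerCount = 1.0f
      TripDistance = 3.75f
      PaymentType = "CRD" }

// Generic type arguments are inferred too: trips is a TaxiTrip list.
// The copy-and-update expression builds a second trip from the first.
let trips = [ taxiTripSample; { taxiTripSample with TripDistance = 10.0f } ]

printfn "%s" (taxiTripSample.GetType().Name)  // prints TaxiTrip
printfn "%d" trips.Length                      // prints 2
```

Because this inference happens at compile time, a misspelled field label is a compile error rather than a runtime failure, which is the key difference from duck typing.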

And check out the very cool pipe operator |>, which lets me chain functions together so that data flows from one step to the next. In the screenshot, I use it to initialize metrics: I pipe a test dataset into the fully trained machine learning model to collect predictions, and then pipe those predictions into the Evaluate function to compute the evaluation metrics. All in a single line of code!
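To show what |> actually does, here is a plain F# sketch with no ML.NET dependency. The fare values are made-up sample numbers, not data from the taxi dataset. The expression x |> f is simply f x, so each stage receives the previous stage's result as its last argument, and a nested call becomes a left-to-right chain.

```fsharp
// Made-up sample fares, for illustration only.
let fares = [ 12.5; 0.0; 7.0; 31.25; -2.5 ]

// Without the pipe operator, the calls nest and read inside out.
let nestedAverage = List.average (List.filter (fun fare -> fare > 0.0) fares)

// With |>, the same computation reads left to right, one step per line.
let pipedAverage =
    fares
    |> List.filter (fun fare -> fare > 0.0)  // keep only positive fares
    |> List.average                           // mean of the remaining fares

printfn "%.2f" pipedAverage  // prints 16.92
```

The metrics line in the article has the same shape, with a test dataset as the input, the trained model's prediction step as the first stage, and the evaluation function as the second.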

With tricks like this, F# code is often very compact without sacrificing readability. On average my F# code is about 30% more compact than comparable C# code.

But let’s take a look at that taxi fare prediction case in more detail.

Did you know that the NYC Taxi & Limousine Commission keeps meticulous records of all taxi trips in the New York City area?

I’m going to grab their data file for December 2018. This is a CSV file with 8,173,233 records that looks like this: