Use Python, NimbusML and ML.NET to predict New York taxi fares

Mark Farragher
6 min read · Jul 28, 2020

There are many popular machine learning libraries for Python. There’s TensorFlow, scikit-learn, Theano, Caffe, and many others.

And in the .NET domain we have Microsoft’s new ML.NET machine learning library, which can be used in C# and F# applications.

But now Microsoft has created NimbusML, a new library that will let you access the ML.NET machine learning library directly in your Python code!

NimbusML acts as a bridge between the Python process that’s running your app code and the .NET runtime that’s hosting the ML.NET library. All calls are transparently routed between Python and .NET.

Naturally I had to try it out. I decided to port my New York taxi price prediction model to NimbusML to see what happens.

I’m always big on writing extremely compact apps and I was happy to get the C# version of the taxi price predictor down to 122 lines of code. And by porting the app to F#, I managed to reduce its size even further to only 69 lines of code.

But how compact will the Python app be?

Let’s find out.

The first thing I’ll need is a data file with records of New York taxi rides. The NYC Taxi & Limousine Commission publishes TLC Trip Record Data files, listed by year and month, which have exactly what I need.

I will download the Yellow Taxi Trip Records from December 2018 and save them as yellow_tripdata_2018-12.csv.

This is a CSV file with 8,173,233 records that looks like this:

There are a lot of columns with interesting information in this data file, but I will only train on the following:

  • Column 0: The data provider vendor ID
  • Column 3: Number of passengers
  • Column 4: Trip distance
  • Column 5: The rate code (standard, JFK, Newark, …)
  • Column 9: Payment type (credit card, cash, …)
  • Column 10: Fare amount
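To make the column selection concrete, here is a minimal sketch that keeps only those six columns, using nothing but Python's standard csv module. This is not the NimbusML loading code; the short column names and the select_columns helper are my own hypothetical labels, and the sample row is made up for illustration rather than taken from the real file.

```python
import csv
import io

# Zero-based column positions taken from the list above; the short
# names on the left are hypothetical labels chosen for this sketch.
COLUMNS = {
    "vendor_id": 0,
    "passenger_count": 3,
    "trip_distance": 4,
    "rate_code": 5,
    "payment_type": 9,
    "fare_amount": 10,
}


def select_columns(lines, has_header=True):
    """Yield one dict per record, keeping only the columns in COLUMNS."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if len(row) <= max(COLUMNS.values()):
            continue  # skip blank or truncated lines
        yield {name: row[index] for name, index in COLUMNS.items()}


# Tiny made-up sample with 11 columns (c0..c10), not real taxi data.
sample = io.StringIO(
    "c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10\n"
    "1,x,x,2,3.5,1,x,x,x,2,14.0\n"
)
rows = list(select_columns(sample))
print(rows[0])
```

To run this against the real download, pass an open file handle instead of the sample, for example open("yellow_tripdata_2018-12.csv", newline=""). All values come back as strings, so the numeric columns would still need converting before training.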