ML tools: A LITTLE LIBRARY FOR FASTER EXPERIMENTATION

MAKING EXPERIMENTATION A LOT FASTER AND EASIER TO KEEP TRACK OF


Go get the library from my GitHub!

Cleaning, transforming, splitting, oh my!

Starting to learn machine learning is tough. There are many steps involved in making a quality model, and for a noob, trying to wrap your head around it all can be daunting (it was for me!). I built this little library to modularize the process and make assembling a script a snap.

These are the modules I have built so far:

Transformers

The transformers.py module has a variety of transformers built with scikit-learn's BaseEstimator and TransformerMixin. These base classes let us build our own transformers that can be fed into a scikit-learn Pipeline(). The dataprep.py module uses them to load data and transform it. You can also import them on their own to transform any data on the fly.

  • DataFrameImputer(): fills in missing values in a pandas dataframe, using the most frequent value for categorical columns and the mean or median for numerical columns.
  • FactorFeatures(): multiplies specified columns together to create quadratic interaction features
  • Dummifier(): creates one-hot encodings for categorical columns of a pandas dataframe
  • DataFrameSelector(): takes a pandas dataframe and transforms it into a NumPy array
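
To give a feel for the pattern, here is a minimal sketch of two custom transformers chained in a Pipeline(). The class names SimpleFrameImputer and ToArray are made up for illustration; they are stand-ins loosely modeled on the descriptions above, not the library's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class SimpleFrameImputer(BaseEstimator, TransformerMixin):
    """Hypothetical imputer: most frequent value for non-numeric columns,
    mean or median for numeric ones."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy  # "mean" or "median"

    def fit(self, X, y=None):
        # Learn one fill value per column from the data passed to fit().
        self.fill_ = {}
        for col in X.columns:
            if pd.api.types.is_numeric_dtype(X[col]):
                if self.strategy == "mean":
                    self.fill_[col] = X[col].mean()
                else:
                    self.fill_[col] = X[col].median()
            else:
                self.fill_[col] = X[col].mode().iloc[0]
        return self

    def transform(self, X):
        # fillna with a dict fills each column with its own value.
        return X.fillna(value=self.fill_)


class ToArray(BaseEstimator, TransformerMixin):
    """Hypothetical selector: turns a dataframe into a NumPy array."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.to_numpy()


df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["NY", None, "NY"]})
pipe = Pipeline([("impute", SimpleFrameImputer(strategy="mean")),
                 ("to_array", ToArray())])
arr = pipe.fit_transform(df)
print(arr)
```

Inheriting from TransformerMixin gives each class fit_transform() for free, which is what lets Pipeline() chain them.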

Data Prep

The dataprep.py module contains classes to load and transform train and test data and create various pipelines.

  • Pipes(): creates various pipelines that use transformers from the transformers.py module to transform data.
  • DataPrep(): loads train and test data and transforms it using the selected Pipes() method. It returns a training set, a validation set, and a test set.
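
The load, split, and transform flow can be sketched roughly like this. The function prep_data and its parameters (target, val_size, seed) are hypothetical names for illustration, and StandardScaler stands in for whatever pipeline you pick; the real DataPrep() interface may look different.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def prep_data(train_df, test_df, pipeline, target, val_size=0.25, seed=0):
    """Hypothetical helper: returns (X_train, y_train), (X_val, y_val), X_test."""
    X = train_df.drop(columns=[target])
    y = train_df[target]
    # Hold out part of the training data as a validation set.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=val_size, random_state=seed
    )
    # Fit the pipeline on the training rows only, then reuse the fitted
    # pipeline on the validation and test rows.
    X_tr = pipeline.fit_transform(X_tr)
    X_val = pipeline.transform(X_val)
    X_test = pipeline.transform(test_df)
    return (X_tr, y_tr), (X_val, y_val), X_test


train = pd.DataFrame({"x1": range(8), "x2": range(8, 16), "y": [0, 1] * 4})
test = pd.DataFrame({"x1": [1, 2], "x2": [3, 4]})
pipe = Pipeline([("scale", StandardScaler())])

(train_X, train_y), (val_X, val_y), test_X = prep_data(train, test, pipe, target="y")
print(train_X.shape, val_X.shape, test_X.shape)
```

Fitting on the training rows only keeps statistics from the validation and test sets from leaking into the transformation.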

Helpers

The helpers.py module contains various helper functions used in the transformers.py and the dataprep.py modules. You can load these individually to perform small tasks.

This library is just beginning to grow. After some more studying I'll be filling it up with even more useful tools. Feel free to clone the repo and add your own functions and classes!