A Brief Introduction To AutoML Tools (Part 3 AutoGluon)


This post is the third part of my series “A Brief Introduction to AutoML Tools”. The previous post covered GAMA in detail, so I recommend reading it first.


For comparison, I will be using the same dataset: the Framingham Heart Study. It has a considerable number of missing values, which will give you an idea of how these tools handle them. The goal is to compare all three tools on the following:

-> Accuracy Achieved 

-> Ease of understanding and implementation (score)

-> Time taken to complete the task


In this blog, I will be sharing my experience of implementing AutoGluon.

1. Brief Intro to AutoGluon

According to the official website of AutoGluon,


AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data. Intended for both ML beginners and experts, AutoGluon enables you to:


  • Quickly prototype deep learning solutions for your data with few lines of code.
  • Leverage automatic hyperparameter tuning, model selection/architecture search, and data processing.
  • Automatically utilize state-of-the-art deep learning techniques without expert knowledge.
  • Easily improve existing bespoke models and data pipelines, or customize AutoGluon for your use-case.


AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy deep learning models on tabular, image, and text data.


It’s a complete package that automates everything from data preprocessing to prediction.


Compatibilities


At the time of writing, AutoGluon requires Python 3.6 or 3.7. Linux is the only fully supported operating system for now (full macOS and Windows support is expected soon).


Installation


Assuming you already have pip, installing AutoGluon takes two commands:

python3 -m pip install -U --pre "mxnet>=1.7.0b20200713,<2.0.0" -f https://sxjscience.github.io/KDD2020/
python3 -m pip install autogluon


For installation on macOS, please refer to the official documentation.


So now let’s get to the fun part.


2. Importing the packages


AutoGluon provides automation for the following tasks:

  • Tabular Prediction (Classification and Regression)
  • Image Classification 
  • Object Detection
  • Text Prediction


And for each task, we need to import different packages.


First, we import the autogluon library:

import autogluon as ag


Since we are interested in Tabular Prediction, we import the following:

from autogluon import TabularPrediction as task


Other libraries to be imported:

from sklearn.model_selection import train_test_split


3. Using AutoGluon to make predictions

Step 1: Getting the data and splitting it into train, test


As with GAMA, we need to load the data manually and split it into train and test sets. Here, data is the dataset loaded as a pandas DataFrame.

train, test = train_test_split(data, test_size=0.3)
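
To make the split step concrete, here is a minimal sketch on a toy stand-in for the Framingham data. The column names other than TenYearCHD, the synthetic values, and the stratify and random_state arguments are my assumptions for illustration; the original split above uses neither.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the Framingham data (the real CSV is not reproduced here).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "age": rng.integers(30, 70, size=100),
    "glucose": rng.normal(80, 10, size=100),
    "TenYearCHD": np.repeat([0, 1], [85, 15]),  # imbalanced binary label
})
data.loc[::10, "glucose"] = np.nan  # inject some missing values

# Missing values are deliberately left in place, since the point of the
# comparison is to see how the AutoML tool copes with them.
print(data.isna().sum())

# stratify keeps the class ratio similar in both splits;
# random_state makes the split repeatable.
train, test = train_test_split(
    data, test_size=0.3, stratify=data["TenYearCHD"], random_state=42
)
print(train.shape, test.shape)
```

Stratifying is optional, but with an imbalanced label it avoids a test set that happens to contain very few positive cases.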

Step 2: Create an AutoGluon Dataset object and fit the training data

train_data = task.Dataset(df=train)
predictor = task.fit(train_data=train_data, label='TenYearCHD', eval_metric='accuracy')


Note that here we need to specify the label which we want to predict as an argument.


After running this code, the information we get is:

  • Number of rows and columns of the training data
  • The problem type inferred by AutoGluon (binary classification in our case).
  • Classes which are to be predicted. (1 or 0 in our case) 
  • List of different ML Algorithms that are being tested.
  • Total Runtime


Note: One thing I like about AutoGluon is that if the inferred task is wrong, we can specify it explicitly through the 'problem_type' argument of fit(). You can specify one of the following: ['binary', 'multiclass', 'regression']
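
As a rough sketch of overriding the inferred task, the wrapper below validates the value before handing it to fit(). The helper name fit_with_problem_type and the validation step are hypothetical additions of mine; only the problem_type argument and its three values come from the note above, and the import matches the older AutoGluon API used in this post.

```python
# The three values listed in the note above.
ALLOWED_PROBLEM_TYPES = ("binary", "multiclass", "regression")


def fit_with_problem_type(train_data, label, problem_type):
    """Hypothetical wrapper: check problem_type, then call task.fit()."""
    if problem_type not in ALLOWED_PROBLEM_TYPES:
        raise ValueError(
            f"problem_type must be one of {ALLOWED_PROBLEM_TYPES}, got {problem_type!r}"
        )
    # Imported lazily so the check above works even without AutoGluon installed.
    from autogluon import TabularPrediction as task

    return task.fit(
        train_data=train_data,
        label=label,
        problem_type=problem_type,
        eval_metric="accuracy",
    )


# Usage, with train_data built as in Step 2:
# predictor = fit_with_problem_type(train_data, "TenYearCHD", "binary")
```

Checking the value up front fails fast on a typo instead of after the data has been loaded.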


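Also, make sure the test data you pass to predict() does not contain the target variable/label: keep the label column aside as y_test and drop it from the test set to get test_data.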
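
Separating the label from the test set could look like the sketch below. The three-row DataFrame is a made-up stand-in for the real test split, and the names y_test and test_data are chosen to match the variables used in Steps 3 and 4.

```python
import pandas as pd

LABEL = "TenYearCHD"

# Made-up stand-in for the test split produced in Step 1.
test = pd.DataFrame({
    "age": [39, 46, 61],
    "glucose": [77.0, None, 103.0],
    LABEL: [0, 0, 1],
})

y_test = test[LABEL]                    # ground truth, kept for evaluation
test_data = test.drop(columns=[LABEL])  # features only, passed to predict()

print(list(test_data.columns))
```

Note that drop() returns a new DataFrame, so test itself still contains the label column.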

Step 3: Prediction


In this step, AutoGluon predicts labels for the test data using the model and hyper-parameters chosen in the previous step.

y_pred = predictor.predict(test_data)


It returns a NumPy array of all the predictions made by the model.

Step 4: Evaluation of the model


It’s time to check how well AutoGluon predicted on the test data.

perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)


After running this code, the information we get is:

  • Accuracy of the model
  • Detailed per-class classification report
  • Balanced Accuracy Score, F1, etc.
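
To get a feel for what these numbers mean, the same kinds of metrics can be computed by hand with scikit-learn. This is only an illustration on six made-up labels, not AutoGluon's own reporting code.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    classification_report,
    f1_score,
)

# Made-up ground truth and predictions for a binary problem.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)               # share of correct predictions
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean of per-class recall
f1 = f1_score(y_true, y_pred)                      # F1 for the positive class (1)

print(f"accuracy={acc:.3f} balanced_accuracy={bal_acc:.3f} f1={f1:.3f}")
print(classification_report(y_true, y_pred))       # per-class precision/recall/F1
```

On an imbalanced dataset, plain accuracy can look good even when the positive class is mostly missed, which is why balanced accuracy and F1 are worth reading alongside it.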


That’s all we have to do to make predictions using AutoGluon.


To get a detailed report of the performance of each model tested by AutoGluon,

results = predictor.fit_summary()


And to get the leaderboard of models (the data passed here must still contain the label column so that each model can be scored),

leaderboard = predictor.leaderboard(test)

4. Conclusion of AutoGluon


a. AutoGluon also involves just 4 steps to complete the whole AutoML pipeline.


b. The average time taken for data preprocessing, hyper-parameter selection and training is 7 minutes 30 seconds.


I used Google Colab to run this tool, so results will vary depending on the system you are using.
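
One simple way to measure the time on your own system is a small timing helper around the fit call. The timed helper below is a hypothetical sketch of mine, and time.sleep stands in for the actual task.fit(...) call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Hypothetical helper: measure wall-clock time of the enclosed block."""
    result = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        minutes, seconds = divmod(result["seconds"], 60)
        print(f"{label}: {int(minutes)} min {seconds:.1f} sec")


with timed("fit") as t:
    time.sleep(0.1)  # stand-in for predictor = task.fit(...)
```

Averaging the elapsed time over a few runs gives a fairer number than a single run, since shared environments can fluctuate.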


c. The accuracy achieved with the default configuration is 84% to 85%.


d. Ease of understanding and implementation:


I will give this library a score of 9/10 for ease of understanding and implementation. It is very easy to understand and use, and it takes very few lines of code. The official documentation explains all the available features in detail, and there are also blog posts that cover the library well.


5. Pros and Cons of AutoGluon


The pros are

  • Automatic task identification, i.e. binary classification, multiclass classification, or regression.
  • Minimal lines of code to implement complete Automated Machine Learning.
  • Neural networks are also included in the search for the best model.
  • We are free to specify the task explicitly if the inferred one is wrong.
  • Achieves accuracy similar to other famous tools.
  • Includes deep learning automation tasks like Image Classification or Object Detection.
  • Includes Automated Neural Architecture Search.
  • Also supports auto model selection in PyTorch.


The cons are

  • A bit slower than the other tools.
  • Advanced topics such as customizing AutoGluon and neural architecture search are difficult to understand.