Auto ML

Published: January 21, 2019

Automated machine learning

AutoML Introduction

AutoML is a set of concepts and techniques used to automate the machine learning modelling process, including but not limited to data ingestion, preprocessing, data cleaning, feature engineering, model selection, regularization, hyperparameter tuning, optimization, outcome prediction and deployment. It thereby reduces human effort and makes machine learning available to organisations and people without major expertise in the field.

AutoML Challenges

Diverse tasks (classification, regression, NLP, vision, etc.)
Diverse dataset size
Limited computing resources
Varied data distribution (ranges)
Diverse scoring metrics (accuracy, AUC, MSE, F1, etc.)
Class imbalance
Data anomalies (missing values, irrelevant variables etc.)

Project Description

The project pipelines the phases of data science modelling, namely:
- Data Fetch
- Data Explore
- Data Cleanup
- Feature Engineering
- Data Modelling
- Advanced Modelling (voting/stacking/blending)
- Final Prediction
AutoML-Bots (Plexy) auto-generates a notebook for a baseline data science model.
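
The phases listed above can be sketched with scikit-learn's Pipeline and ColumnTransformer. This is only an illustrative sketch, not the project's actual code: the synthetic table, the column names (income, age, city), the choice of median imputation, Yeo-Johnson skew correction and logistic regression are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer

# Data fetch: a small synthetic table stands in for a real data source (assumption).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "income": rng.lognormal(mean=3.0, sigma=1.0, size=n),  # right-skewed feature
    "age": rng.integers(18, 70, size=n).astype(float),
    "city": rng.choice(["A", "B", "C"], size=n),
})
df["target"] = (df["income"] > df["income"].median()).astype(int)
df.loc[rng.choice(n, size=20, replace=False), "age"] = np.nan  # inject missing values

numeric = ["income", "age"]
categorical = ["city"]

# Data explore: count missing values and measure skewness per numeric column.
missing = df.isna().sum()
skewness = df[numeric].skew()
print(missing)
print(skewness)

# Data cleanup + feature engineering: impute, correct skew, encode categories.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("unskew", PowerTransformer()),  # Yeo-Johnson by default, also standardizes
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Data modelling: a single baseline classifier on top of the preprocessing.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["target"],
    test_size=0.25, random_state=0, stratify=df["target"],
)
model.fit(X_train, y_train)

# Final prediction on held-out rows.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Keeping cleanup and feature engineering inside the pipeline means the same transformations learned on the training split are reapplied at prediction time, which is one common way to avoid leaking test data into preprocessing.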

Project Features

Multiple task support: The current version supports classification and regression tasks.
Multiple model support: Trains on a total of 13 models, comprising 5 base models, 2 bagging models, 3 boosting models, 2 voting models and 1 stacking model.
Model leaderboard: Lists the metric scores achieved by the various models, along with the CPU requirements for deploying each trained model.
Automated data cleaning and feature engineering: The bot handles data discrepancies such as missing values, multiple data categories and skewness in an automated and optimised way.
Hyperparameter tuning: Hyperparameters are tuned with randomized search cross-validation (random search CV).
OS independent: Uses existing libraries, so no setup is required and the application can be deployed and run on any OS.
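
As a rough illustration of the multi-model training, leaderboard and random search features, the sketch below fits one or two models from each family (base, bagging, boosting, voting, stacking) plus one tuned with scikit-learn's RandomizedSearchCV, then ranks them. It is not the project's implementation: the dataset, the specific models, the parameter grid, and the use of CPU time spent in fitting as a stand-in for "CPU requirements" are all assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (assumption: stands in for a cleaned dataset).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners reused by the voting and stacking ensembles (cloned internally).
base = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
]

# Random search over a small, hypothetical parameter grid.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [25, 50, 100],
        "max_depth": [3, 5, None],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5, cv=3, random_state=0,
)

candidates = {
    "logistic_regression (base)": LogisticRegression(max_iter=1000),
    "decision_tree (base)": DecisionTreeClassifier(max_depth=5, random_state=0),
    "bagged_trees (bagging)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=25, random_state=0
    ),
    "gradient_boosting (boosting)": GradientBoostingClassifier(random_state=0),
    "soft_voting (voting)": VotingClassifier(base, voting="soft"),
    "stacking (stacking)": StackingClassifier(
        base, final_estimator=LogisticRegression(max_iter=1000)
    ),
    "random_forest (random search)": search,
}

# Fit every candidate, recording accuracy and CPU seconds spent in fit().
leaderboard = []
for name, estimator in candidates.items():
    start = time.process_time()
    estimator.fit(X_train, y_train)
    cpu_seconds = time.process_time() - start
    score = accuracy_score(y_test, estimator.predict(X_test))
    leaderboard.append((name, score, cpu_seconds))

# Best-scoring model first.
leaderboard.sort(key=lambda row: row[1], reverse=True)
for name, score, cpu_seconds in leaderboard:
    print(f"{name:32s} accuracy={score:.3f} cpu_fit_s={cpu_seconds:.2f}")
print("best random-search params:", search.best_params_)
```

For regression tasks the same loop would apply with the regressor counterparts of these estimators and a metric such as MSE in place of accuracy.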

Part of the project is open source and available here

Technologies & Tools

Python
scikit-learn (sklearn), XGBoost (xgb), PyCharm
Jupyter notebook

Future Work

Incorporate Meta-learning & Bayesian optimization
Pipeline support for deep learning tasks (NAS, neural architecture search, for NLP and vision tasks)
Reduce execution time
Deployment automation


Kaustuv Kunal