AutoML
Published:
Automated machine learning
AutoML Introduction
AutoML is a set of concepts and techniques for automating the machine learning modelling process, including but not limited to data ingestion, preprocessing, data cleaning, feature engineering, model selection, regularization, hyperparameter tuning, optimization, outcome prediction and deployment. It reduces human effort and makes machine learning available to organisations and people without deep expertise in the field.
AutoML Challenges
- Diverse tasks (classification, regression, NLP, vision etc.)
- Diverse dataset size
- Limited compute resources
- Varied data distribution (ranges)
- Diverse scoring metrics (accuracy, AUC, MSE, F1 etc.)
- Class balance
- Data anomalies (missing values, irrelevant variables etc.)
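Several of these challenges can be detected before any model is trained. The snippet below is a minimal sketch, assuming a pandas DataFrame with a single target column; the function name `profile_dataset` and the toy data are hypothetical and not part of the project. It reports missing-value fractions, class shares, and constant columns (one simple case of an irrelevant variable).

```python
import numpy as np
import pandas as pd


def profile_dataset(df, target):
    """Summarise a few data issues an AutoML run has to handle (illustrative only)."""
    features = df.drop(columns=[target])
    return {
        "n_rows": len(df),
        # Fraction of missing values per feature column
        "missing_fraction": features.isna().mean().to_dict(),
        # Share of each class in the target, to spot imbalance
        "class_share": df[target].value_counts(normalize=True).to_dict(),
        # Columns with at most one distinct value carry no signal
        "constant_columns": [
            c for c in features.columns if features[c].nunique(dropna=True) <= 1
        ],
    }


# Toy data, invented for this example
toy = pd.DataFrame({
    "x1": [1.0, np.nan, 3.0, 4.0],
    "x2": [7, 7, 7, 7],
    "label": [0, 0, 0, 1],
})
report = profile_dataset(toy, "label")
print(report)
```

A report like this could decide which cleaning steps to apply, for example dropping constant columns or choosing a scoring metric that tolerates class imbalance.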
Project Description
- The project chains the phases of data science modelling into a single pipeline, namely:
- Data Fetch
- Data Explore
- Data Cleanup
- Feature Engineering
- Data Modelling
- Advanced Modelling (voting/stacking/blending)
- Final Prediction
- AutoML-Bots (Plexy) auto-generates a notebook for a baseline data science model
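The cleanup, feature engineering and modelling phases above can be sketched with scikit-learn pipelines. This is an illustrative baseline under stated assumptions, not the code that AutoML-Bots generates: the helper `build_baseline`, the column names and the synthetic data are all hypothetical, and logistic regression merely stands in for whichever baseline model is chosen.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer


def build_baseline(numeric_cols, categorical_cols):
    """Chain cleanup, feature engineering and a baseline model (hypothetical helper)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),          # fill missing numbers
        ("deskew", PowerTransformer(method="yeo-johnson")),    # reduce skewness
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),   # fill missing categories
        ("encode", OneHotEncoder(handle_unknown="ignore")),    # one column per category
    ])
    prep = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([("prep", prep), ("model", LogisticRegression(max_iter=1000))])


# Data Fetch: synthetic data invented for this example
rng = np.random.default_rng(0)
n = 200
city = rng.choice(["A", "B", "C"], size=n).astype(object)
city[::15] = np.nan                                   # inject missing categories
df = pd.DataFrame({
    "income": rng.lognormal(mean=0.0, sigma=1.0, size=n),   # skewed feature
    "age": rng.integers(18, 80, size=n).astype(float),
    "city": city,
})
df["target"] = (df["income"] > df["income"].median()).astype(int)
df.loc[::10, "age"] = np.nan                          # inject missing numbers

# Data Modelling and Final Prediction on a held-out split
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["target"]), df["target"],
    test_size=0.25, random_state=0, stratify=df["target"],
)
pipeline = build_baseline(["income", "age"], ["city"])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping every step inside one `Pipeline` object means the same transformations are applied at training and prediction time, which is what makes the phases safe to automate.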
Project Features
Multiple task support: The current version supports classification and regression tasks.
Multiple model support: Trains a total of 13 models: 5 base models, 2 bagging models, 3 boosting models, 2 voting models and 1 stacking model.
Model leaderboard: Lists the metric scores achieved by each model, along with the CPU requirements for deploying every trained model.
Automated data cleaning & feature engineering: The bot handles data discrepancies such as missing values, multiple data categories and skewness in an automated, optimised way.
Hyperparameter tuning: Hyperparameters are tuned with randomized search cross-validation (random search CV).
OS independent: Uses existing libraries, so no special setup is required and the application can be deployed and run on any OS.
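To make the leaderboard and tuning features concrete, here is a minimal sketch using scikit-learn on synthetic data. The specific models, the parameter ranges and the use of wall-clock time as a rough stand-in for CPU requirements are assumptions made for this illustration; the project's actual 13 models and its resource measurement are not shown here.

```python
import time

import pandas as pd
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, invented for this example
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Base models, reused as members of the voting and stacking ensembles
base = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(random_state=0)),
]
candidates = dict(base)
candidates["bagging_rf"] = RandomForestClassifier(n_estimators=50, random_state=0)
candidates["boosting_gb"] = GradientBoostingClassifier(n_estimators=50, random_state=0)
candidates["voting"] = VotingClassifier(estimators=base, voting="soft")
candidates["stacking"] = StackingClassifier(
    estimators=base, final_estimator=LogisticRegression(max_iter=1000), cv=3
)
# Random search over a small, assumed hyperparameter space
candidates["rf_random_search"] = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(20, 100), "max_depth": randint(2, 10)},
    n_iter=5, cv=3, random_state=0,
)

rows = []
for name, model in candidates.items():
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    rows.append({
        "model": name,
        "mean_accuracy": scores.mean(),
        "cv_seconds": time.perf_counter() - start,  # crude cost proxy, not CPU profiling
    })

# Best-scoring model first
leaderboard = (
    pd.DataFrame(rows)
    .sort_values("mean_accuracy", ascending=False)
    .reset_index(drop=True)
)
print(leaderboard)
```

For regression tasks the same loop would apply with the regressor counterparts of these estimators and a metric such as `neg_mean_squared_error`.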
Part of the project is open source and available here
Technologies & Tools
- Python
- scikit-learn (sklearn), XGBoost (xgb), PyCharm
- Jupyter notebook
Future Work
- Incorporate Meta-learning & Bayesian optimization
- Pipeline support for deep learning tasks (NAS, neural architecture search, for NLP and vision tasks)
- Reduce execution time
- Deployment automation