Implementing an ML Model with H2O AutoML
Install H2O:
Install the h2o Python module, which provides H2O AutoML.
!pip install h2o
Collecting h2o
  Downloading https://files.pythonhosted.org/packages/f5/4a/e24acf8729af20384a1788e97b39b016be4bbf46a0bb475038f1fee97260/h2o-3.30.0.7.tar.gz (128.8MB)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from h2o) (2.23.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.6/dist-packages (from h2o) (0.8.7)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from h2o) (0.16.0)
Collecting colorama>=0.3.8
Downloading https://files.pythonhosted.org/packages/c9/dc/45cdef1b4d119eb96316b3117e6d5708a08029992b2fee2c143c7a0a5cc5/colorama-0.4.3-py2.py3-none-any.whl
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (2020.6.20)
Building wheels for collected packages: h2o
Building wheel for h2o (setup.py) ... done
Created wheel for h2o: filename=h2o-3.30.0.7-py2.py3-none-any.whl size=128865965 sha256=73528d7a6beb2b647c8ea501e4fec0ade3a5f9fda31be352aab1679483d59b99
Stored in directory: /root/.cache/pip/wheels/a6/c2/6d/9612d426d2c947be23a8cd2d0156a9107927de630b8821ecea
Successfully built h2o
Installing collected packages: colorama, h2o
Successfully installed colorama-0.4.3 h2o-3.30.0.7
Import the h2o Python module and the H2OAutoML class, then initialize a local H2O cluster.
import h2o
from h2o.automl import H2OAutoML
h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
Java Version: openjdk version "11.0.7" 2020-04-14; OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04); OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)
Starting server from /usr/local/lib/python3.6/dist-packages/h2o/backend/bin/h2o.jar
Ice root: /tmp/tmpaw9v2fg3
JVM stdout: /tmp/tmpaw9v2fg3/h2o_unknownUser_started_from_python.out
JVM stderr: /tmp/tmpaw9v2fg3/h2o_unknownUser_started_from_python.err
Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O_cluster_uptime: | 02 secs |
H2O_cluster_timezone: | Etc/UTC |
H2O_data_parsing_timezone: | UTC |
H2O_cluster_version: | 3.30.0.7 |
H2O_cluster_version_age: | 6 hours and 5 minutes |
H2O_cluster_name: | H2O_from_python_unknownUser_qrzuv8 |
H2O_cluster_total_nodes: | 1 |
H2O_cluster_free_memory: | 3.180 Gb |
H2O_cluster_total_cores: | 2 |
H2O_cluster_allowed_cores: | 2 |
H2O_cluster_status: | accepting new members, healthy |
H2O_connection_url: | http://127.0.0.1:54321 |
H2O_connection_proxy: | {"http": null, "https": null} |
H2O_internal_security: | False |
H2O_API_Extensions: | Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4 |
Python_version: | 3.6.9 final |
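The cluster above picked up this machine's defaults (2 cores, about 3.2 GB of free memory). If you need explicit resource limits, `h2o.init()` accepts `nthreads` and `max_mem_size`. The sketch below wraps that in two helpers; the helper names and the `"4G"` default are illustrative assumptions, not part of the original tutorial. `h2o` is imported inside the functions so the definitions load even where h2o is not installed.

```python
def start_local_cluster(max_mem_size="4G", nthreads=-1):
    """Start (or connect to) a local H2O cluster with explicit resource limits.

    Hypothetical helper: nthreads=-1 asks H2O to use all available cores,
    and max_mem_size caps the JVM heap (the "4G" default is an assumption).
    """
    import h2o

    h2o.init(nthreads=nthreads, max_mem_size=max_mem_size)
    return h2o.cluster()


def stop_local_cluster():
    """Hypothetical helper: shut down the cluster this session is connected to."""
    import h2o

    h2o.cluster().shutdown()
```

Calling `stop_local_cluster()` at the end of a session frees the JVM's memory, which matters on shared notebook machines.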
Load Data:
For this example we load product_backorders.csv for binary classification. The goal is to predict whether or not a product will be put on backorder status, given a number of product metrics such as current inventory, transit time, demand forecasts, and prior sales. The file can be loaded either from GitHub, as below, or from a local copy.
# Load the Data:
data_file_path = "https://github.com/h2oai/h2o-tutorials/raw/master/h2o-world-2017/automl/data/product_backorders.csv"
data = h2o.import_file(data_file_path)
Parse progress: |█████████████████████████████████████████████████████████| 100%
Preview the first rows of the data:
data.head()
sku | national_inv | lead_time | in_transit_qty | forecast_3_month | forecast_6_month | forecast_9_month | sales_1_month | sales_3_month | sales_6_month | sales_9_month | min_bank | potential_issue | pieces_past_due | perf_6_month_avg | perf_12_month_avg | local_bo_qty | deck_risk | oe_constraint | ppap_risk | stop_auto_buy | rev_stop | went_on_backorder |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1.11312e+06 | 0 | 8 | 1 | 6 | 6 | 6 | 0 | 4 | 9 | 12 | 0 | No | 1 | 0.9 | 0.89 | 0 | No | No | No | Yes | No | Yes |
1.11327e+06 | 0 | 8 | 0 | 2 | 3 | 4 | 1 | 2 | 3 | 3 | 0 | No | 0 | 0.96 | 0.97 | 0 | No | No | No | Yes | No | Yes |
1.11387e+06 | 20 | 2 | 0 | 45 | 99 | 153 | 16 | 42 | 80 | 111 | 10 | No | 0 | 0.81 | 0.88 | 0 | No | No | No | Yes | No | Yes |
1.11422e+06 | 0 | 8 | 0 | 9 | 14 | 21 | 5 | 17 | 36 | 43 | 0 | No | 0 | 0.96 | 0.98 | 0 | No | No | No | Yes | No | Yes |
1.11482e+06 | 0 | 12 | 0 | 31 | 31 | 31 | 7 | 15 | 33 | 47 | 2 | No | 3 | 0.98 | 0.98 | 0 | No | No | No | Yes | No | Yes |
1.11545e+06 | 55 | 8 | 0 | 216 | 360 | 492 | 30 | 108 | 275 | 340 | 51 | No | 0 | 0 | 0 | 0 | No | No | Yes | Yes | No | Yes |
1.11562e+06 | -34 | 8 | 0 | 120 | 240 | 240 | 83 | 122 | 144 | 165 | 33 | No | 0 | 1 | 0.97 | 34 | No | No | No | Yes | No | Yes |
1.11645e+06 | 4 | 9 | 0 | 43 | 67 | 115 | 5 | 22 | 40 | 58 | 4 | No | 0 | 0.69 | 0.68 | 0 | No | No | No | Yes | No | Yes |
1.11683e+06 | 2 | 8 | 0 | 4 | 6 | 9 | 1 | 5 | 6 | 9 | 2 | No | 0 | 1 | 0.95 | 0 | No | No | No | Yes | No | Yes |
1.11687e+06 | -7 | 8 | 0 | 56 | 96 | 112 | 13 | 30 | 56 | 76 | 0 | No | 0 | 0.97 | 0.92 | 7 | No | No | No | Yes | No | Yes |
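y = "went_on_backorder"  # response column (Yes/No)
X = data.columns  # H2OFrame.columns is a plain Python list of column names
X.remove(y)  # the response must not be used as a predictor
X.remove("sku")  # sku is a product identifier, not a predictive feature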
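To make the predictor selection concrete, here is a pure-Python sketch using the column names shown in the `data.head()` output. It builds the same list without mutating `data.columns` in place. Dropping `sku` is reasonable because it is an identifier with a distinct value per product, so a model could only memorize it rather than learn a general pattern.

```python
# Column names as printed by data.head() for product_backorders.csv.
columns = [
    "sku", "national_inv", "lead_time", "in_transit_qty",
    "forecast_3_month", "forecast_6_month", "forecast_9_month",
    "sales_1_month", "sales_3_month", "sales_6_month", "sales_9_month",
    "min_bank", "potential_issue", "pieces_past_due",
    "perf_6_month_avg", "perf_12_month_avg", "local_bo_qty",
    "deck_risk", "oe_constraint", "ppap_risk", "stop_auto_buy",
    "rev_stop", "went_on_backorder",
]

y = "went_on_backorder"
# Predictors: every column except the response and the sku identifier.
X = [c for c in columns if c not in (y, "sku")]

print(len(columns), len(X))  # 23 21
```

So AutoML trains on 21 predictor columns.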
Run AutoML:
Run AutoML, stopping after 10 models. The max_models argument specifies the number of individual (or “base”) models, and does not include the two ensemble models that are trained at the end.
# Run AutoML:
auto_ml = H2OAutoML(max_models=10, seed=1)
auto_ml.train(x=X, y=y, training_frame=data)
AutoML progress: |████████████████████████████████████████████████████████| 100%
Leaderboard:
We will view the AutoML leaderboard. Since we did not specify a leaderboard_frame in the H2OAutoML.train() method for scoring and ranking the models, the AutoML leaderboard uses cross-validation metrics to rank them. In short, the leaderboard lists the models from best to worst; for binary classification the default ranking metric is AUC. The leader model is stored at auto_ml.leader and the leaderboard at auto_ml.leaderboard.
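The ranking falls back to cross-validation metrics only because no `leaderboard_frame` was passed. If you would rather rank models on rows they never saw during training, a minimal sketch is to hold out part of the data with `H2OFrame.split_frame()` and pass it to `train()`. The 80/20 split and the helper name `run_automl_with_holdout` are illustrative assumptions, not part of the original tutorial; `data` is assumed to be the H2OFrame loaded earlier.

```python
def run_automl_with_holdout(data, x, y, max_models=10, seed=1):
    """Hypothetical helper: train AutoML and rank models on a held-out frame.

    data is assumed to be an H2OFrame such as the one returned by h2o.import_file().
    """
    from h2o.automl import H2OAutoML

    # Hold out roughly 20% of the rows; split_frame returns a list of frames.
    train, holdout = data.split_frame(ratios=[0.8], seed=seed)

    aml = H2OAutoML(max_models=max_models, seed=seed)
    # Leaderboard metrics are now computed on `holdout` instead of CV folds.
    aml.train(x=x, y=y, training_frame=train, leaderboard_frame=holdout)
    return aml
```

Usage would be `aml = run_automl_with_holdout(data, X, y)`, followed by `aml.leaderboard.head()`. The trade-off is that the held-out rows are no longer used for training.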
leader_board=auto_ml.leaderboard
Now we will view a snapshot of the top models (head() returns the first 10 rows by default).
leader_board.head()
model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
---|---|---|---|---|---|---|
StackedEnsemble_AllModels_AutoML_20200721_233627 | 0.950875 | 0.18191 | 0.749727 | 0.149404 | 0.227568 | 0.0517873 |
StackedEnsemble_BestOfFamily_AutoML_20200721_233627 | 0.950305 | 0.183105 | 0.746107 | 0.151635 | 0.228331 | 0.0521349 |
GBM_4_AutoML_20200721_233627 | 0.948839 | 0.173579 | 0.73916 | 0.157246 | 0.22659 | 0.051343 |
GBM_3_AutoML_20200721_233627 | 0.94683 | 0.177091 | 0.7331 | 0.147716 | 0.22862 | 0.0522671 |
XGBoost_3_AutoML_20200721_233627 | 0.945957 | 0.176662 | 0.736604 | 0.150975 | 0.228394 | 0.0521638 |
GBM_2_AutoML_20200721_233627 | 0.945111 | 0.179764 | 0.727168 | 0.166382 | 0.230232 | 0.0530067 |
GBM_5_AutoML_20200721_233627 | 0.944997 | 0.17789 | 0.731015 | 0.14231 | 0.229819 | 0.0528166 |
XGBoost_1_AutoML_20200721_233627 | 0.944094 | 0.181315 | 0.726938 | 0.170148 | 0.229817 | 0.0528157 |
XGBoost_2_AutoML_20200721_233627 | 0.943922 | 0.180467 | 0.72038 | 0.153593 | 0.229968 | 0.0528851 |
GBM_1_AutoML_20200721_233627 | 0.942459 | 0.183815 | 0.720288 | 0.15893 | 0.232004 | 0.0538257 |
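Two properties of this table can be checked by hand: `rmse` is the square root of `mse`, and the rows are ordered by `auc` alone. That ordering is why the top-ranked ensemble sits above GBM_4 even though GBM_4 has a lower logloss (0.173579 vs 0.18191) and a lower rmse. The pure-Python sketch below uses a few rows copied from the leaderboard above.

```python
import math

# A few rows copied from the leaderboard above: (model_id, auc, rmse, mse).
rows = [
    ("StackedEnsemble_AllModels_AutoML_20200721_233627", 0.950875, 0.227568, 0.0517873),
    ("StackedEnsemble_BestOfFamily_AutoML_20200721_233627", 0.950305, 0.228331, 0.0521349),
    ("GBM_4_AutoML_20200721_233627", 0.948839, 0.22659, 0.051343),
    ("GBM_1_AutoML_20200721_233627", 0.942459, 0.232004, 0.0538257),
]

# rmse**2 should equal mse, up to the rounding used in the printed table.
for model_id, auc, rmse, mse in rows:
    assert math.isclose(rmse ** 2, mse, rel_tol=1e-4), model_id

# The leaderboard is sorted by AUC in descending order.
aucs = [auc for _, auc, _, _ in rows]
assert aucs == sorted(aucs, reverse=True)
print("checks passed")
```

If a different metric matters more for your use case, H2OAutoML also accepts a `sort_metric` argument to change the ranking.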
To view the entire leaderboard, pass its total number of rows:
leader_board.head(rows=leader_board.nrows)
model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
---|---|---|---|---|---|---|
StackedEnsemble_AllModels_AutoML_20200721_233627 | 0.950875 | 0.18191 | 0.749727 | 0.149404 | 0.227568 | 0.0517873 |
StackedEnsemble_BestOfFamily_AutoML_20200721_233627 | 0.950305 | 0.183105 | 0.746107 | 0.151635 | 0.228331 | 0.0521349 |
GBM_4_AutoML_20200721_233627 | 0.948839 | 0.173579 | 0.73916 | 0.157246 | 0.22659 | 0.051343 |
GBM_3_AutoML_20200721_233627 | 0.94683 | 0.177091 | 0.7331 | 0.147716 | 0.22862 | 0.0522671 |
XGBoost_3_AutoML_20200721_233627 | 0.945957 | 0.176662 | 0.736604 | 0.150975 | 0.228394 | 0.0521638 |
GBM_2_AutoML_20200721_233627 | 0.945111 | 0.179764 | 0.727168 | 0.166382 | 0.230232 | 0.0530067 |
GBM_5_AutoML_20200721_233627 | 0.944997 | 0.17789 | 0.731015 | 0.14231 | 0.229819 | 0.0528166 |
XGBoost_1_AutoML_20200721_233627 | 0.944094 | 0.181315 | 0.726938 | 0.170148 | 0.229817 | 0.0528157 |
XGBoost_2_AutoML_20200721_233627 | 0.943922 | 0.180467 | 0.72038 | 0.153593 | 0.229968 | 0.0528851 |
GBM_1_AutoML_20200721_233627 | 0.942459 | 0.183815 | 0.720288 | 0.15893 | 0.232004 | 0.0538257 |
DRF_1_AutoML_20200721_233627 | 0.935803 | 0.222161 | 0.692536 | 0.171452 | 0.254289 | 0.064663 |
GLM_1_AutoML_20200721_233627 | 0.741995 | 0.338675 | 0.266396 | 0.29912 | 0.314387 | 0.0988395 |
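The full leaderboard has 12 rows: the 10 base models requested with max_models plus the two stacked ensembles. The "BestOfFamily" ensemble is built from the best model of each algorithm family, while "AllModels" uses every base model. The pure-Python sketch below picks the best model per family from this table by AUC. It only mimics that selection for illustration, assuming AUC is the ranking metric; it is not H2O's implementation.

```python
# (model_id, auc) pairs copied from the full leaderboard above.
leaderboard = [
    ("StackedEnsemble_AllModels_AutoML_20200721_233627", 0.950875),
    ("StackedEnsemble_BestOfFamily_AutoML_20200721_233627", 0.950305),
    ("GBM_4_AutoML_20200721_233627", 0.948839),
    ("GBM_3_AutoML_20200721_233627", 0.94683),
    ("XGBoost_3_AutoML_20200721_233627", 0.945957),
    ("GBM_2_AutoML_20200721_233627", 0.945111),
    ("GBM_5_AutoML_20200721_233627", 0.944997),
    ("XGBoost_1_AutoML_20200721_233627", 0.944094),
    ("XGBoost_2_AutoML_20200721_233627", 0.943922),
    ("GBM_1_AutoML_20200721_233627", 0.942459),
    ("DRF_1_AutoML_20200721_233627", 0.935803),
    ("GLM_1_AutoML_20200721_233627", 0.741995),
]


def family(model_id):
    """Algorithm family is the model_id prefix before the first underscore."""
    return model_id.split("_", 1)[0]


# Base models are everything except the stacked ensembles.
base_models = [(m, auc) for m, auc in leaderboard if family(m) != "StackedEnsemble"]

# Keep the highest-AUC model seen so far for each family.
best_of_family = {}
for model_id, auc in base_models:
    fam = family(model_id)
    if fam not in best_of_family or auc > best_of_family[fam][1]:
        best_of_family[fam] = (model_id, auc)

for fam, (model_id, auc) in best_of_family.items():
    print(fam, model_id, auc)
```

This selects GBM_4, XGBoost_3, DRF_1, and GLM_1, one model per family.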
Save the Leader Model:
h2o.save_model(auto_ml.leader, path = "./automl_classify_model_bin")
'/content/automl_classify_model_bin/StackedEnsemble_AllModels_AutoML_20200721_233627'
Download the leader model as a MOJO (a portable scoring artifact) for future use:
auto_ml.leader.download_mojo(path = "./")
'/content/StackedEnsemble_AllModels_AutoML_20200721_233627.zip'
The saved model can later be loaded with the h2o module and used for prediction. See the H2O documentation at h2o.ai for more details on the module and on adapting it to your own tasks.
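A minimal sketch of reloading the binary model saved with `h2o.save_model()` and scoring new rows. The helper name and `new_data_path` (a CSV with the same predictor columns as the training data) are hypothetical. Binary H2O models are generally tied to the H2O version that saved them (here 3.30.0.7), which is one reason to also keep the MOJO.

```python
def load_and_predict(model_path, new_data_path):
    """Hypothetical helper: reload a saved H2O model and score new data.

    model_path: the path returned by h2o.save_model(), e.g.
        "/content/automl_classify_model_bin/StackedEnsemble_AllModels_AutoML_20200721_233627"
    new_data_path: hypothetical CSV with the same predictor columns as training.
    """
    import h2o

    h2o.init()  # connects to a running cluster or starts a local one
    model = h2o.load_model(model_path)
    new_data = h2o.import_file(new_data_path)
    # For a binary classifier the result has a "predict" column plus
    # one probability column per class label.
    return model.predict(new_data)
```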