Implementing ML model in AutoML


Implement a Machine Learning Model Using AutoML:

Install the h2o module to use AutoML.

!pip install h2o

Collecting h2o
  Downloading https://files.pythonhosted.org/packages/f5/4a/e24acf8729af20384a1788e97b39b016be4bbf46a0bb475038f1fee97260/h2o-3.30.0.7.tar.gz (128.8MB)
     |████████████████████████████████| 128.8MB 84kB/s 
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from h2o) (2.23.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.6/dist-packages (from h2o) (0.8.7)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from h2o) (0.16.0)
Collecting colorama>=0.3.8
  Downloading https://files.pythonhosted.org/packages/c9/dc/45cdef1b4d119eb96316b3117e6d5708a08029992b2fee2c143c7a0a5cc5/colorama-0.4.3-py2.py3-none-any.whl
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->h2o) (2020.6.20)
Building wheels for collected packages: h2o
  Building wheel for h2o (setup.py) ... done
  Created wheel for h2o: filename=h2o-3.30.0.7-py2.py3-none-any.whl size=128865965 sha256=73528d7a6beb2b647c8ea501e4fec0ade3a5f9fda31be352aab1679483d59b99
  Stored in directory: /root/.cache/pip/wheels/a6/c2/6d/9612d426d2c947be23a8cd2d0156a9107927de630b8821ecea
Successfully built h2o
Installing collected packages: colorama, h2o
Successfully installed colorama-0.4.3 h2o-3.30.0.7

Import the h2o Python module and H2OAutoML class and initialize a local H2O cluster.

import h2o
from h2o.automl import H2OAutoML
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.7" 2020-04-14; OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04); OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)
  Starting server from /usr/local/lib/python3.6/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpaw9v2fg3
  JVM stdout: /tmp/tmpaw9v2fg3/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpaw9v2fg3/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.

H2O_cluster_uptime:	02 secs
H2O_cluster_timezone:	Etc/UTC
H2O_data_parsing_timezone:	UTC
H2O_cluster_version:	3.30.0.7
H2O_cluster_version_age:	6 hours and 5 minutes
H2O_cluster_name:	H2O_from_python_unknownUser_qrzuv8
H2O_cluster_total_nodes:	1
H2O_cluster_free_memory:	3.180 Gb
H2O_cluster_total_cores:	2
H2O_cluster_allowed_cores:	2
H2O_cluster_status:	accepting new members, healthy
H2O_connection_url:	http://127.0.0.1:54321
H2O_connection_proxy:	{"http": null, "https": null}
H2O_internal_security:	False
H2O_API_Extensions:	Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version:	3.6.9 final

Load Data:

For this example we will load product_backorders.csv for binary classification. The goal is to predict whether or not a product will be put on backorder status, given product metrics such as current inventory, transit time, demand forecasts, and prior sales. The file can be loaded either directly from GitHub or from a local copy.

# Load the Data:
data_file_path="https://github.com/h2oai/h2o-tutorials/raw/master/h2o-world-2017/automl/data/product_backorders.csv"
data = h2o.import_file(data_file_path)

Parse progress: |█████████████████████████████████████████████████████████| 100%

Preview the first rows of the loaded frame:

data.head()

sku	national_inv	lead_time	in_transit_qty	forecast_3_month	forecast_6_month	forecast_9_month	sales_1_month	sales_3_month	sales_6_month	sales_9_month	min_bank	potential_issue	pieces_past_due	perf_6_month_avg	perf_12_month_avg	local_bo_qty	deck_risk	oe_constraint	ppap_risk	stop_auto_buy	rev_stop	went_on_backorder
1.11312e+06	0	8	1	6	6	6	0	4	9	12	0	No	1	0.9	0.89	0	No	No	No	Yes	No	Yes
1.11327e+06	0	8	0	2	3	4	1	2	3	3	0	No	0	0.96	0.97	0	No	No	No	Yes	No	Yes
1.11387e+06	20	2	0	45	99	153	16	42	80	111	10	No	0	0.81	0.88	0	No	No	No	Yes	No	Yes
1.11422e+06	0	8	0	9	14	21	5	17	36	43	0	No	0	0.96	0.98	0	No	No	No	Yes	No	Yes
1.11482e+06	0	12	0	31	31	31	7	15	33	47	2	No	3	0.98	0.98	0	No	No	No	Yes	No	Yes
1.11545e+06	55	8	0	216	360	492	30	108	275	340	51	No	0	0	0	0	No	No	Yes	Yes	No	Yes
1.11562e+06	-34	8	0	120	240	240	83	122	144	165	33	No	0	1	0.97	34	No	No	No	Yes	No	Yes
1.11645e+06	4	9	0	43	67	115	5	22	40	58	4	No	0	0.69	0.68	0	No	No	No	Yes	No	Yes
1.11683e+06	2	8	0	4	6	9	1	5	6	9	2	No	0	1	0.95	0	No	No	No	Yes	No	Yes
1.11687e+06	-7	8	0	56	96	112	13	30	56	76	0	No	0	0.97	0.92	7	No	No	No	Yes	No	Yes
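h2o.import_file accepts either a URL or a local file path, which is what makes "GitHub or local" interchangeable. A minimal sketch of choosing between the two, assuming a local copy may or may not exist in the working directory; resolve_data_path is a hypothetical helper, not part of the h2o package:

```python
import os

# The GitHub URL used in this article.
GITHUB_URL = (
    "https://github.com/h2oai/h2o-tutorials/raw/master/"
    "h2o-world-2017/automl/data/product_backorders.csv"
)


def resolve_data_path(local_path, remote_url=GITHUB_URL):
    """Return local_path if that file exists on disk, otherwise remote_url.

    Hypothetical helper for illustration; it only picks a path string.
    """
    if os.path.isfile(local_path):
        return local_path
    return remote_url


# Usage, assuming a running H2O cluster started with h2o.init():
# data = h2o.import_file(resolve_data_path("product_backorders.csv"))
```

Preferring the local copy avoids re-downloading the file on every run, while the URL fallback keeps the notebook working on a fresh machine.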

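y = "went_on_backorder"   # response column (Yes/No)
X = data.columns          # list of all column names
X.remove(y)               # the response is not a predictor
X.remove("sku")           # sku is a product ID, not a useful feature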

Run AutoML:

Run AutoML, stopping after 10 models. The max_models argument specifies the number of individual (or “base”) models, and does not include the two ensemble models that are trained at the end.

# Run AutoML:
auto_ml = H2OAutoML(max_models = 10, seed = 1)
auto_ml.train(x = X, y = y, training_frame = data)

AutoML progress: |████████████████████████████████████████████████████████| 100%

Leaderboard:

We will now view the AutoML leaderboard. Since we did not specify a leaderboard_frame in the H2OAutoML.train() method for scoring and ranking the models, the leaderboard uses cross-validation metrics to rank them. In short, it lists the models ranked from best to worst; for this binary classification task they are sorted by AUC. The leader model is stored in auto_ml.leader and the leaderboard in auto_ml.leaderboard.
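Cross-validation metrics come from out-of-fold predictions: each row is scored by a model trained without that row, and the metric is computed over all of these held-out scores. H2O AutoML uses 5 folds by default (nfolds=5). The pure-Python sketch below shows only the idea; the modulo fold assignment and the toy "model" that predicts the training folds' positive rate are assumptions chosen for illustration, not what H2O runs internally.

```python
def kfold_splits(n_rows, n_folds=5):
    """Yield (train_idx, holdout_idx) pairs; row i is assigned to fold i % n_folds."""
    fold_of = [i % n_folds for i in range(n_rows)]
    for k in range(n_folds):
        train = [i for i in range(n_rows) if fold_of[i] != k]
        holdout = [i for i in range(n_rows) if fold_of[i] == k]
        yield train, holdout


def out_of_fold_predictions(labels, n_folds=5):
    """Score every row with a 'model' that never saw that row during training.

    Toy model (assumption): predict the share of positive labels in the
    training folds. Real AutoML base models are GBMs, GLMs, XGBoost, etc.
    """
    preds = [None] * len(labels)
    for train, holdout in kfold_splits(len(labels), n_folds):
        positive_rate = sum(labels[i] for i in train) / len(train)
        for i in holdout:
            preds[i] = positive_rate
    return preds


# Every row lands in exactly one holdout fold, so every row gets one prediction.
oof = out_of_fold_predictions([1, 0] * 5, n_folds=5)
```

A leaderboard metric such as AUC would then be computed on these out-of-fold predictions against the true labels.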

leader_board=auto_ml.leaderboard

Now we will view a snapshot of the top models (head() returns the first 10 rows by default).

leader_board.head()

model_id	auc	logloss	aucpr	mean_per_class_error	rmse	mse
StackedEnsemble_AllModels_AutoML_20200721_233627	0.950875	0.18191	0.749727	0.149404	0.227568	0.0517873
StackedEnsemble_BestOfFamily_AutoML_20200721_233627	0.950305	0.183105	0.746107	0.151635	0.228331	0.0521349
GBM_4_AutoML_20200721_233627	0.948839	0.173579	0.73916	0.157246	0.22659	0.051343
GBM_3_AutoML_20200721_233627	0.94683	0.177091	0.7331	0.147716	0.22862	0.0522671
XGBoost_3_AutoML_20200721_233627	0.945957	0.176662	0.736604	0.150975	0.228394	0.0521638
GBM_2_AutoML_20200721_233627	0.945111	0.179764	0.727168	0.166382	0.230232	0.0530067
GBM_5_AutoML_20200721_233627	0.944997	0.17789	0.731015	0.14231	0.229819	0.0528166
XGBoost_1_AutoML_20200721_233627	0.944094	0.181315	0.726938	0.170148	0.229817	0.0528157
XGBoost_2_AutoML_20200721_233627	0.943922	0.180467	0.72038	0.153593	0.229968	0.0528851
GBM_1_AutoML_20200721_233627	0.942459	0.183815	0.720288	0.15893	0.232004	0.0538257
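The leaderboard columns are standard binary-classification metrics: AUC and AUCPR (higher is better) and logloss, mean per-class error, RMSE and MSE (lower is better, with MSE = RMSE squared). The pure-Python functions below are a minimal sketch of several of these definitions on toy 0/1 labels, for intuition only; H2O computes them internally, and details such as the probability threshold it uses to produce class predictions for mean_per_class_error are not reproduced here.

```python
import math


def auc(labels, scores):
    """ROC AUC: share of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def mean_per_class_error(labels, predicted):
    """Average of per-class error rates, so the rare 'Yes' class counts as much as 'No'."""
    rates = []
    for c in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == c]
        wrong = sum(1 for i in rows if predicted[i] != c)
        rates.append(wrong / len(rows))
    return sum(rates) / len(rates)


def rmse(labels, probs):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels))
```

For example, auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]) gives 0.75: three of the four positive/negative pairs are ordered correctly.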

To view the entire leaderboard, pass the number of rows explicitly:

leader_board.head(rows=leader_board.nrows)

model_id	auc	logloss	aucpr	mean_per_class_error	rmse	mse
StackedEnsemble_AllModels_AutoML_20200721_233627	0.950875	0.18191	0.749727	0.149404	0.227568	0.0517873
StackedEnsemble_BestOfFamily_AutoML_20200721_233627	0.950305	0.183105	0.746107	0.151635	0.228331	0.0521349
GBM_4_AutoML_20200721_233627	0.948839	0.173579	0.73916	0.157246	0.22659	0.051343
GBM_3_AutoML_20200721_233627	0.94683	0.177091	0.7331	0.147716	0.22862	0.0522671
XGBoost_3_AutoML_20200721_233627	0.945957	0.176662	0.736604	0.150975	0.228394	0.0521638
GBM_2_AutoML_20200721_233627	0.945111	0.179764	0.727168	0.166382	0.230232	0.0530067
GBM_5_AutoML_20200721_233627	0.944997	0.17789	0.731015	0.14231	0.229819	0.0528166
XGBoost_1_AutoML_20200721_233627	0.944094	0.181315	0.726938	0.170148	0.229817	0.0528157
XGBoost_2_AutoML_20200721_233627	0.943922	0.180467	0.72038	0.153593	0.229968	0.0528851
GBM_1_AutoML_20200721_233627	0.942459	0.183815	0.720288	0.15893	0.232004	0.0538257
DRF_1_AutoML_20200721_233627	0.935803	0.222161	0.692536	0.171452	0.254289	0.064663
GLM_1_AutoML_20200721_233627	0.741995	0.338675	0.266396	0.29912	0.314387	0.0988395
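The full leaderboard confirms the max_models note: ten base models plus the two stacked ensembles. According to the H2O documentation, the "AllModels" ensemble combines all base models, while "BestOfFamily" combines only the best model of each algorithm family. The sketch below reproduces just that bookkeeping on the AUC values copied from the table; deriving the family from the model_id prefix is an assumption based on the naming seen here, and the code does not train any ensemble.

```python
# (model_id, auc) pairs copied from the full leaderboard above, in ranked order.
LEADERBOARD = [
    ("StackedEnsemble_AllModels_AutoML_20200721_233627", 0.950875),
    ("StackedEnsemble_BestOfFamily_AutoML_20200721_233627", 0.950305),
    ("GBM_4_AutoML_20200721_233627", 0.948839),
    ("GBM_3_AutoML_20200721_233627", 0.94683),
    ("XGBoost_3_AutoML_20200721_233627", 0.945957),
    ("GBM_2_AutoML_20200721_233627", 0.945111),
    ("GBM_5_AutoML_20200721_233627", 0.944997),
    ("XGBoost_1_AutoML_20200721_233627", 0.944094),
    ("XGBoost_2_AutoML_20200721_233627", 0.943922),
    ("GBM_1_AutoML_20200721_233627", 0.942459),
    ("DRF_1_AutoML_20200721_233627", 0.935803),
    ("GLM_1_AutoML_20200721_233627", 0.741995),
]


def family(model_id):
    """Algorithm family = text before the first underscore, e.g. 'GBM' or 'XGBoost'."""
    return model_id.split("_", 1)[0]


def split_base_and_ensembles(rows):
    """Separate individual (base) models from the stacked ensembles."""
    base = [r for r in rows if family(r[0]) != "StackedEnsemble"]
    ensembles = [r for r in rows if family(r[0]) == "StackedEnsemble"]
    return base, ensembles


def best_of_family(rows):
    """Keep the highest-AUC base model of each algorithm family."""
    best = {}
    for model_id, score in rows:
        fam = family(model_id)
        if fam == "StackedEnsemble":
            continue
        if fam not in best or score > best[fam][1]:
            best[fam] = (model_id, score)
    return {fam: model_id for fam, (model_id, _) in best.items()}


base, ensembles = split_base_and_ensembles(LEADERBOARD)
print(len(base), len(ensembles))  # 10 2  (max_models=10 base models + 2 ensembles)
print(best_of_family(LEADERBOARD))
```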

Save the Leader Model:

h2o.save_model(auto_ml.leader, path = "./automl_classify_model_bin")

'/content/automl_classify_model_bin/StackedEnsemble_AllModels_AutoML_20200721_233627'

Download the leader model as a MOJO (Model Object, Optimized), a portable scoring artifact that can be used outside the H2O cluster:

auto_ml.leader.download_mojo(path = "./")

'/content/StackedEnsemble_AllModels_AutoML_20200721_233627.zip'

The saved binary model can later be reloaded with h2o.load_model() and used to make predictions on new data. For further details on the module and for options suited to your own tasks, see the H2O documentation at h2o.ai.


Babrit Behera
