Guide

OpenGate Artificial Intelligence guide

OpenGate Inference Engine enables the execution of trained Artificial Intelligence models exported in PMML or ONNX formats and provisioned into OpenGate using its REST API (see the OpenGate OpenAPI spec for the Inference Engine below).

Besides that, the Inference Engine supports the provisioning and execution of pipelines consisting of the chained execution of Python code and AI models. Pipelines can run transformations and algorithms before and after obtaining predictions from AI models.

The following diagram shows the execution of a pipeline where:

  • the body of the REST API invocation feeds the transformer_1.py,
  • then the output of the transformer_1.py feeds the input of the transformer_2.py,
  • then the output of the transformer_2.py feeds the input of the ai_model.onnx,
  • then the output of the ai_model.onnx feeds the input of the transformer_3.py,
  • finally, the pipeline sends the output of the transformer_3.py to OpenGate in the same way an IoT device does.
stateDiagram-v2
    [*] --> transformer_1.py
    transformer_1.py --> transformer_2.py
    transformer_2.py --> ai_model.onnx
    ai_model.onnx --> transformer_3.py
    transformer_3.py --> OpenGate
    OpenGate --> [*]

By combining models, transformers, and pipelines, OpenGate solutions can leverage the power of Artificial Intelligence, making and storing predictions from collected data in a streaming fashion.

How to use the Inference Engine

These are the steps suggested to apply Artificial Intelligence to your OpenGate project:

  1. Define your data model using OpenGate data models.
  2. Put your devices to work and send data to OpenGate using any of the available device integration methods.
  3. Once you have enough data, download it using the OpenGate data points search API.
  4. Train your AI model using the downloaded data.
  5. Export your AI model in any supported format: PMML or ONNX.
  6. Create the transformer Python code files needed to feed the model or to adapt the resulting predictions.
  7. Provision all the AI assets for your pipeline: the AI model, the transformers (or algorithms), and the pipeline itself to run the AI assets in the correct order (see the spec below).
  8. Provision the OpenGate automation rule that invokes the Inference Engine using the HTTP forwarder action.

After completing the previous steps, you will be collecting predictions into a data stream already configured in one of your data models.

Underlying technologies

Model exchange formats

The Inference Engine supports two well-known model exchange formats: PMML and ONNX.
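
For instance, a scikit-learn model can be converted to ONNX before provisioning it. The following sketch uses the skl2onnx package in the training environment (skl2onnx is not part of the Inference Engine kernel; this only illustrates the export step, and the model and feature count are made up):

# Illustrative export of a scikit-learn model to ONNX using skl2onnx
# (run in the training environment, not inside a pipeline).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

features, labels = load_iris(return_X_y=True)
model = RandomForestClassifier().fit(features, labels)

# Declare the model input: any number of rows, 4 float features
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))]
)
with open("ai_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())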

Transformers and algorithms

The software kernel for transformers and custom algorithms provides Python 3.6 (stay tuned, we’ll update it soon) and the most common scientific Python packages, including NumPy, Pandas, scikit-learn, SciPy, and more (see the full list below).

  • absl-py-0.12.0
  • appdirs-1.4.4
  • bleach-1.5.0
  • dataclasses-0.8
  • fs-2.4.13
  • html5lib-0.9999999
  • importlib-metadata-4.0.1
  • jep-3.9.1
  • joblib-1.0.1
  • Markdown-3.3.4
  • numpy-1.19.5
  • pandas-0.25.3
  • pip-9.0.1
  • protobuf-3.17.0
  • python-dateutil-2.8.1
  • pytz-2021.1
  • scikit-learn-0.24.2
  • scipy-1.5.4
  • setuptools-28.8.0
  • six-1.16.0
  • sklearn-0.0
  • tensorflow-1.5.0
  • tensorflow-tensorboard-1.5.1
  • threadpoolctl-2.1.0
  • typing-extensions-3.10.0.0
  • Werkzeug-2.0.1
  • wheel-0.36.2
  • zipp-3.4.1

Transformers in a pipeline can use the packages above to implement custom logic that adapts data formats and applies algorithms.
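
For instance, a hypothetical transformer could use Pandas from the list above to clean the incoming data before handing it to the next step (this sketch assumes the request body is a plain list of numeric readings, which is not the case in every pipeline):

# Hypothetical transformer: assumes body is a list of numeric readings,
# possibly containing missing values (None/NaN).
import pandas as pd

series = pd.Series(body, dtype="float64")
# Replace missing values with the mean of the remaining readings
series = series.fillna(series.mean())
# Clip outliers to the 1st and 99th percentiles
series = series.clip(series.quantile(0.01), series.quantile(0.99))

# The executor expects the transformer output in the X variable
X = series.tolist()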

Developing transformers

The two main points to take into account when developing transformers are:

  1. A variable named body will always be available in the execution context of the transformer. This variable holds the content of the request body sent to the transformer.
  2. The transformer executor will expect an X variable to hold the transformer’s output.

The following example shows an extremely simple transformer echoing the received body:

X = body

For example, let’s suppose we want to use a trained model to predict future temperatures in Celsius. However, our devices send temperatures in Fahrenheit. Therefore, the transformer converts degrees Fahrenheit to degrees Celsius.

X = body
X["temperature"] = ( X["temperature"] - 32 ) / 1.8

Advanced example

The following source code shows a real example of a transformer modifying the input body into the shape required to feed an AI model in the next step of the pipeline.

The example uses the pkl_encoder.pkl and pkl_scaler.pkl files. These files are serializations of a sklearn encoder and a sklearn scaler created with the pickle module. Using helper files like these pickles requires provisioning them through the transformer REST API (see the OpenAPI spec below). Inside the transformer, provisioned files are available through the mem_fs filesystem object in the execution context, as the code below shows.

from datetime import datetime
import pickle

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = body
# The first element holds a date in month/day/year format
d = X[0].split('/')
y_m_d = (d[2], d[0], d[1])
y_m_d = [int(i) for i in y_m_d]
# Change the date to 'day of year'
date = datetime(*y_m_d)
doy = date.timetuple().tm_yday
X[0] = doy
tmp = []
for i in X:  # Convert str to float
    try:
        tmp.append(float(i))
    except (TypeError, ValueError):  # Keep non-numeric values (e.g. NaN markers) as they are
        tmp.append(i)

X = tmp
# Remove columns with no data in the data set
indexes_2_remove = [6, 7, 15, 16]
for index in sorted(indexes_2_remove, reverse=True):
    del X[index]
# Wrap data into 2D array shape
X = [X]
# Encode categorical data with the provisioned encoder
with mem_fs.open('pkl_encoder.pkl', 'rb') as f:
    encoder = pickle.load(f)
X = encoder.transform(X)
# Scale data with the provisioned scaler
with mem_fs.open('pkl_scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
X = scaler.transform(X)
# Wrap the whole data into the AI model input shape
X = {'X': [{'input_8': X.tolist()}]}

At the end of the previous example, the variable X holds the data precisely in the shape required by the AI model in the next pipeline step. Then the Inference Engine executes the AI model and yields a value in the following shape:

{
  "result" : [
    {
      "dense_34" : [ [ 0.51234324 ] ]
    }
  ]
}

The following Python code can act as the pipeline’s final step, unwrapping the result and getting a single value. With this value, the OpenGate Inference Engine pipeline feeds a data stream on a device with the output of the whole operation.

X = body["result"][0]["dense_34"][0][0]

In the previous example, X could hold a value between 0 and 1, for example 0.5312.
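
As a side note on the helper files used in the advanced example, the following sketch shows one way the pkl_encoder.pkl and pkl_scaler.pkl files could have been produced offline during training, so they can later be provisioned next to the transformer. The training rows and column indexes are purely illustrative:

# Offline sketch (run during training, not inside the pipeline): fit and
# serialize the encoder and scaler loaded by the transformer above.
import pickle

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Purely illustrative training rows: [day_of_year, category_a, category_b, value]
training_rows = [
    [10, "A", "X", 3.5],
    [42, "B", "Y", 7.1],
    [150, "A", "Y", 2.2],
]

# One-hot encode the categorical columns, pass the rest through unchanged
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(sparse=False, handle_unknown="ignore"), [1, 2])],
    remainder="passthrough",
)
encoded = encoder.fit_transform(training_rows)

# Scale the encoded data
scaler = StandardScaler().fit(encoded)

with open("pkl_encoder.pkl", "wb") as f:
    pickle.dump(encoder, f)
with open("pkl_scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)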

Optimizing data download from OpenGate for model training

By security policy, OpenGate limits the size of the downloaded CSV data. API requests can deal with this limit using the page and size params in the Accept header. This is an example of header params usage: Accept: text/plain;page=1;size=1000.

In the previous example, the page=1 parameter asks OpenGate for the first available page, and size=1000 sets the number of rows per page to 1000. When looping through the pages (page=1, page=2, …, page=n), the API will eventually respond with a 204 code (no content), meaning there are no more pages available for the filter used in the search.

The following diagram shows the sequence to loop through pages to download all available data points per filtered search:

sequenceDiagram
    Client->>+API: page=1 size=1000
    API->>-Client: Status code 200 + data
    Client->>+API: page=2 size=1000
    API->>-Client: Status code 200 + data
    Client-->>+API: looping pages
    API-->>-Client: ...
    Client->>+API: page=n size=1000
    API->>-Client: Status code 204 no content (empty body)

Download strategy

Reducing the number of rows per request (in other words, the number of rows per page) ensures that all the available data matching a given filter can be downloaded. A well-balanced configuration of page, size, and the number of requests improves the data extraction experience. The OpenGate development team recommends using a page size lower than or equal to 1000 rows (size <= 1000).
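
As a sketch of this strategy, the following Python script loops through pages of 1000 rows until the API answers 204. It assumes the requests package on the client side, a placeholder API key, and the same data.json filter file used in the curl example below:

# Client-side sketch of the paging strategy (placeholder API key).
import requests

URL = "https://api.opengate.es/north/v80/search/datapoints"

with open("data.json") as f:
    payload = f.read()

page = 1
with open("datapoints.csv", "w") as csv_file:
    while True:
        headers = {
            "Content-Type": "application/json",
            "X-ApiKey": "your-api-key",
            "Accept": "text/plain;page={};size=1000".format(page),
        }
        response = requests.post(URL, data=payload, headers=headers)
        if response.status_code == 204:
            break  # No more pages available for this filter
        response.raise_for_status()
        csv_file.write(response.text)
        page += 1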

Example of data downloading in CSV format

The following bash script shows how to use the curl Linux command to ask OpenGate for data points to download in CSV format, using a filter written in the data.json file. Please pay attention to the line setting the page and size parameter values:

#!/bin/bash

curl --header "Content-type: application/json" \
     --header "X-ApiKey: ec2bc474-ab48-49e1-99fb-70b80489a0e5" \
     --header "Accept: text/plain;page=2;size=500" \
     --data @data.json \
     --verbose \
     https://api.opengate.es/north/v80/search/datapoints

The previous example will download (if data is available) up to 500 CSV rows from page 2.

The JSON below shows a filter example to use in the file referenced by the curl --data @data.json parameter.

The filter:

  • selects some specific devices by their identifiers,
  • selects some of their data streams,
  • filters by a specific time window using the gte and lt operators on the at field.

{
  "filter": {
    "and": [
      {
        "in": {
          "datapoints.entityIdentifier": [
            "device-id-1",
            "device-id-2"
          ]
        }
      },
      {
        "in": {
          "datapoints.datastreamId": [
            "temperature",
            "airPressure",
            "airHumidity"
          ]
        }
      },
      {
        "gte": {
          "datapoints._current.at": "2022-05-17T00:00:00.000Z"
        }
      },
      {
        "lt": {
          "datapoints._current.at": "2022-05-18T00:00:00.000Z"
        }
      }
    ]
  }
}