Time series

Limited access API

Limited access

This feature is only available to root and super_admin_domain profiles. Ask your administrator for proper user role profiling.

Time Series

One of the OpenGate IoT platform’s main features is data collection. While collecting data, OpenGate can store incoming values from devices in time series. OpenGate’s time series engine produces rows of collected data per device. The time series engine saves rows with columns and holds aggregated values per column grouped by configurable time buckets in each row.

Understanding time buckets

Administrators can use the time series REST API to set up time bucket periods using the JSON definition’s origin and timeBucket fields. The origin field holds the starting date of the time series, while the timeBucket sets the periods of the time buckets counting from the origin date in seconds. The time series engine uses both fields to calculate time buckets.

Example: If the origin date is 2022-01-01T00:00:00.000Z and the time bucket is 1 hour, then the first interval is from 2022-01-01T00:00:00.001Z to 2022-01-01T01:00:00.000Z, the second interval is from 2022-01-01T01:00:00.001Z to 2022-01-01T02:00:00.000Z`, and so on.

2022-01-01T00:00:00.000Z -> 2022-01-01T01:00:00.000Z -> 2022-01-01T02:00:00.000Z ->...

Time series columns

Administrators can set time buckets and context columns when they create time series. The following sections explain how.

Time buckets columns

In the OpenGate time series engine context, time buckets are aggregation periods set using the timeBucket field in seconds on the configuration of a time series. The time series engine updates each row using a different aggregation function per column when it receives new data for an existing row.

The Time series Functions Catalog will define available aggregation functions for time series definition.

Setting the aggregation period (timeBucket) to 0 seconds means storing all received data with no aggregation functions applied. This setting has some points to take into account:

  1. The engine will retrieve all the historical data collected when asked using the search end-point.
  2. The engine can save several data streams in different columns of the same row when received with the same at value.

On the other hand, if timeBucket is greater than 0, then bucketColumn is required when administrators define a time series. Using the bucketColumn, administrators can set the column’s name the engine will add to the search data responses to hold the end date of the selected bucket.

Context columns

In addition to columns with aggregated values per time bucket, administrators can add columns as context information to the time series. These columns don’t have aggregation functions and work right like data sets. OpenGate will add these context columns to the time series searching results.

Identifier column

Time series creation requires the configuration of the identifierColumn field. This column represents the name the time series engine will assign to the provision.administration.identifier.\_current.value path with filter=YES and sort=true when using the time series engine searching end-point.

In other words, the time series engine will always add a column to each row of the search results to identify devices and will use the identifierColumn value as the column name.

Columns path field setting

Column configuration requires the setting of the path field. The time series engine uses the path field as a query string to extract data stream values and project them into columns.

Two or three parts can compose the path string, depending on the data stream type:

  • Data stream identifier: It is the path prefix and has to be a data stream identifier set in an OpenGate data model.
Communication modules details

If the data stream id contains communicationModules[], the index of the communication module is required.

Example: device.communicationModules[0].subscription.mobile.imsi

  • Data stream field: This is concatenated to the previous one using a dot . and can be one of the fields present in all data streams:
    • _current.value
    • _current.date
    • _current.at
    • _current.source
    • _current.sourceInfo
  • Value path: When the data stream type is a JSON object or array, the JSONPath must be present to query for a primary type field.

Retention policy

Using the retention field, administrators can configure how long the data rows remain in the time series. The retention field unit is seconds. The retention time cannot be longer than the assigned organization policies.

Updating time series

You can edit several time series features, but this may affect the data stored in the time series being updated or its structure, requiring a process to adapt the information to what was demanded in the request. Until this process is finished, there is the possibility of the existence of dirty values.

Fields which can be modified are:

  • Name
  • Description
  • IdentifierColumn
  • BucketColumn
  • BucketInitColumn
  • Retention
  • TimeBucket
  • Context (you can add, remove and modify contexts)
  • Columns (you can add, remove, and modify columns)

Column and context change considerations:

  • The name of an element must be unique, including all columns, all contexts, identifiers, and bucket and bucket init columns, so if you want to add or rename one of them, you can not choose a name in use.
  • Adding or removing columns or contexts with the filter property’s value ALWAYS is impossible. Furthermore, it is impossible to modify this property’s value in one existing column or context to ALWAYS if it was a different one. If that is its current value, you can not set it to another.
  • You can not edit the path of a column or context. Instead, you can remove and create it again, getting the same result.

Modifying time bucket

Buckets in a time series have a fixed length, the value of the time bucket property in seconds. Changing this property will cause the new buckets to have a different size, and as a precaution, buckets in the future will be deleted if this action is taken. However, there are some unique situations that we will discuss here.

Set the new time bucket value having buckets already created, and they end after this instant

When the time bucket is modified, there is the possibility of buckets that begin before and end after the moment that update happened. In this case, OpenGate will change these buckets, setting their end instant to the moment the update was done; if it is needed, the next bucket will begin in that instant, too, and it will go until the next bucket after this one would start according to the new time series definition. We illustrate that in the following pictures.

Changing time bucket to a lower value Changing time bucket to a lower value Changing time bucket to a lower value

Changing time bucket to a more significant value Changing time bucket to a more significant value Changing time bucket to a more substantial value

Devices that don’t have any bucket before the time bucket is changed

In particular, adaptation is only needed if a device has collected data before the time bucket has been changed. If data is collected after the update, a new bucket following the new definition will be created.

Changing time bucket before first device data collection Changing time bucket before first device data collection Changing time bucket before first device data collection

Mixing previous scenarios

If we combine these previous scenarios, we can see that different device buckets may not fit for the same periods. It can also happen that a device collects data, but this data should belong to a bucket from the past; in this situation, if that bucket exists, the time series will take it as it is; in another case, it will create a new one but following the new definition. Again, the result can be buckets that won’t fit between themselves. Both situations are the price to pay for changing the timebucket.

Two devices with different buckets after changing time bucket Two devices with different buckets after changing time bucket Two devices with different buckets after changing time bucket

Two devices with different buckets in the past after changing time bucket Two devices with different buckets in the past after changing time bucket Two devices with varying buckets in the past after changing time bucket

Changing time bucket from zero to a higher value

As a particular case, we can set the time bucket to zero; starting from here, we have the possible situation where we edit a time series, changing its time bucket from zero to a higher value. From that moment, each data collection will cause a bucket to be created with the new length.

As described in previous sections, the data collected can be from the past, and the corresponding bucket must fit with the others already there. The difference is that every bucket with a length equal to zero, which would be overlapped by the new one, will be absorbed instead, and its information will be used in the execution of the aggregation functions of each column.

Changing time bucket from zero to higher value Changing time bucket from zero to higher value Changing time bucket from zero to higher value

Reading data

Developers can retrieve time series data by sending a POST request with searching parameters in the HTTP body. The body is a JSON document with several fields.

  • Filter and sort fields have the same format and can use bucketColumn, identifierColumn, columns.name, context.name as keys.
  • Limit has the same format as data sets.
  • Select has the same format that data sets, using the keys you can use in the filter option.

The default response returns a JSON object with a data field as a two-dimensional array representing a data matrix, i.e., an array of arrays. Omitting the select clause, the order of elements in each array is:

  • First, the column with the name set in the bucketColumn field with the end bucket date.
  • Second, the column with the name defined in the identifierColumn field represents the provision.administration.identifier.\_current.value path.
  • Next, the columns set in the context section
  • Finally, the columns are set in the columns section.

In addition, you can retrieve this information in a CSV format. For this option, the user should keep in mind the following considerations:

  • By default, any text value will be enclosed in double quotes. Example: “text value”. It is possible to change the double quotes for any other character.
  • Although special characters will be escaped using the backslash character (’\’), it is possible to define another for that goal.
  • Another aspect that can be customized is the ’end of line character’, which by default is ‘\n.’
  • An empty text will represent null values, and this feature can also be changed.
Warning

Our Time Series API is meticulously designed to cater to your advanced data analysis and manipulation needs. Recognizing the importance of performance and efficiency when you retrieve data in CSV format, we streamline the process by turning off sorting features. This intentional design choice significantly enhances the data retrieval speed, allowing you to access large time series swiftly. Typically, data downloaded in CSV format is used for subsequent in-depth analysis and manipulation. By optimizing the performance in this manner, we empower you to focus on what truly matters - diving into your data and extracting valuable insights without unnecessary delays. Embrace the advantage of efficiency with our tailored solution, ensuring that your data analysis journey is smooth and productive.

All these configurations can be made using the header parameter designed for that purpose. Still, the user is responsible for the resulting CSV response and whether it has a correct format.

Sorting limitations

By definition and for taking care of the performance, it is possible to set a maximum of two columns as sortable. When making a data search request, you can sort by bucketColumn, identifierColumn, and the sortable columns specified, but only one or two of them are allowed in each request.

For example, if there is a time series with the columns A, B, C, and D, and its sortable columns are A and B, valid sorting combinations would be (A, B), (B, A), (bucketColumn, A) or (B, identifierColumn). Wrong combinations whould be (A, C), (bucketColumn, D) or (identifierColumn, A, B).

Limit specification

Depending on response type, pagination specification and behavior can be different:

  • JSON response: If limit fields are not specified, default values (specified in configuration) will be assigned. If defined, it will be validated with configuration values.
  • CSV response: If the limit field is not defined in the request body, complete CSV retrieval is wanted. If the Limit field is defined but incomplete, an error will be returned indicating that all fields must be determined correctly.
Warning

Querying for complete data can take too long. Use this option carefully.

Example reading collected data in CSV format

The following code snippet shows an example using the limit sub-document in the JSON used to download the data.

{
  "filter": {},
  "limit": {
    "size": 500,
    "start": 1
  }
}

In the following case, complete data will be retrieved.

{
  "filter": {}
}

Finally, this is an example of an invalid query in CSV:

{
  "filter": {},
  "limit": {}
}

Reading data as a dataset

In addition to reading time series data, it is also possible to run aggregation queries to get the results as a dataset with only a single output aggregated bucket per device.

To invoke this service, simply send a POST with the aggregation query parameters as a JSON document in the HTTP body. In the request document you can define:

  • A filter, just like the one that can be defined in the reading data. The bucketColumn, identifierColumn, columns.name and context.name can be used as keys.
  • Limit has the same format as data sets.
  • Select, an array of the output aggregated variables to be returned in the output. ou can specify for each column:
    • Column is the identifier, which has the same format as in the filter option.
    • Alias is the name of the output variable to use.
    • Aggregation is the aggregation function to use.

Example request document

{
  "filter": {
    "gt": {
      "bucket_id": "device_200"
    }
  },
  "limit": {
    "start": 1,
    "size": 50
  },
  "select": {
    "columns": [
      {
        "column": "temperature",
        "alias": "first",
        "aggregation": "FIRST"
      },
      {
        "column": "temperature",
        "alias": "last",
        "aggregation": "LAST"
      },
      {
        "column": "temperature",
        "alias": "avg",
        "aggregation": "AVG"
      },
      {
        "column": "temperature",
        "alias": "max",
        "aggregation": "MAX"
      },
      {
        "column": "temperature",
        "alias": "min",
        "aggregation": "MIN"
      },
      {
        "column": "cpu",
        "alias": "p_avg",
        "aggregation": "AVG"
      },
      {
        "column": "cpu",
        "alias": "p_count",
        "aggregation": "COUNT"
      }
    ]
  }
}

The default response returns a JSON object with a data field as a two-dimensional array representing a data matrix, i.e., an array of arrays. In addition, you can retrieve this information in a CSV format, as in the reading data operation.

Sorting restrictions and requirements of the identifierColumn

The output is always sorted in ascending order of the identifierColumn.

As the output is aggregated by this identifierColumn, the output will always include this column, including automatically in the first element of the column list if it wasn’t included in the request JSON document.

Limit specification

Depending on the response type, JSON or CSV, pagination specification and behaviour may differ:

  • JSON response: If limit fields are not specified, default values (specified in the configuration) are assigned. If defined, they are validated against the configuration values.
  • CSV response: If the limit field is not defined in the request body, a full CSV retrieval is requested. If the limit field is defined but incomplete, an error is returned indicating that all fields must be correctly determined.
Warning

Querying for complete data can take too long. Use this option carefully.

API specification