Time series
Limited access API
This feature is only available to root
and super_admin_domain
profiles. Ask your administrator for proper user role profiling.
Time Series
One of the OpenGate IoT platform’s main features is data collection. While collecting data, OpenGate can store incoming values from devices in time series. OpenGate’s time series engine produces rows of collected data per device. The time series engine saves rows with columns and holds aggregated values per column grouped by configurable time buckets in each row.
Understanding time buckets
Administrators can use the time series REST API to set up time bucket periods using the JSON definition’s origin
and timeBucket
fields. The origin
field holds the starting date of the time series, while the timeBucket
sets the periods of the time buckets counting from the origin
date in seconds. The time series engine uses both fields to calculate time buckets.
Example: If the origin date is 2022-01-01T00:00:00.000Z
and the time bucket is 1 hour, then the first interval is from 2022-01-01T00:00:00.001Z
to 2022-01-01T01:00:00.000Z
, the second interval is from 2022-01-01T01:00:00.001Z
to 2022-01-01T02:00:00.000Z`, and so on.
2022-01-01T00:00:00.000Z -> 2022-01-01T01:00:00.000Z -> 2022-01-01T02:00:00.000Z ->...
Time series columns
Administrators can set time buckets and context columns when they create time series. The following sections explain how.
Time buckets columns
In the OpenGate time series engine context, time buckets are aggregation periods set using the timeBucket
field in seconds on the configuration of a time series. The time series engine updates each row using a different aggregation function per column when it receives new data for an existing row.
The Time series Functions Catalog will define available aggregation functions for time series definition.
Setting the aggregation period (timeBucket
) to 0
seconds means storing all received data with no aggregation functions applied. This setting has some points to take into account:
- The engine will retrieve all the historical data collected when asked using the search end-point.
- The engine can save several data streams in different columns of the same row when received with the same
at
value.
On the other hand, if timeBucket
is greater than 0, then bucketColumn
is required when administrators define a time series. Using the bucketColumn
, administrators can set the column’s name the engine will add to the search data responses to hold the end date of the selected bucket.
Context columns
In addition to columns with aggregated values per time bucket, administrators can add columns as context information to the time series. These columns don’t have aggregation functions and work right like data sets. OpenGate will add these context columns to the time series searching results.
Identifier column
Time series creation requires the configuration of the identifierColumn
field. This column represents the name the time series engine will assign to the provision.administration.identifier.\_current.value
path with filter=YES
and sort=true
when using the time series engine searching end-point.
In other words, the time series engine will always add a column to each row of the search results to identify devices and will use the identifierColumn
value as the column name.
Columns path field setting
Column configuration requires the setting of the path
field. The time series engine uses the path
field as a query string to extract data stream values and project them into columns.
Two or three parts can compose the path
string, depending on the data stream type:
- Data stream identifier: It is the
path
prefix and has to be a data stream identifier set in an OpenGate data model.
If the data stream id contains communicationModules[]
, the index of the communication module is required.
Example: device.communicationModules[0].subscription.mobile.imsi
- Data stream field: This is concatenated to the previous one using a dot
.
and can be one of the fields present in all data streams:_current.value
_current.date
_current.at
_current.source
_current.sourceInfo
- Value path: When the data stream type is a JSON object or array, the JSONPath must be present to query for a primary type field.
Retention policy
Using the retention
field, administrators can configure how long the data rows remain in the time series. The retention field unit is seconds. The retention time cannot be longer than the assigned organization policies.
Updating time series
You can edit several time series features, but this may affect the data stored in the time series being updated or its structure, requiring a process to adapt the information to what was demanded in the request. Until this process is finished, there is the possibility of the existence of dirty values.
Fields which can be modified are:
- Name
- Description
- IdentifierColumn
- BucketColumn
- BucketInitColumn
- Retention
- TimeBucket
- Context (you can add, remove and modify contexts)
- Columns (you can add, remove, and modify columns)
Column and context change considerations:
- The name of an element must be unique, including all columns, all contexts, identifiers, and bucket and bucket init columns, so if you want to add or rename one of them, you can not choose a name in use.
- Adding or removing columns or contexts with the
filter
property’s valueALWAYS
is impossible. Furthermore, it is impossible to modify this property’s value in one existing column or context toALWAYS
if it was a different one. If that is its current value, you can not set it to another. - You can not edit the path of a column or context. Instead, you can remove and create it again, getting the same result.
Modifying time bucket
Buckets in a time series have a fixed length, the value of the time bucket property in seconds. Changing this property will cause the new buckets to have a different size, and as a precaution, buckets in the future will be deleted if this action is taken. However, there are some unique situations that we will discuss here.
Set the new time bucket value having buckets already created, and they end after this instant
When the time bucket is modified, there is the possibility of buckets that begin before and end after the moment that update happened. In this case, OpenGate will change these buckets, setting their end instant to the moment the update was done; if it is needed, the next bucket will begin in that instant, too, and it will go until the next bucket after this one would start according to the new time series definition. We illustrate that in the following pictures.
Changing time bucket to a lower value
Changing time bucket to a more substantial value
Devices that don’t have any bucket before the time bucket is changed
In particular, adaptation is only needed if a device has collected data before the time bucket has been changed. If data is collected after the update, a new bucket following the new definition will be created.
Changing time bucket before first device data collection
Mixing previous scenarios
If we combine these previous scenarios, we can see that different device buckets may not fit for the same periods. It can also happen that a device collects data, but this data should belong to a bucket from the past; in this situation, if that bucket exists, the time series will take it as it is; in another case, it will create a new one but following the new definition. Again, the result can be buckets that won’t fit between themselves. Both situations are the price to pay for changing the timebucket.
Two devices with different buckets after changing time bucket
Two devices with varying buckets in the past after changing time bucket
Changing time bucket from zero to a higher value
As a particular case, we can set the time bucket to zero; starting from here, we have the possible situation where we edit a time series, changing its time bucket from zero to a higher value. From that moment, each data collection will cause a bucket to be created with the new length.
As described in previous sections, the data collected can be from the past, and the corresponding bucket must fit with the others already there. The difference is that every bucket with a length equal to zero, which would be overlapped by the new one, will be absorbed instead, and its information will be used in the execution of the aggregation functions of each column.
Changing time bucket from zero to higher value
Reading data
Developers can retrieve time series data by sending a POST
request with searching parameters in the HTTP body. The body is a JSON document with several fields.
- Filter and sort fields have the same format and can use
bucketColumn
,identifierColumn
,columns.name
,context.name
as keys. - Limit has the same format as data sets.
- Select has the same format that data sets, using the keys you can use in the filter option.
The default response returns a JSON object with a data
field as a two-dimensional array representing a data matrix, i.e., an array of arrays. Omitting the select clause, the order of elements in each array is:
- First, the column with the name set in the
bucketColumn
field with the end bucket date. - Second, the column with the name defined in the
identifierColumn
field represents theprovision.administration.identifier.\_current.value
path. - Next, the columns set in the
context
section - Finally, the columns are set in the
columns
section.
In addition, you can retrieve this information in a CSV format. For this option, the user should keep in mind the following considerations:
- By default, any text value will be enclosed in double quotes. Example: “text value”. It is possible to change the double quotes for any other character.
- Although special characters will be escaped using the backslash character (’\’), it is possible to define another for that goal.
- Another aspect that can be customized is the ’end of line character’, which by default is ‘\n.’
- An empty text will represent null values, and this feature can also be changed.
Our Time Series API is meticulously designed to cater to your advanced data analysis and manipulation needs. Recognizing the importance of performance and efficiency when you retrieve data in CSV format, we streamline the process by turning off sorting features. This intentional design choice significantly enhances the data retrieval speed, allowing you to access large time series swiftly. Typically, data downloaded in CSV format is used for subsequent in-depth analysis and manipulation. By optimizing the performance in this manner, we empower you to focus on what truly matters - diving into your data and extracting valuable insights without unnecessary delays. Embrace the advantage of efficiency with our tailored solution, ensuring that your data analysis journey is smooth and productive.
All these configurations can be made using the header parameter designed for that purpose. Still, the user is responsible for the resulting CSV response and whether it has a correct format.
Sorting limitations
By definition and for taking care of the performance, it is possible to set a maximum of two columns as sortable. When making a data search request, you can sort by bucketColumn
, identifierColumn
, and the sortable columns specified, but only one or two of them are allowed in each request.
For example, if there is a time series with the columns A
, B
, C
, and D
, and its sortable columns are A
and B
, valid sorting combinations would be (A, B)
, (B, A)
, (bucketColumn, A)
or (B, identifierColumn)
. Wrong combinations whould be (A, C)
, (bucketColumn, D)
or (identifierColumn, A, B)
.
Limit specification
Depending on response type, pagination specification and behavior can be different:
JSON
response: If limit fields are not specified, default values (specified in configuration) will be assigned. If defined, it will be validated with configuration values.CSV
response: If the limit field is not defined in the request body, complete CSV retrieval is wanted. If the Limit field is defined but incomplete, an error will be returned indicating that all fields must be determined correctly.
Querying for complete data can take too long. Use this option carefully.
Example reading collected data in CSV format
The following code snippet shows an example using the limit
sub-document in the JSON used to download the data.
{
"filter": {},
"limit": {
"size": 500,
"start": 1
}
}
In the following case, complete data will be retrieved.
{
"filter": {}
}
Finally, this is an example of an invalid query in CSV
:
{
"filter": {},
"limit": {}
}
Reading data as a dataset
In addition to reading time series data, it is also possible to run aggregation queries to get the results as a dataset with only a single output aggregated bucket per device.
To invoke this service, simply send a POST
with the aggregation query parameters as a JSON document in the HTTP body. In the request document you can define:
- A filter, just like the one that can be defined in the
reading data
. ThebucketColumn
,identifierColumn
,columns.name
andcontext.name
can be used as keys. - Limit has the same format as data sets.
- Select, an array of the output aggregated variables to be returned in the output. ou can specify for each column:
- Column is the identifier, which has the same format as in the filter option.
- Alias is the name of the output variable to use.
- Aggregation is the aggregation function to use.
Example request document
{
"filter": {
"gt": {
"bucket_id": "device_200"
}
},
"limit": {
"start": 1,
"size": 50
},
"select": {
"columns": [
{
"column": "temperature",
"alias": "first",
"aggregation": "FIRST"
},
{
"column": "temperature",
"alias": "last",
"aggregation": "LAST"
},
{
"column": "temperature",
"alias": "avg",
"aggregation": "AVG"
},
{
"column": "temperature",
"alias": "max",
"aggregation": "MAX"
},
{
"column": "temperature",
"alias": "min",
"aggregation": "MIN"
},
{
"column": "cpu",
"alias": "p_avg",
"aggregation": "AVG"
},
{
"column": "cpu",
"alias": "p_count",
"aggregation": "COUNT"
}
]
}
}
The default response returns a JSON object with a data
field as a two-dimensional array representing a data matrix, i.e., an array of arrays. In addition, you can retrieve this information in a CSV format, as in the reading data
operation.
Sorting restrictions and requirements of the identifierColumn
The output is always sorted in ascending order of the identifierColumn
.
As the output is aggregated by this identifierColumn
, the output will always include this column, including automatically in the first element of the column list if it wasn’t included in the request JSON document.
Limit specification
Depending on the response type, JSON or CSV, pagination specification and behaviour may differ:
JSON
response: If limit fields are not specified, default values (specified in the configuration) are assigned. If defined, they are validated against the configuration values.CSV
response: If the limit field is not defined in the request body, a full CSV retrieval is requested. If the limit field is defined but incomplete, an error is returned indicating that all fields must be correctly determined.
Querying for complete data can take too long. Use this option carefully.