Data sets
This feature is only available to `root` and `super_admin_domain` profiles. Ask your administrator for the appropriate user role profile.
Introduction
A data set is a collection of values gathered from remote devices within a given organisational context and presented in tabular form. OpenGate can be configured to populate each data set column by selecting from the data streams available in the organisation in question. Column values are limited to strings, numbers, and booleans; therefore, if a data stream is defined by an object or array schema, a path must be defined that reaches one of these basic values. Note that when the devices are equipped with communication modules, each data set column pertaining to those modules must specify which communication module it refers to.
This section provides an overview of how to administer and read data in a data set.
About data definition
Identifier column
Once the data set has been defined, the identifier column must be defined. This column represents the `provision.administration.identifier.current.value` path, with the filter set to `YES` and the sort set to `TRUE`.
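Expressed as JSON, the identifier column definition might look like the following sketch; the `name`, `path`, `filter`, and `sort` field names are assumptions based on the description above, not confirmed API fields.

```json
{
  "identifierColumn": {
    "name": "identifier",
    "path": "provision.administration.identifier.current.value",
    "filter": "YES",
    "sort": true
  }
}
```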
Column path field
The column path field comprises three distinct sections. The final section is only mandatory if the selected data stream, as indicated by its `datastreamId` field, has a schema that is an array or an object.
- Data stream identifier: Select the desired `datastreamId`, which also determines the index of the communication module to be viewed. If the data stream identifier contains `communicationModules[]`, the index of the communication module must be selected, for example: `device.communicationModules[0].subscription.mobile.imsi._current.value`.
- Data stream field: One of the properties of any data stream. The options are:
  - `_current.value`
  - `_current.date`
  - `_current.at`
  - `_current.feedId`
  - `_current.source`
  - `_current.sourceInfo`
- Value path: If the data stream schema is not a basic type, select a path that leads to a field with a basic type.
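Putting the three sections together, complete column paths might look like the following sketch. The `temperature` and `location` data streams are hypothetical examples; the communication-module path is the one shown above.

```text
device.temperature._current.value                                    (basic schema: no value path)
device.communicationModules[0].subscription.mobile.imsi._current.value
device.location._current.value.coordinates[0]                        (object schema: value path appended)
```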
Comprehensive API actions
Creating a dataset
A maximum of fourteen columns may be designated as filterable, while a maximum of three may be designated as sortable.
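A creation request body might look like the following sketch. Only `name`, `description`, `identifierColumn`, and `columns` are taken from this document; the per-column field names, filter values, and data stream paths are illustrative assumptions.

```json
{
  "name": "temperature-report",
  "description": "Ambient temperature per device",
  "identifierColumn": {
    "name": "identifier",
    "path": "provision.administration.identifier.current.value",
    "filter": "YES",
    "sort": true
  },
  "columns": [
    { "name": "temperature", "path": "device.temperature._current.value", "filter": "YES", "sort": true },
    { "name": "lastUpdate", "path": "device.temperature._current.date", "filter": "NO", "sort": false }
  ]
}
```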
Updating a dataset
You can edit several features of a data set by issuing a PUT request to the relevant resource, specifying the organisation and the data set identifier in the URL. Note that no more than fourteen columns can be designated as filterable, and no more than three as sortable.
This action may affect the data stored in the data set or its structure. In such cases, a process adapts the existing information to the new definition; this process is not complete until all dirty values have been resolved.
The following fields may be modified:
- Name
- Description
- IdentifierColumn
- Columns (you can add, remove, and modify columns)
When modifying columns, the following rules apply:
- Column names must be unique. When adding or renaming a column, you cannot choose a name that is already in use.
- Columns whose `filter` property is set to `ALWAYS` cannot be added or removed. The `filter` property of an existing column cannot be changed to `ALWAYS`, and a column whose `filter` property is already `ALWAYS` cannot be changed to another value.
- The path of a column cannot be edited. Instead, remove the column and create it again, which achieves the same result.
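The column rules above can be checked client-side before issuing the PUT request. The helper below is a hypothetical sketch, not part of the OpenGate API, and assumes each column is represented as a dictionary with `name` and `filter` keys.

```python
def validate_column_update(existing, updated):
    """Check the column-update rules: unique names, and no adding,
    removing, or changing columns with filter set to ALWAYS.

    existing/updated: lists of {"name": str, "filter": str} dicts.
    Returns (ok, reason).
    """
    names = [c["name"] for c in updated]
    if len(names) != len(set(names)):
        return False, "column names must be unique"
    old = {c["name"]: c for c in existing}
    new = {c["name"]: c for c in updated}
    # ALWAYS columns cannot be removed, and their filter cannot change.
    for name, col in old.items():
        if col.get("filter") == "ALWAYS":
            if name not in new:
                return False, f"cannot remove ALWAYS column {name!r}"
            if new[name].get("filter") != "ALWAYS":
                return False, f"cannot change filter of ALWAYS column {name!r}"
    # No column may be added with, or changed to, filter ALWAYS.
    for name, col in new.items():
        if col.get("filter") == "ALWAYS" and old.get(name, {}).get("filter") != "ALWAYS":
            return False, f"cannot set filter of {name!r} to ALWAYS"
    return True, "ok"
```
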
Searching data from a dataset
Searching data from a data set differs from searching other resources: the data set identifier must be included in the URL. The body of the request is similar to that of any other search.
- Filter and sort have the same format, using `identifierColumn` or `columns.name` as the key.
- Limit has the same format, too.
- Group does not exist in data set searching.
- Select is different: you write only `columns.name` values, as an array, to define the select option.
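A search request body might therefore look like the following sketch. The column names are hypothetical, and the filter is left empty because its operator syntax follows the general search format.

```json
{
  "filter": {},
  "select": ["temperature", "humidity"],
  "limit": {
    "size": 100,
    "start": 1
  }
}
```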
When the JSON format is selected for the response in a data set search, the results differ significantly from those of other searches. In this case, searching returns a JSON document with two fields.
- columns: An array property. If field selection is requested, this property contains the selected fields. If the search does not use the select clause, it contains the identifier column in the first position, followed by the defined `columns.name` values in the order declared in the data set.
- data: An array property representing the set of data rows, each row containing values in the same order as the columns property.
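For illustration, a JSON response might look like this sketch (column names and values are hypothetical):

```json
{
  "columns": ["identifier", "temperature", "humidity"],
  "data": [
    ["device-1", 21.5, 40],
    ["device-2", 19.0, 55]
  ]
}
```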
Furthermore, the information can be retrieved in a CSV format. When selecting this option, it is important to consider the following:
- By default, any text value is enclosed in double quotation marks, for example `"text value"`. The double quotation mark can be replaced with any other character.
- The backslash character (`\`) is the standard method for escaping special characters; an alternative can be defined.
- The end-of-line character can also be modified; by default, it is `\n`.
- In the absence of data, the field is populated with the null value. This can also be modified.
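With the default settings, a data row might look like the following sketch. Only the quoting, escaping, end-of-line, and null behaviours are described above; the comma separator and the values shown are assumptions.

```text
"device-1","a \"quoted\" value",21.5,null
```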
The Data Sets API is designed for advanced data analysis and manipulation. Because performance matters when retrieving data in CSV format, sorting is disabled for CSV responses. This deliberate design decision markedly accelerates retrieval, enabling rapid access to extensive data sets; data downloaded in CSV format is typically used for subsequent analysis and manipulation anyway, where sorting can be applied by the consuming tool.
All of the aforementioned CSV settings can be configured using the header parameter designed for this purpose. However, it is the user's responsibility to ensure that the resulting CSV response is well formed.
Sorting limitations
To optimise performance, a maximum of three columns may be designated as sortable. During a data search, the `identifierColumn` and the designated sortable columns may be used for sorting; however, each request may sort by at most two of these columns.
To illustrate, consider a data set comprising the columns `A`, `B`, `C`, and `D`, with the sortable columns being `A`, `B`, and `C`. Valid sorting combinations include `(A)`, `(A, B)`, `(B, A)`, `(identifierColumn, A)`, and `(C, identifierColumn)`. Invalid combinations include `(A, D)`, `(identifierColumn, D)`, `(identifierColumn, A, B)`, and `(A, B, C)`.
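The rule can be sketched as a small check: a sort combination is valid when it uses one or two columns, each of which is either the identifier column or a sortable column. This is a hypothetical client-side helper, not part of the OpenGate API.

```python
def is_valid_sort(requested, sortable, identifier="identifierColumn"):
    """Return True if the requested sort combination is valid:
    at most two columns, each either the identifier column or sortable."""
    allowed = set(sortable) | {identifier}
    return 0 < len(requested) <= 2 and all(c in allowed for c in requested)

# Example from the text: sortable columns A, B, C (D is not sortable).
sortable = ["A", "B", "C"]
```
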
Limit specification
How the response is built, how pagination is specified, and the resulting behaviour depend on the selected response format.
- JSON response: If the limit field is not defined, the default values defined in the configuration are applied. If a value is defined, it is validated against the configuration values.
- CSV response: If the limit field is not defined in the request body, the entirety of the CSV data is retrieved. If the limit field is defined but incomplete, an error is returned, indicating that all its fields must be correctly determined.
It is inadvisable to query for complete data, as this can be a time-consuming process. This option should therefore be used with caution.
Example reading collected data in CSV format
The following example illustrates the use of the `limit` sub-document within the JSON structure employed for data retrieval.
```json
{
  "filter": {},
  "limit": {
    "size": 500,
    "start": 1
  }
}
```
In the following case, the complete data set will be retrieved.
```json
{
  "filter": {}
}
```
This is an example of an invalid query in CSV format.
```json
{
  "filter": {},
  "limit": {}
}
```
Getting the organization’s datasets list
This request returns the list of data sets defined in the organisation. As with creation, a maximum of fourteen columns may be designated as filterable and a maximum of three as sortable.