Vermillion API Documentation (1.0.0)


Vermillion is a high-performance, scalable and secure open-source data exchange platform built with Vert.x. It is a general-purpose resource server that data providers and consumers can use to exchange time-series as well as static datasets. Vermillion exposes a simple search interface for querying resources by time, geo-coordinates, full text, or any combination thereof.

Authentication

datasetu-auth-server

The datasetu-auth-server is the authentication, authorisation and accounting (AAA) server of Datasetu. Data providers can set fine-grained access control policies to regulate access to their resources/datasets, and data consumers can request access tokens to get access to those resources. For more information, please refer to the Datasetu Auth Server documentation.

Providers grant access to a resource using either a read scope or a write scope. With the read scope, consumers can invoke read-related APIs on the datasets. With the write scope, consumers can invoke APIs that "write" to the resource. All APIs except publish need a read scope; the publish API needs a write scope.

Security Scheme Type API Key
Query parameter name: token
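
The scope rule described for the auth server can be written down as a small lookup. This is only an illustrative sketch: the function name required_scope is hypothetical, and the API names are simply the section names used in this document, not identifiers from any Vermillion client library.

```python
# API names as they appear in this document; only publish writes.
READ_APIS = {"download", "latest", "search", "consumer", "provider"}
WRITE_APIS = {"publish"}


def required_scope(api_name: str) -> str:
    """Return the scope ("read" or "write") a token needs for an API."""
    name = api_name.strip().lower()
    if name in WRITE_APIS:
        return "write"
    if name in READ_APIS:
        return "read"
    raise ValueError(f"unknown API: {api_name!r}")


print(required_scope("publish"))
print(required_scope("search"))
```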

Consumer

A data consumer is any user or entity interested in a data resource that Vermillion hosts (or acts as an intermediary for). Consumers discover resources on the Datasetu catalogue and use the search interface to query the datasets.

download

This endpoint downloads secure file datasets for which access has already been obtained. If the fully qualified resource ID is known, the endpoint can be invoked from programs or from user agents such as curl. Otherwise, invoking the endpoint with just an access token brings up an HTML page listing the datasets the consumer has requested.

The API can be used in two modes. In the first mode, the consumer requests a specific resource ID or set of resource IDs (a subset of the resources the token is authorised for); these are then made available in the consumer's directory, from which they can be downloaded. In the second mode, the consumer passes only an access token, and all resources the token is authorised for are made available in the consumer's directory. The precondition for the second mode is that the token must not be authorised for heterogeneous resources, i.e., a mixture of time-series datasets and files (or files residing on other resource servers).

The download API only symlinks the requested resources into the consumer's directory. Once the symlinks are created, it internally redirects to the /consumer/ API.

Authorizations:
query Parameters
ACCESS_TOKEN
required
string
Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e153

A token granted by the datasetu auth server to access resources.

RESOURCE_ID
string
Example: RESOURCE_ID=rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public

A fully qualified resource name obtained from the datasetu catalogue. One or more resource IDs can be specified in this API. In the latter case, the resource IDs need to be separated by a comma.
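
The two download modes differ only in whether resource IDs are supplied, which a short URL-building sketch can make concrete. The host name below is a placeholder, and the function name download_url is hypothetical. The query parameter names are copied from the parameter list above; since the security scheme names the token parameter "token", both names are kept as constants that can be changed.

```python
from urllib.parse import urlencode

# Hypothetical host; replace with the address of a real Vermillion instance.
BASE_URL = "https://vermillion.example.org"

# Names as shown in the parameter list above (assumed to be literal).
TOKEN_PARAM = "ACCESS_TOKEN"
ID_PARAM = "RESOURCE_ID"


def download_url(access_token, resource_ids=None, base_url=BASE_URL):
    """Build a /download URL for specific resources or for all of them."""
    params = {TOKEN_PARAM: access_token}
    if resource_ids:
        # First mode: several resource IDs are joined with commas.
        params[ID_PARAM] = ",".join(resource_ids)
    # Second mode (no IDs): only the token is sent.
    # Keep "/" and "," readable instead of percent-encoding them.
    return f"{base_url}/download?{urlencode(params, safe='/,')}"


print(download_url("auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e153"))
```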

Responses

Response samples

Content type
text/plain
This is a sample text from a file.

latest

This API is for getting the latest datapoint of a resource. This is typically meant to be used on time-series datasets. However, it could be used to query the latest metadata of static files as well. It supports both open and secure datasets. An access token is required in the latter case.

Authorizations:
query Parameters
RESOURCE_ID
required
string
Example: RESOURCE_ID=rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource

A fully qualified resource name obtained from the datasetu catalogue.

ACCESS_TOKEN
string
Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179

An access token granted by the Datasetu auth server.

Responses

Response samples

Content type
application/json
{
  "data": {
  },
  "timestamp": "2021-03-05T10:18:00.952628Z",
  "id": "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
  "category": "test-category"
}
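
A consumer will usually want the timestamp of the latest datapoint as a proper date-time value rather than a string. The sketch below parses a body shaped like the sample response. The function name parse_latest is hypothetical, and the "data" object is left empty because its contents are not shown in the sample.

```python
import json
from datetime import datetime, timezone

# Body shaped like the sample response; "data" is left empty on purpose.
SAMPLE = """
{
  "data": {},
  "timestamp": "2021-03-05T10:18:00.952628Z",
  "id": "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
  "category": "test-category"
}
"""


def parse_latest(body: str) -> dict:
    """Decode a latest-datapoint response and parse its timestamp."""
    doc = json.loads(body)
    # Older Python versions reject a trailing "Z", so spell out the offset.
    stamp = datetime.fromisoformat(doc["timestamp"].replace("Z", "+00:00"))
    doc["timestamp"] = stamp.astimezone(timezone.utc)
    return doc


point = parse_latest(SAMPLE)
print(point["category"], point["timestamp"].isoformat())
```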

Search

This API provides a search interface for the data hosted on Vermillion. Both public and secure datasets can be queried, with an access token being required for the latter. Queries can filter by time, geo-spatial coordinates, text, or any combination thereof. The resource ID is mandatory for all search types, and at least one of the other three parameters must accompany it.

Authorizations:
query Parameters
ACCESS_TOKEN
string
Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179

A token granted by the datasetu auth server to access resources.

Request Body schema: application/json

The search API supports the filters listed below. Any filter can be combined with any other to perform a complex search.

Time filter: queries resources using a time range.

Geo-spatial filter: queries resources using geo-coordinates.

Text or numeric filter: queries resources using a text-based or numeric value.

When more than one filter is supplied, all of them are applied to the database query.

id
required
resourceId (string) or Array of resourceId-array (strings)

time
required
object (time)

A JSON object specifying the start and end times.

scroll_duration
string

How long the Elasticsearch (ES) scroll context should be kept alive, so that the consumer can keep scrolling through the result set.

size
integer

The number of hits that the consumer is interested in.

Responses

Request samples

Content type
application/json
Example
{
  "id": "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
  "time": {
  },
  "scroll_duration": "60m",
  "size": 3
}
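
The rule that a search needs a resource ID plus at least one filter can be checked before a request is sent. The sketch below assembles a request body under that rule. The function name build_search_body is hypothetical, and the "start" and "end" keys inside "time" are assumptions, since the contents of the time object are not shown in the sample. Filters whose key names are not shown can be passed through other_filters unchanged.

```python
import json


def build_search_body(resource_id, time=None, other_filters=None,
                      scroll_duration=None, size=None):
    """Assemble a search request body, enforcing the mandatory fields."""
    if not resource_id:
        raise ValueError("the resource ID is mandatory")
    filters = dict(other_filters or {})
    if time is not None:
        filters["time"] = time
    if not filters:
        raise ValueError("at least one filter must accompany the resource ID")
    body = {"id": resource_id, **filters}
    if scroll_duration is not None:
        body["scroll_duration"] = scroll_duration
    if size is not None:
        body["size"] = size
    return body


body = build_search_body(
    "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
    # Assumed key names; check them against the actual schema.
    time={"start": "2021-03-01", "end": "2021-03-05"},
    scroll_duration="60m",
    size=3,
)
print(json.dumps(body, indent=2))
```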

Response samples

Content type
application/json
{
  "hits": {
  },
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFDFXVGpZbmdCLXVCbkdFcEk3TFF1AAAAAAAAAAIWZWNVMWdVVkVUNHlub1kzdldYR2d3Zw=="
}

consumer

This API gives consumers access to the secure file datasets of providers. The /download API must be invoked first; it creates symlinks for the requested datasets in the consumer's directory. This API can be used in a browser, in which case an HTML page containing the folders is returned. Alternatively, it can be invoked from a user agent such as curl if the fully qualified resource ID is known.

Authorizations:
path Parameters
ACCESS_TOKEN
required
string
Example: auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179

A token granted by the datasetu auth server to access resources. In the above example, the endpoint the consumer needs to invoke would be /consumer/auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179. In this case, the consumer will NOT be able to browse parent directories of their folders unlike the case with the /provider/public API.
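
Because the token is a path parameter here rather than a query parameter, the URL is formed by appending it to /consumer/. A minimal sketch, with a placeholder host and a hypothetical function name consumer_url:

```python
from urllib.parse import quote

# Hypothetical host; replace with the address of a real Vermillion instance.
BASE_URL = "https://vermillion.example.org"


def consumer_url(access_token, base_url=BASE_URL):
    """Return the /consumer/<token> URL of a consumer's directory."""
    # The "/" inside the token is kept, as in the example endpoint above.
    return f"{base_url}/consumer/{quote(access_token, safe='/')}"


print(consumer_url("auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179"))
```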

Responses

Response samples

Content type
text/plain
secure-resource-1  secure-resource-2

provider

This API allows a consumer to browse files/datasets that providers have made publicly available. It can be used in a browser, in which case an HTML page containing the folders is returned. Alternatively, it can be invoked from a user agent such as curl if the fully qualified resource ID is known.

Authorizations:
path Parameters
RESOURCE_ID
string
Example: rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource

A fully qualified resource name obtained from the Datasetu catalogue. For the resource ID in the above example, the full path to access the file is /provider/public/rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource. To explore all publicly available datasets, a consumer can simply invoke the /provider/public endpoint, which displays the publicly available datasets from all providers of that Vermillion instance.
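
The two ways of using the public endpoint, browsing everything or fetching one resource, differ only in whether a resource ID is appended to the path. A minimal sketch, with a placeholder host and a hypothetical function name public_url:

```python
from urllib.parse import quote

# Hypothetical host; replace with the address of a real Vermillion instance.
BASE_URL = "https://vermillion.example.org"


def public_url(resource_id=None, base_url=BASE_URL):
    """Return the URL of one public resource, or of the public root."""
    root = f"{base_url}/provider/public"
    if resource_id is None:
        # No resource ID: list the public datasets of all providers.
        return root
    return f"{root}/{quote(resource_id, safe='/')}"


print(public_url())
print(public_url("rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource"))
```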

Responses

Response samples

Content type
text/plain
rbccps.org, iisc.com

Provider

A data provider is any user or entity responsible for a dataset that Vermillion hosts. Providers can be data owners or have delegated access to act as custodians for resources. Providers upload details, access mechanisms, licences and other metadata of resources onto the Datasetu catalogue. They also manage access control rules for their resources on the Datasetu auth server. Providers use the publish interface of Vermillion to upload datasets and the dynamic metadata associated with them.

publish

This endpoint lets providers publish data into Vermillion. The resource ID and access token are mandatory parameters. The API can be used to publish either time-series data or static files. Depending on the mode, the request must be either application/json (time-series data) or multipart/form-data (static files).

Authorizations:
query Parameters
RESOURCE_ID
required
string
Example: RESOURCE_ID=rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource

A fully qualified resource name obtained from the datasetu catalogue.

ACCESS_TOKEN
required
string
Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179

An access token granted by the Datasetu auth server.

Request Body schema:

As mentioned previously, this API can be used to publish time-series data or static files. The request body varies depending on the mode used.

Publish-timeSeriesData

Time-series data in JSON, formatted as per the schema specified below.

Publish-staticData

Any file that the provider wishes to host on Vermillion.

timestamp
string

An optional parameter to indicate the relevant timestamp of the resource (created, modified etc.). When not specified, this field defaults to the time at which the data was published.

data
required
object

A mandatory field that contains the data of the resource. This is encased in the data field to allow for uniform searchability.

coordinates
Array of strings

An array of co-ordinates specified as [longitude, latitude].

Responses

Request samples

Content type
{
  "timestamp": "2021-03-03T10:18:00.952628Z",
  "data": {
  },
  "coordinates": [
  ]
}
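
A time-series body can be assembled from the three documented fields. The sketch below is one possible way to do it: the function name build_timeseries_payload and the "temperature" reading are hypothetical. Coordinates are emitted as strings only because the schema above types them as an array of strings; adjust this if the server expects numbers.

```python
import json
from datetime import datetime, timezone


def build_timeseries_payload(data, timestamp=None, longitude=None, latitude=None):
    """Assemble a time-series publish body from the documented fields."""
    if not isinstance(data, dict):
        raise TypeError("data is mandatory and must be a JSON object")
    payload = {"data": data}
    if timestamp is not None:
        # Expects a timezone-aware datetime. Same layout as the sample:
        # UTC, microseconds, trailing "Z".
        utc = timestamp.astimezone(timezone.utc)
        payload["timestamp"] = utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    if longitude is not None and latitude is not None:
        # Longitude comes first, then latitude.
        payload["coordinates"] = [str(longitude), str(latitude)]
    return payload


payload = build_timeseries_payload(
    {"temperature": 21.5},  # hypothetical reading
    timestamp=datetime(2021, 3, 3, 10, 18, 0, 952628, tzinfo=timezone.utc),
    longitude=77.5946,
    latitude=12.9716,
)
print(json.dumps(payload, indent=2))
```

When the timestamp is omitted, the field is simply left out so that the server-side default described above applies.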

Response samples

Content type
text/plain
Ok