Vermillion API Documentation (1.0.0)

E-mail: pct960@gmail.com URL: https://vermillion.datasetu.org License: datasetu Terms of Service

Vermillion is a high-performance, scalable and secure open-source data exchange platform developed using Vert.x. It is a general-purpose resource server that data providers and consumers can use to exchange time-series as well as static datasets. Vermillion exposes a simple search interface that can be used to query resources by time, geo-coordinates, full text or any combination thereof.

Authentication

datasetu-auth-server

The datasetu-auth-server is the authentication, authorisation and accounting (AAA) server of Datasetu. Data providers can set fine-grained access control policies to regulate access to their resources/datasets, and data consumers can request access tokens to get access to those resources. For more information, please refer to the Datasetu Auth Server documentation. Providers grant access to a resource under a read scope or a write scope. With the read scope, consumers can invoke read-related APIs on the datasets. With the write scope, consumers can invoke APIs that write to the resource. All APIs except publish need a read scope; the publish API needs a write scope.

Security Scheme Type	API Key
Query parameter name:	token

Consumer

A data consumer is any user or entity that is interested in a data resource that Vermillion hosts (or acts as an intermediary for). Consumers discover resources on the datasetu catalogue and use the search interface to query the datasets.

download

This endpoint is meant for downloading secure file datasets for which access has been obtained beforehand. If the fully-qualified resource ID is known, this endpoint can be invoked from programs or from user-agents like curl. Otherwise, invoking the endpoint with just an access token brings up an HTML page listing the datasets the consumer has requested. The API can be used in two modes. In the first mode, a specific resource ID or set of resource IDs is requested (a subset of the resources that the token is authorised for); these are then made available in the consumer's directory, from which they can be downloaded. In the second mode, the consumer simply passes an access token, and all resources that the token is authorised for are made available in the consumer's directory. The pre-condition for the second mode is that the token must not carry authorisation for heterogeneous resources, i.e., a mixture of time-series datasets and files (or files residing on other resource servers). The download API only symlinks the requested resources into the consumer's directory. Once the symlinks are created, this API internally redirects to the /consumer/ API.

Authorizations:

datasetu-auth-server (read)

query Parameters

ACCESS_TOKEN required	string Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e153 A token granted by the datasetu auth server to access resources.
RESOURCE_ID	string Example: RESOURCE_ID=rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public A fully qualified resource name obtained from the datasetu catalogue. One or more resource IDs can be specified; multiple resource IDs must be separated by commas.
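
The two download modes described above differ only in whether resource IDs accompany the token. The following is a minimal sketch of building the request URL for each mode, not a definitive client. The host `vermillion.example.org` is hypothetical. The parameter names default to the labels in the table above; since the security scheme names the token parameter `token`, both names are keyword arguments that can be overridden for a given deployment.

```python
from urllib.parse import urlencode

# Hypothetical host; replace with the address of your Vermillion instance.
BASE_URL = "https://vermillion.example.org"


def download_url(token, resource_ids=(), token_param="ACCESS_TOKEN",
                 id_param="RESOURCE_ID"):
    """Build a /download URL.

    With resource IDs: request that specific subset (first mode).
    Without: request everything the token is authorised for (second mode).
    The parameter names are assumptions taken from the table labels.
    """
    params = {token_param: token}
    if resource_ids:
        # Multiple resource IDs travel as one comma-separated value.
        params[id_param] = ",".join(resource_ids)
    # Keep "/" and "," unescaped so tokens and IDs stay readable.
    return f"{BASE_URL}/download?{urlencode(params, safe='/,')}"
```

Calling `download_url(token)` yields the second mode, while `download_url(token, [id_1, id_2])` requests only the listed resources.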

Responses

Response samples

200
400
403
404
500

Content type

text/plain

This is a sample text from a file.

latest

This API is for getting the latest datapoint of a resource. This is typically meant to be used on time-series datasets. However, it could be used to query the latest metadata of static files as well. It supports both open and secure datasets. An access token is required in the latter case.

Authorizations:

datasetu-auth-server (read)

query Parameters

RESOURCE_ID required	string Example: RESOURCE_ID=rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource A fully qualified resource name obtained from the datasetu catalogue.
ACCESS_TOKEN	string Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179 An access token granted by the datasetu auth server.

Responses

Response samples

200
400
403
404
500

Content type

application/json

{
  "data": {
    "data": {
      "Project": "Vermillion",
      "ApiDocs": "Redoc",
      "Definition": "OpenAPI"
    }
  },
  "timestamp": "2021-03-05T10:18:00.952628Z",
  "id": "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
  "category": "test-category"
}

scrolled-search

The scroll API returns the results of an earlier search in chunks of a specified size. The chunk size determines the pagination of data points and is defined in the search API request. The search API must be called first to obtain a scroll_id.

Authorizations:

datasetu-auth-server (read)

query Parameters

ACCESS_TOKEN

string

Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179

An access token granted by the datasetu auth server

Request Body schema: application/json

scroll_id required	string The scroll ID obtained from the search API.
scroll_duration required	string The time duration for which scrolling through the data is requested.
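
The search-then-scroll sequence described above can be sketched as a small generator. This is an illustration under stated assumptions rather than the platform's own client: the HTTP call is delegated to a caller-supplied function (`fetch_next`, a hypothetical name), an empty `hits` value is assumed to mark the end of the results, and `hits` is accepted either as a list or as a single object because the response sample shows an object.

```python
def iter_scrolled_hits(first_page, fetch_next, scroll_duration="30m"):
    """Yield hits from a search response and every following scroll page.

    first_page: parsed JSON returned by the search API (carries scroll_id).
    fetch_next: caller-supplied function that sends a request body to the
                scrolled-search endpoint and returns the parsed JSON reply.
    """
    page = first_page
    while True:
        hits = page.get("hits") or []
        if isinstance(hits, dict):
            # The response sample shows a single object; treat it as one hit.
            hits = [hits]
        if not hits:
            # Assumption: an empty page marks the end of the result set.
            break
        yield from hits
        scroll_id = page.get("scroll_id")
        if scroll_id is None:
            break
        # The request body uses the two required fields of this endpoint.
        page = fetch_next({"scroll_id": scroll_id,
                           "scroll_duration": scroll_duration})
```

Keeping the transport outside the generator means the same loop works with any HTTP library and can be exercised with canned responses.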

Responses

Request samples

Payload

Content type

application/json

{
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFDFXVGpZbmdCLXVCbkdFcEk3TFF1AAAAAAAAAAIWZWNVMWdVVkVUNHlub1kzdldYR2d3Zw==",
  "scroll_duration": "30m"
}

Response samples

200
400
500

Content type

application/json

{
  "hits": {
    "data": {
      "data": {
        "Project": "Vermillion",
        "ApiDocs": "Redoc",
        "Definition": "OpenAPI"
      }
    },
    "timestamp": "2021-03-03T10:18:00.952628Z",
    "id": "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
    "category": "test-category",
    "co-ordinates": ["56.9", "76.5"],
    "mime-type": "application/json"
  },
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFDFXVGpZbmdCLXVCbkdFcEk3TFF1AAAAAAAAAAIWZWNVMWdVVkVUNHlub1kzdldYR2d3Zw=="
}

Search

This API provides a search interface for the data hosted on Vermillion. Both public and secure datasets can be queried using this API; an access token is required for secure datasets. The interface supports queries by time, geo-spatial co-ordinates, text or any combination thereof. The resource ID is mandatory for all search types, and at least one of the other three parameters must accompany it.

Authorizations:

datasetu-auth-server (read)

query Parameters

ACCESS_TOKEN

string

Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179

A token granted by the datasetu auth server to access resources.

Request Body schema: application/json

The following filters can be used in the search API. Any filter can be combined with any other filter to perform a complex search.

Time-series Search

This can be used to query resources using a time-based filter.

Geo-spatial Search

This can be used to query resources using a geo-spatial filter, i.e., using geo co-ordinates.

Attribute Search

This can be used to query resources using a text-based or numeric filter.

Complex Search

When two or more of the above filters are used together, all of them are applied when querying the database.

One of

id required	string or Array of strings A resource ID, or an array of resource IDs.
time required	object A JSON object specifying the start and end times.
scroll_duration	string The time duration for which the ES context should stay alive so that the data can subsequently be scrolled through.
size	integer The number of hits that the consumer is interested in.
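
As a rough illustration of the schema above, the following sketch assembles a request body for a time-series search. The field names (`id`, `time`, `start`, `end`, `size`, `scroll_duration`) are taken from the request sample; the helper name `time_search_body` is hypothetical, and the time bounds are passed through as strings because the accepted date formats are not specified here.

```python
def time_search_body(resource_ids, start, end, size=None, scroll_duration=None):
    """Build a request body for a time-series search.

    resource_ids: one resource ID (str) or a list of resource IDs.
    start, end:   time bounds, passed through unchanged as strings.
    """
    if not resource_ids:
        # The resource ID is mandatory for every search type.
        raise ValueError("at least one resource ID is required")
    body = {"id": resource_ids, "time": {"start": start, "end": end}}
    if size is not None:
        body["size"] = int(size)
    if scroll_duration is not None:
        # Only needed when the results will be paged with scrolled-search.
        body["scroll_duration"] = scroll_duration
    return body
```

Serialising the returned dictionary with `json.dumps` gives a payload of the same shape as the request sample.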

Responses

Request samples

Payload

Content type

application/json

Example

{
  "id": "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
  "time": {
    "start": "2021-02-3",
    "end": "2021-03-3"
  },
  "scroll_duration": "60m",
  "size": 3
}

Response samples

200
400
403
404
500

Content type

application/json

{
  "hits": {
    "data": {
      "data": {
        "Project": "Vermillion",
        "ApiDocs": "Redoc",
        "Definition": "OpenAPI"
      }
    },
    "timestamp": "2021-03-03T10:18:00.952628Z",
    "id": "rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource.public",
    "category": "test-category",
    "co-ordinates": ["56.9", "76.5"],
    "mime-type": "application/json"
  },
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFDFXVGpZbmdCLXVCbkdFcEk3TFF1AAAAAAAAAAIWZWNVMWdVVkVUNHlub1kzdldYR2d3Zw=="
}

consumer

This API lets consumers access the secure file datasets of providers. The /download API must be invoked first; it creates symlinks for the requested datasets in the consumer's directory. This API can be used in a browser, in which case an HTML page containing the folders is returned. Alternatively, it can be invoked from a user-agent such as curl if the fully qualified resource ID is known.

Authorizations:

datasetu-auth-server (read)

path Parameters

ACCESS_TOKEN

required

string

Example: auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179

A token granted by the datasetu auth server to access resources. In the above example, the endpoint the consumer needs to invoke would be /consumer/auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179. In this case, the consumer will NOT be able to browse parent directories of their folders unlike the case with the /provider/public API.

Responses

Response samples

200
404

Content type

text/plain

secure-resource-1  secure-resource-2

provider

This API allows a consumer to browse files/datasets that providers have made publicly available. It can be used in a browser, in which case an HTML page containing the folders is returned. Alternatively, it can be invoked from a user-agent such as curl if the fully qualified resource ID is known.

Authorizations:

datasetu-auth-server (read)

path Parameters

RESOURCE_ID

string

Example: rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource

A fully qualified resource name obtained from the datasetu catalogue. For the above resource ID, the full path to access the file is /provider/public/rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource. Furthermore, a consumer who wants to explore all publicly available datasets can simply invoke the /provider/public endpoint, which displays the publicly available datasets from all providers (of that Vermillion instance).
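
Unlike the query-based endpoints, /consumer/ and /provider/public take their argument as part of the path. A minimal sketch of composing those paths follows; the host is hypothetical and the helper names are illustrative only.

```python
# Hypothetical host; replace with the address of your Vermillion instance.
BASE_URL = "https://vermillion.example.org"


def consumer_url(token):
    """Listing of the secure datasets that /download symlinked for this token."""
    return f"{BASE_URL}/consumer/{token}"


def public_url(resource_id=None):
    """A public resource, or the listing of all providers when no ID is given."""
    if resource_id is None:
        return f"{BASE_URL}/provider/public"
    # Strip stray slashes so the joined path has exactly one separator.
    return f"{BASE_URL}/provider/public/{resource_id.strip('/')}"
```

For instance, `public_url()` points at the listing of all providers, and `consumer_url(token)` at the token holder's own directory.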

Responses

Response samples

200

Content type

text/plain

rbccps.org, iisc.com

Provider

A data provider is any user or entity that is responsible for a dataset that Vermillion hosts. Providers can be data owners or have delegated access to act as custodians for resources. Providers upload details, access mechanisms, licences and other metadata of resources onto the datasetu catalogue. They also manage access control rules for their resources on the datasetu auth server. Providers use the publish interface of Vermillion to upload datasets and the dynamic metadata associated with them.

publish

This endpoint gives providers access to publish data into vermillion. Resource ID and access token are mandatory parameters. This API can be used to publish either time series data or static files. Depending on the mode, the request will have to be either application/json or multipart/form-data.

Authorizations:

datasetu-auth-server (write)

query Parameters

RESOURCE_ID required	string Example: RESOURCE_ID=rbccps.org/e096b3abef24b99383d9bd28e9b8c89cfd50be0b/example.com/test-category/test-resource A fully qualified resource name obtained from the datasetu catalogue.
ACCESS_TOKEN required	string Example: ACCESS_TOKEN=auth.datasetu.org/36a83204ea6ad6690a0eccda0f37e179 An access token granted by the datasetu auth server.

Request Body schema: application/json or multipart/form-data

As mentioned previously, this API can be used to publish time series data or static files. The request will vary depending on the mode used.

Publish-timeSeriesData

Time-series data in JSON, formatted as per the schema specified below.

Publish-staticData

Any file that the provider wishes to host on Vermillion

timestamp	string An optional parameter to indicate the relevant timestamp of the resource (created, modified etc.). When not specified, this field defaults to the time at which the data was published.
data required	object A mandatory field that contains the data of the resource. This is encased in the `data` field to allow for uniform searchability.
coordinates	Array of strings An array of co-ordinates specified as [longitude, latitude].
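
To make the three fields above concrete, here is a hedged sketch that assembles a time-series publish body. The helper name `timeseries_payload` is hypothetical. It places whatever object the caller supplies under `data`, omits `timestamp` when none is given so that the documented default applies, and emits co-ordinates as strings in [longitude, latitude] order, as in the request sample.

```python
from datetime import datetime, timezone


def timeseries_payload(data, longitude=None, latitude=None, timestamp=None):
    """Build a JSON-serialisable body for publishing one data point."""
    if not isinstance(data, dict):
        raise TypeError("data must be a JSON object (dict)")
    payload = {"data": data}
    if timestamp is not None:
        if timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        # Normalise to UTC and use the trailing "Z" seen in the samples.
        utc = timestamp.astimezone(timezone.utc)
        payload["timestamp"] = utc.isoformat().replace("+00:00", "Z")
    if longitude is not None and latitude is not None:
        # Order is [longitude, latitude]; the sample sends strings.
        payload["coordinates"] = [str(longitude), str(latitude)]
    return payload
```

Leaving out `timestamp` relies on the server-side default of the publish time rather than on the client's clock.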

Responses

Request samples

Payload

Content type

application/json

{
  "timestamp": "2021-03-03T10:18:00.952628Z",
  "data": {
    "data": {
      "PM10": {
        "value": "70",
        "unit": "micrograms per cubic metre"
      }
    }
  },
  "coordinates": ["56.898989", "67.4939"]
}

Response samples

201
400
403
404
500

Content type

text/plain

Ok
