Data Storage

Data Storage service using Ontology's DDXF

Data is a form of resource, be it simulated data or any other type of discrete data.

Data is also diverse: static data can be stored once and used perpetually, since it does not change, while dynamic data updates constantly, and its records must be refreshed in real time to maintain integrity.

Considering data privacy, certain data can be made available to consumers in the form of copies or duplicates. Some data can also be made available in a way that does not expose it directly: the data can only be processed using an algorithm provided by the data consumer, and only the final result is shared with them.

All the roles and components described below have been defined in, and are derived from, the Generic Resource Exchange Protocol (GREP). Please refer to the protocol specifications for more details.

Data Storage and Processing

Generally speaking, the RP (resource provider) needs to arrange the storage and access services for the data that is to be made available for purchase.

There are several storage options. The RP may set up and use their own storage service, use a hosted service such as cloud storage, or use a decentralized storage platform like ONTFS. If data is stored on a hosted platform, data security and integrity can be ensured using cryptography and other techniques, as necessary.

When an RC (resource consumer) wishes to access the data, they need to send a request token to the data storage service. The request token can be a JWT generated by the RC by signing over their DToken.

Before granting access, the data storage service needs to check the DToken's validity by querying the blockchain explorer, fetch the DToken owner's details (the public key linked to their ONT ID), and verify the JWT's validity.

Once the verification is successful, both parties use the ONT ID to set up a secure data transfer channel, and transmit the data.
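As a rough illustration of this flow, the sketch below builds and verifies a signed request token using only Go's standard library. All names and claim fields here are hypothetical, and a conformant ES256 JWT encodes the signature as fixed-length R||S rather than ASN.1; treat this as a sketch, not the DDXF token format.

package main

import (
  "crypto/ecdsa"
  "crypto/elliptic"
  "crypto/rand"
  "crypto/sha256"
  "encoding/base64"
  "encoding/json"
  "fmt"
)

// b64 encodes a JWT segment as unpadded base64url.
func b64(b []byte) string { return base64.RawURLEncoding.EncodeToString(b) }

func main() {
  // Hypothetical RC key pair; in DDXF this key is bound to the RC's ONT ID.
  key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)

  // Header and claims; the claim names are illustrative only.
  header, _ := json.Marshal(map[string]string{"alg": "ES256", "typ": "JWT"})
  claims, _ := json.Marshal(map[string]string{
    "iss":    "did:ont:ARCxxxx",       // the RC's ONT ID
    "dtoken": "dtoken-id-placeholder", // the DToken being exercised
  })

  signingInput := b64(header) + "." + b64(claims)
  digest := sha256.Sum256([]byte(signingInput))

  // ASN.1 signature encoding is used here only to keep the sketch short.
  sig, _ := ecdsa.SignASN1(rand.Reader, key, digest[:])
  token := signingInput + "." + b64(sig)

  // The storage service resolves the RC's public key from the ONT ID's
  // DDO and verifies the signature before granting access.
  fmt.Println("token:", token[:40]+"...")
  fmt.Println("valid:", ecdsa.VerifyASN1(&key.PublicKey, digest[:], sig))
}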

There are two methods for data access:

1. Fetching the data directly. This can be viewed as a transfer of ownership of a copy of the data. Other permissions, such as the authority to transfer the data onward, can be fixed on the basis of the transaction terms. The data can be decrypted and processed as necessary.

  • The RC fetches the data from the storage service via the secure channel. For instance, real-time data streaming can be carried out over the secure channel to transfer data to the RC;

  • If the data is stored encrypted on the hosting platform, the RC needs to send a request to the RP in order to obtain the decryption key. The RP first confirms the identity of the requesting party and then sends the decryption key to the RC. Other cryptographic techniques can be used to further secure the transfer; for example, proxy re-encryption can be used to hand over the encrypted data while reducing the number of times the data is encrypted and decrypted.

2. Requesting processed results. This can be viewed as a permission to access the data. If the RP does not wish to share the original, raw data with a consumer, they may use this method. The RC first provides the algorithm that needs to be run on the data. The algorithm is then executed against the data, and the result is shared with the RC via a secure channel. The data processing environment used to execute the algorithm can be provided by the RP, or a third party platform may be used.

To ensure the privacy and security of the data during execution, the RP can first verify that the environment behaves as expected, for example by running a trial with non-sensitive data, before supplying the real data and running the algorithm.

If the RC is not willing to share the algorithm with the RP, they may choose to carry out the data processing in a trusted third party environment. The third party carries out the execution and then shares the result with the RC and RP as per the contract terms.
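The two access methods could be summarized by interfaces like the following hypothetical Go sketch; none of these type or method names come from the DDXF codebase.

import "io"

// DirectAccess models method 1: the RC obtains a copy of the data.
type DirectAccess interface {
  // Fetch streams the (possibly encrypted) data over the secure channel.
  Fetch(dtokenID string) (io.ReadCloser, error)
  // DecryptionKey is requested from the RP, who verifies the caller's
  // identity before releasing the key.
  DecryptionKey(dtokenID string) ([]byte, error)
}

// ComputeAccess models method 2: only the result of an RC-supplied
// algorithm leaves the processing environment.
type ComputeAccess interface {
  Run(dtokenID string, algorithm []byte) (result []byte, err error)
}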

Data Privileges and Management

The DToken protocol is used to generate tokens that signify data access and ownership privileges. This tokenization is carried out using Ontology's underlying blockchain technology.

The data access permission attributes in a DToken adhere to the system's token privilege management specifications. These attributes are used throughout the marketplace's resource exchange and data processing and interaction processes, and inherit the blockchain's security, tamper-resistance, and traceability.

Self-sovereign Data Usage

Data can be cleaned, modelled, and analyzed to create new data. This new data also has an ONT ID. To enhance the data's self-sovereign features, the ONT ID document can be extended in the following manner:

type DataDDO struct {
  Name        string   // Data name
  Fingerprint string   // Unique digital fingerprint of the data, required to generate an ONT ID for the data
  Description string   // Brief description
  Protocol    string   // Storage and access protocol, e.g. HTTPS, IPFS, etc.
  Location    string   // Access path or address
  SourceData  []string // Original data list: the ONT IDs of the source data
  Transformer string   // ONT ID of the party that processed the data
}

One possible extension is to include the structure described above as an attribute in the ONT ID document.
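For instance, a derived dataset's DDO entry might be populated as follows. This is a hypothetical sketch: all values are placeholders, and it assumes the DataDDO structure above.

// Assumes the DataDDO type defined above; all values are placeholders.
derived := DataDDO{
  Name:        "cleaned-sample-data",
  Fingerprint: "ab12...",             // see the fingerprint rules below
  Description: "Cleaned and aggregated records derived from two sources",
  Protocol:    "HTTPS",
  Location:    "https://example.com/data/cleaned-sample-data",
  SourceData:  []string{"did:ont:source1...", "did:ont:source2..."},
  Transformer: "did:ont:processor...",
}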

The way a digital fingerprint is derived varies with the structure of the data. Static data can use its hash value. Big data can be divided into blocks to generate a Merkle tree, with the Merkle root used as the fingerprint. Dynamic data can use the unique identifier of its data source.
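As an illustration of the first two rules, the sketch below computes a static-data fingerprint and a Merkle-root fingerprint, assuming SHA-256 and a simple binary tree; DDXF does not prescribe a specific hash function or tree layout here, so treat the details as illustrative.

package main

import (
  "crypto/sha256"
  "fmt"
)

// merkleRoot folds a non-empty list of block hashes into a single root.
func merkleRoot(leaves [][32]byte) [32]byte {
  for len(leaves) > 1 {
    var next [][32]byte
    for i := 0; i < len(leaves); i += 2 {
      if i+1 == len(leaves) { // an odd leaf is carried up unchanged
        next = append(next, leaves[i])
        continue
      }
      pair := append(leaves[i][:], leaves[i+1][:]...)
      next = append(next, sha256.Sum256(pair))
    }
    leaves = next
  }
  return leaves[0]
}

func main() {
  // Static data: the fingerprint is simply the hash of the payload.
  static := sha256.Sum256([]byte("small static dataset"))

  // Big data: hash each block, then take the Merkle root as the fingerprint.
  blocks := [][]byte{[]byte("block-0"), []byte("block-1"), []byte("block-2")}
  var leaves [][32]byte
  for _, b := range blocks {
    leaves = append(leaves, sha256.Sum256(b))
  }

  fmt.Printf("static fingerprint: %x\n", static)
  fmt.Printf("merkle fingerprint: %x\n", merkleRoot(leaves))
}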

When details regarding available data are published, metadata should be provided. The metadata is open for consumers to explore and make a choice, and it should follow a standard; the recommended standard can be found at schema.org.

Data Description Template

A unique identifier should be allocated to the data that is recorded on the chain; this is the data's ONT ID, generated by the provider. The publisher also submits other details regarding the data. The publisher can be an individual or a group, as determined by the ONT ID structure. Additional fields can also be included in the metadata structure, such as the version number, license, etc.

The metadata template can be modified as necessary based on the market being targeted. The RP fills in the necessary details to generate the metadata when publishing the data. The following sample metadata template illustrates a valid structure:

{
    "@context": ["https://ddxf.ont.io/schema/v2", "http://schema.org"],
    "@type": "Dataset",
    "identifier": "did:ont:xxxx....",
    "name": "sample data",
    "description": "Just a sample for structured data",
    "keywords": "sample, structured",
    "publisher": {
      "@type": "Person",
      "identifier": "did:ont:yyyy....",
      "name": "My Name",
      ...
    },
    "datePublished": "2019-01-01T00:00:00Z",
    "owner": {...},
    "version": 1,
    "expires": "2020-02-01T00:00:00Z",
    "license": "http://example.license.com/v1",
    ...
}

There are some necessary fields in the above template that must be included in the structure:

  • @context: Specifies the context table of the schema. The first value must be https://ddxf.ont.io/schema/v2. The second value in the sample above is http://schema.org, which imports the schema.org vocabulary;

  • @type: Specifies the type of the data resource; it must be one of the types defined in the context specified above;

  • identifier: Unique identifier of the data resource, a valid ONT ID;

  • name: Data resource name;

  • description: Brief description of the contents of the data resource;

  • keywords: Keywords that make the data resource searchable. Multiple keywords can be specified by separating them with commas;

  • publisher: The data provider; the target can be a Person or an Organization. The identifier field specifies their ONT ID, and other optional fields such as name, email, etc. may also be specified.

There are other optional fields that are part of the structure:

  • owner: The data owner, the target may be Person or Organization;

  • version: Data version number; may be a number or a character string;

  • expires: Data expiry date;

  • license: License agreement for data usage;
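For illustration, the required fields above might map onto a typed structure such as the following. This is a hypothetical Go sketch: the JSON keys come from the sample template, while the type and field names are assumptions.

// PublisherInfo describes the publishing Person or Organization.
type PublisherInfo struct {
  Type       string `json:"@type"`          // "Person" or "Organization"
  Identifier string `json:"identifier"`     // the publisher's ONT ID
  Name       string `json:"name,omitempty"` // optional display name
}

// DatasetMetadata covers the required fields of the metadata template.
type DatasetMetadata struct {
  Context     []string      `json:"@context"`   // first entry must be https://ddxf.ont.io/schema/v2
  Type        string        `json:"@type"`      // e.g. "Dataset"
  Identifier  string        `json:"identifier"` // the data's ONT ID
  Name        string        `json:"name"`
  Description string        `json:"description"`
  Keywords    string        `json:"keywords"` // comma-separated keywords
  Publisher   PublisherInfo `json:"publisher"`
}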

Data Processing and Transactions

ONT ID and DDO help maintain a complete record of data processing and proof of consistency.

In a data transaction scenario, it must be possible to establish whether the RP's data server could actually be accessed. If the RC cannot access the data, their deposit tokens are refunded to their address. Digital signatures can be used to determine whether the RC accessed the data server: the RC provides a valid signature whenever they access the server, and this signature can be verified using the RC's ONT ID.

Generally speaking, a few commonly used forms of proof are:

  • Digital Signature: The non-repudiation of digital signatures can be used to help the RP establish that they did provide data access (see the sketch after this list).

  • Hash Preimage: A preimage of a hash function can be used to record any abnormalities or changes that occur with respect to the data during transmission, such as error logs or illustrations of the error process.
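As a minimal sketch of the digital-signature proof, assuming Ed25519 keys (DDXF does not fix the signature scheme here), the RC signs a per-access receipt that the RP can later present as evidence:

package main

import (
  "crypto/ed25519"
  "crypto/rand"
  "fmt"
)

func main() {
  // Hypothetical RC key pair; in DDXF the public key would be resolved
  // from the RC's ONT ID rather than generated locally.
  pub, priv, _ := ed25519.GenerateKey(rand.Reader)

  // The data server challenges each access, and the RC signs the challenge.
  receipt := []byte("access dtoken-id-placeholder at 2019-01-01T00:00:00Z")
  sig := ed25519.Sign(priv, receipt)

  // The RP stores (receipt, sig) as non-repudiable proof that the RC
  // reached the data server; anyone can verify it against the RC's key.
  fmt.Println("proof valid:", ed25519.Verify(pub, receipt, sig))
}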
