Input and Output Domain

From BCOeditor Wiki
Revision as of 20:31, 10 February 2026 by Lorikrammer


Input and Output Domain | "io_domain"

This section defines the io_domain part of the BCO. The IEEE io_domain schema can be found at https://opensource.ieee.org/2791-object/ieee-2791-schema/-/blob/master/io_domain.json.

The io_domain lists the global input and output files of the computational workflow, excluding intermediate files. These fields are pointers to objects that can reside on the system performing the computation or on any other accessible system. As with the fields of the parametric domain, they are expected to vary with the specific BCO implementation and can refer to named input and output arguments of the underlying pipelines. Refer to the documentation of the individual scripts and to specific BCO descriptions for further details.

Condensed example:

       "io_domain": {
               "input_subdomain": [
               ], 
               "output_subdomain": [
               ]
       }
Required Components of the IO Domain

Subdomain          Object   Field           Status
input_subdomain                             REQUIRED
                   uri                      REQUIRED
                            filename        NOT REQUIRED
                            uri             REQUIRED
                            access_time     NOT REQUIRED
                            sha1_checksum   NOT REQUIRED
output_subdomain                            REQUIRED
                   uri                      REQUIRED
                            filename        NOT REQUIRED
                            uri             REQUIRED
                            access_time     NOT REQUIRED
                            sha1_checksum   NOT REQUIRED
                            mediatype       REQUIRED
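
The required components listed above can be turned into a quick structural check. The following is a minimal Python sketch, not part of the IEEE schema: the function name missing_required and the file name "result.csv" are hypothetical, and the check covers only the presence of the REQUIRED components.

```python
def missing_required(io_domain):
    """Return the paths of required io_domain components that are absent."""
    problems = []
    for sub in ("input_subdomain", "output_subdomain"):
        if sub not in io_domain:
            problems.append(sub)
            continue
        for i, item in enumerate(io_domain[sub]):
            uri_obj = item.get("uri")
            if not isinstance(uri_obj, dict):
                # The uri object itself is missing (or is not an object).
                problems.append(f"{sub}[{i}].uri")
            elif "uri" not in uri_obj:
                # The uri object exists but lacks its required uri field.
                problems.append(f"{sub}[{i}].uri.uri")
            if sub == "output_subdomain" and "mediatype" not in item:
                problems.append(f"{sub}[{i}].mediatype")
    return problems


example = {
    "input_subdomain": [
        {"uri": {"uri": "http://www.ncbi.nlm.nih.gov/nuccore/22129792"}}
    ],
    # Deliberately incomplete: no uri field and no mediatype.
    "output_subdomain": [{"uri": {"filename": "result.csv"}}],
}
print(missing_required(example))
```

An empty list means every required component is present; it does not mean the document is valid against the full schema.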

Input Subdomain | “input_subdomain”

This field records the references and input files for the entire pipeline. Each input file is listed as a uri object, which lets the author be as specific as they choose about a particular input file. For example, reference files have common names, and adding the common name here in addition to the URI makes the entry more readable and understandable (e.g., "HCV reference version..." or "human reference GRCh38"). For data integration workflows, the input file can be a table downloaded from a specific source, which is then filtered for modification using rules described in the BCO.

The URI object and the URI field are both required.

       "input_subdomain": [
           {
               "uri": {
                   "filename": "Hepatitis C virus genotype 1", 
                   "uri": "http://www.ncbi.nlm.nih.gov/nuccore/22129792",
                   "access_time": "2017-01-24T09:40:17-0500"
               }
           }, 
           {
               "uri": {
                   "filename": "Hepatitis C virus type 1b complete genome", 
                   "uri": "http://www.ncbi.nlm.nih.gov/nuccore/5420376",
                   "access_time": "2017-01-24T09:40:17-0500"
               }
           }
       ]
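
Entries like these can also be assembled programmatically. The following is a minimal sketch, assuming Python and a hypothetical helper named input_entry; it writes access_time in the form produced by datetime.isoformat(), which includes a colon in the UTC offset.

```python
import json
from datetime import datetime, timezone


def input_entry(uri, filename=None, access_time=None):
    """Build one input_subdomain item; only the uri field is mandatory."""
    uri_obj = {}
    if filename is not None:
        uri_obj["filename"] = filename
    uri_obj["uri"] = uri
    if access_time is not None:
        # access_time is expected to be a timezone-aware datetime.
        uri_obj["access_time"] = access_time.isoformat()
    return {"uri": uri_obj}


# 09:40:17 at UTC-05:00, expressed here in UTC.
accessed = datetime(2017, 1, 24, 14, 40, 17, tzinfo=timezone.utc)
input_subdomain = [
    input_entry(
        "http://www.ncbi.nlm.nih.gov/nuccore/22129792",
        filename="Hepatitis C virus genotype 1",
        access_time=accessed,
    ),
    # Optional fields can simply be left out.
    input_entry("http://www.ncbi.nlm.nih.gov/nuccore/5420376"),
]
print(json.dumps({"input_subdomain": input_subdomain}, indent=4))
```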

URI Object and Field

The IO domain includes a required URI object, which in turn contains a required URI field. The URI field must be present in every entry of both the input and output subdomains. The filename, access_time, and sha1_checksum fields are optional, and their use is left to the discretion of the author and the reviewer. Because the URI object is used in multiple domains, it is defined, together with each of its fields, in the top-level schema. See the URI definition in the IEEE top-level schema below.

"definitions": {
        "object_id": {
            "type": "string",
            "description": "A unique identifier that should be applied to each IEEE-2791 Object instance, generated and assigned by a IEEE-2791 database engine. IDs should never be reused"
        },
        "uri": {
            "type": "object",
            "description": "Any of the four Resource Identifers defined at https://tools.ietf.org/html/draft-handrews-json-schema-validation-01#section-7.3.5",
            "additionalProperties": false,
            "required": [
                "uri"
            ],
            "properties": {
                "filename": {
                    "type": "string"
                },
                "uri": {
                    "type": "string",
                    "format": "uri"
                },
                "access_time": {
                    "type": "string",
                    "description": "Time stamp of when the request for this data was submitted",
                    "format": "date-time"
                },
                "sha1_checksum": {
                    "type": "string",
                    "description": "output of hash function that produces a message digest",
                    "pattern": "[A-Za-z0-9]+"
                }
            }
        }, 
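
To see what this definition accepts and rejects, its rules can be checked by hand. The sketch below is an illustration in Python using only the standard library, not a replacement for a JSON Schema validator. The name check_uri_object is hypothetical, and the format keywords (uri, date-time) are not checked. Because JSON Schema patterns are not implicitly anchored, the sha1_checksum pattern is applied with re.search, so it only requires at least one alphanumeric character.

```python
import re

ALLOWED = {"filename", "uri", "access_time", "sha1_checksum"}
SHA1_PATTERN = re.compile(r"[A-Za-z0-9]+")  # copied from the schema, unanchored


def check_uri_object(obj):
    """Return a list of violations of the uri definition (empty if none)."""
    if not isinstance(obj, dict):
        return ["not an object"]
    errors = []
    if "uri" not in obj:
        errors.append("missing required field: uri")
    # "additionalProperties": false forbids any key outside ALLOWED.
    for key in sorted(set(obj) - ALLOWED):
        errors.append(f"additional property not allowed: {key}")
    # Every defined property has "type": "string".
    for key in sorted(ALLOWED & set(obj)):
        if not isinstance(obj[key], str):
            errors.append(f"{key} must be a string")
    checksum = obj.get("sha1_checksum")
    if isinstance(checksum, str) and not SHA1_PATTERN.search(checksum):
        errors.append("sha1_checksum does not match pattern")
    return errors


print(check_uri_object({"uri": "http://www.ncbi.nlm.nih.gov/nuccore/22129792"}))
print(check_uri_object({"filename": "HCV reference", "mediatype": "text/csv"}))
```

The second call reports two problems: the required uri field is missing, and mediatype is not a property of the URI object (in the output subdomain it sits next to the URI object, not inside it).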

Output Subdomain | “output_subdomain”

This field records the outputs for the entire pipeline. Each output is represented as a URI object with the addition of a mediatype field. In the output subdomain, both the URI object and the mediatype field are required. As described in the URI Object and Field section, within the URI object the URI field is required, while filename, access_time, and sha1_checksum are not.

"output_subdomain": [
        {
          "uri": {
            "filename": "Example Domain csv file",
            "uri": "http://example.com/data/514769/dnaAccessionBased.csv",
            "access_time": "2017-01-24T09:40:17-0500"
          },
          "mediatype": "text/csv"
        },
        {
          "uri": {
            "uri": "http://example.com/data/514801/SNPProfile*.csv",
            "access_time": "2017-01-24T09:40:17-0500"
          },
          "mediatype": "text/csv"
        }
      ]
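
Neither entry above carries the optional sha1_checksum. Where an author chooses to record one, it can be computed from the file contents. The following is a minimal sketch in Python using the standard hashlib module; the helper name sha1_of_stream is hypothetical, and the in-memory bytes stand in for a real output file.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for an opened output file, e.g. open(path, "rb"); content is made up.
content = io.BytesIO(b"abc")
entry = {
    "uri": {
        "filename": "Example Domain csv file",
        "uri": "http://example.com/data/514769/dnaAccessionBased.csv",
        "sha1_checksum": sha1_of_stream(content),
    },
    "mediatype": "text/csv",
}
print(entry["uri"]["sha1_checksum"])  # a9993e364706816aba3e25717850c26c9cd0d89d
```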

"Mediatype" Field Definition

The mediatype field is defined in the output subdomain; it is not required for the input subdomain. The description in the definition points to the IANA list of registered media types (https://www.iana.org/assignments/media-types/). The definition can be found in the IEEE io_domain schema below:

"output_subdomain": {
            "type": "array",
            "title": "output_subdomain",
            "description": "A record of the outputs for the entire pipeline.",
            "items": {
                "type": "object",
                "title": "The Items Schema",
                "required": [
                    "mediatype",
                    "uri"
                ],
                "properties": {
                    "mediatype": {
                        "type": "string",
                        "title": "mediatype",
                        "description": "https://www.iana.org/assignments/media-types/",
                        "default": "application/octet-stream",
                        "examples": [
                            "text/csv"
                        ],
                        "pattern": "^(.*)$"
                    },
                    "uri": {
                        "$ref": "2791object.json#/definitions/uri"
                    }
                }
            }
        }
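
The schema pattern ^(.*)$ accepts any single-line string, so the schema itself does not confirm that a value is a registered media type. An author who wants a stricter local check could use something like the sketch below. It is an illustration in Python and not part of IEEE 2791: it only tests for the general type/subtype shape and does not consult the IANA registry. The names MEDIATYPE_SHAPE and output_item_problems are hypothetical.

```python
import re

# Loose type/subtype shape; an illustrative assumption, not the IANA grammar.
MEDIATYPE_SHAPE = re.compile(r"^[A-Za-z0-9][\w.+-]*/[A-Za-z0-9][\w.+-]*$")


def output_item_problems(item):
    """Return problems found in one output_subdomain item (empty if none)."""
    problems = [
        f"missing required field: {key}"
        for key in ("mediatype", "uri")
        if key not in item
    ]
    mediatype = item.get("mediatype")
    if mediatype is not None and not (
        isinstance(mediatype, str) and MEDIATYPE_SHAPE.match(mediatype)
    ):
        problems.append(f"mediatype does not look like type/subtype: {mediatype!r}")
    return problems


uri_obj = {"uri": "http://example.com/data/514769/dnaAccessionBased.csv"}
print(output_item_problems({"uri": uri_obj, "mediatype": "text/csv"}))
print(output_item_problems({"uri": uri_obj, "mediatype": "csv"}))
print(output_item_problems({"uri": uri_obj}))
```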