User guide
Latest revision as of 20:08, 12 March 2026
Quick Links
- BioCompute Cheat Sheet
- Introduction to BioCompute Objects
- BCO Domains
- General User Tutorial
- IEEE-2791-2020: IEEE Standard for Bioinformatics Computations and Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication
- API Tutorial: BCO Portal Local Deployment
- Creating a BCO on Galaxy
- Whirl: An Open-Source R Package for Direct BioCompute Object Generation
BioCompute Domains
BCOs are represented in JSON (JavaScript Object Notation) formatted text, adhering to JSON Schema draft-07. The JSON format was chosen because it is readable and writable by both humans and machines. For a detailed description of JSON see www.json.org.
BioCompute data types are defined as aggregates of the critical fields, organized into the following domains: the provenance domain, the usability domain, the extension domain, the description domain, the execution domain, the parametric domain, the input and output domain, and the error domain. When a BCO is created with actual values that comply with the schema, it should be assigned a unique identifier, or object_id. The object can then be assigned a unique digital etag.
The BioCompute Object becomes immutable upon assignment of the digital etag and version.
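A minimal sketch of how an etag-style fingerprint can make later changes to a BCO detectable. It assumes a SHA-256 digest over a key-sorted JSON serialization that leaves out the `object_id`, `spec_version`, and `etag` keys; the key names, the excluded set, the serialization, and the example identifier are all assumptions made for illustration, so the standard remains the reference for the actual procedure.

```python
import hashlib
import json

# Assumption: these top-level keys are left out of the hash input.
EXCLUDED_KEYS = ("object_id", "spec_version", "etag")


def compute_etag(bco: dict) -> str:
    """Return a SHA-256 hex digest of the BCO content, minus the excluded keys."""
    content = {k: v for k, v in bco.items() if k not in EXCLUDED_KEYS}
    # Sorted keys and fixed separators give a stable serialization
    # (an illustrative choice, not a mandated canonical form).
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


draft = {
    "object_id": "https://example.org/BCO_000001/1.0",  # hypothetical identifier
    "usability_domain": ["Illustrative free-text description of the analysis."],
}
draft["etag"] = compute_etag(draft)

# Any later edit to the content yields a different digest.
edited = dict(draft, usability_domain=["Edited description."])
print(compute_etag(edited) == draft["etag"])
```

Because the `etag` key itself is excluded from the hash input, recomputing the digest on an unmodified object reproduces the stored value, which is what lets a recipient check that the object has not changed.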
| Domain | Description | Required |
|---|---|---|
| Provenance Domain | Metadata describing the BCO | REQUIRED |
| Usability Domain | Free text field for researcher to explain the analysis and relevant details | REQUIRED |
| Extension Domain | User-defined fields | OPTIONAL |
| Description Domain | Steps of the analysis, external resources needed for the steps, and the relationship of I/O objects | REQUIRED |
| Execution Domain | Information about the environment in which the analysis was run | REQUIRED |
| Parametric Domain | Records any parameters that were changed from default values | OPTIONAL |
| Input and Output Domain | A list of global input and output files | REQUIRED |
| Error Domain | Used for describing errors. Can include the limits of detectability, false positives, false negatives, statistical confidence of outcomes, and description of errors | OPTIONAL |
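The REQUIRED/OPTIONAL column above can be sketched as a simple presence check. The JSON key names used here (`provenance_domain`, `io_domain`, and so on) are assumptions chosen for illustration, and the function name is hypothetical; this only checks that keys exist and is not a substitute for validating against the JSON schema.

```python
# Assumed JSON keys for the eight domains, mapped to whether each is required.
DOMAIN_REQUIRED = {
    "provenance_domain": True,
    "usability_domain": True,
    "extension_domain": False,
    "description_domain": True,
    "execution_domain": True,
    "parametric_domain": False,
    "io_domain": True,
    "error_domain": False,
}


def missing_required_domains(bco: dict) -> list[str]:
    """List the required domain keys that are absent from the BCO."""
    return [
        name
        for name, required in DOMAIN_REQUIRED.items()
        if required and name not in bco
    ]


example = {"provenance_domain": {}, "usability_domain": [], "io_domain": {}}
print(missing_required_domains(example))
# → ['description_domain', 'execution_domain']
```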
BCO Examples
- HCV1a ledipasvir resistance SNP detection
- WGS Simulation of DUF1220 Regions
External Reference Database List
CURIEs (short identifiers) like [taxonomy:31646] in BCOs can be expanded to complete identifiers.
Learn more at external references.
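As a rough illustration of CURIE expansion, the sketch below splits a short identifier into prefix and local ID and joins the local ID onto a base URL. The prefix map and its `example.org` URL are hypothetical placeholders; the real prefixes and base URLs are the ones given in the external references list.

```python
# Hypothetical prefix map; real base URLs come from the external references list.
PREFIX_MAP = {
    "taxonomy": "https://example.org/taxonomy/",
}


def expand_curie(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand a CURIE such as '[taxonomy:31646]' to a complete identifier."""
    text = curie.strip().strip("[]")  # tolerate the bracketed form
    prefix, sep, local_id = text.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("[taxonomy:31646]"))
# → https://example.org/taxonomy/31646
```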
Title 21 CFR Part 11
Code of Federal Regulations Title 21 Part 11: Electronic Records - Electronic Signatures
The BioCompute project is being developed with Title 21 CFR Part 11 compliance in mind. The digital signatures incorporated into the format will provide the basis for establishing the provenance and integrity of BioCompute Objects, using NIST-proposed encryption algorithms. The execution domain and parametric domain (which can affect the result of a computation) and the identity domain will be used to create hash values and digital signature encryption keys, which can later be used for computer or human validation of transmitted objects.
Discussions are now taking place to consider the relevance of BioCompute Objects to Title 21 CFR Part 11. We encourage continued input from BioCompute stakeholders on this subject, both now and as the concept matures and becomes more widely accepted by the scientific and regulatory communities.
Relevant document link: Part 11: Electronic Records
Compatibility
ISA for the experimental metadata
ISA is a metadata framework to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies. Built around the Investigation (the project context), Study (a unit of research), and Assay (analytical measurements) concepts, ISA helps to provide rich descriptions of experimental metadata (e.g. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable.
The ISA Model and Serialization Specifications define an Abstract Model of the metadata framework that has been implemented in two format specifications, ISA-Tab and ISA-JSON (http://isa-tools.org/format/specification). Both have supporting tools and services, including a programmable Python API (http://isa-tools.org), and a varied community of users and contributors (http://www.isacommons.org). ISA focuses on structuring experimental metadata; raw and derived data files, code, workflows, etc. are considered external files that are referenced. An example, along with its complementarity with other models and a computational workflow, is illustrated in this paper, which shows how to explicitly declare elements of experimental design, variables, and findings: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612
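The Investigation, Study, and Assay nesting described above can be pictured with a few plain data classes. This is only an illustrative sketch: the class and field names are hypothetical, it is not the ISA Python API, and it omits most of what the ISA Abstract Model defines.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    """Analytical measurements; data files are referenced, not embedded."""
    measurement_type: str
    technology_type: str
    data_files: list[str] = field(default_factory=list)


@dataclass
class Study:
    """A unit of research, holding one or more assays."""
    title: str
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context, holding one or more studies."""
    title: str
    studies: list[Study] = field(default_factory=list)


# Placeholder values for illustration only.
assay = Assay("example measurement", "example technology", ["reads_1.fastq"])
investigation = Investigation("Example project", [Study("Example study", [assay])])
print(investigation.studies[0].assays[0].data_files)
```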