User guide: Difference between revisions

From BCOeditor Wiki
No edit summary
No edit summary
 
(16 intermediate revisions by 2 users not shown)
Line 1: Line 1:
BioCompute Cheat Sheet can be found [https://github.com/biocompute-objects/BCO%20Specification/blob/main/static/docs/BCOCheatSheet.pdf here]
BioCompute Cheat Sheet can be found [[Cheatsheet|here]].
[[File:CheatSheet V07 jgk1 hk1.pdf|800px|center|thumb]]


This document was created by the BioCompute Object Consortium members (BCOC).
General user tutorial can be found here: [[Tutorials|User Tutorial]].


It is offered as support for [https://standards.ieee.org/ieee/2791/7337/ IEEE-2791-2020: IEEE Standard for Bioinformatics Computations and Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication].
API tutorial can be found here: [[BCO Portal Local Deployment]]


This document is offered as support for [https://standards.ieee.org/ieee/2791/7337/ IEEE-2791-2020: IEEE Standard for Bioinformatics Computations and Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication].
Go Back to [[Main Page|BioCompute Objects]].
== Table of contents ==


Line 11: Line 13:


*[[Introduction|Introduction to BioCompute Objects]]
 
*Introduction to [[Bco-domains|BCO domains]]
*[[Bco-domains|BCO domains]]
**[[Top-level|Top-level fields]]
**[[Provenance-domain|Provenance domain]]
Line 25: Line 26:
**[[Error-domain|Error domain]]
*[https://docs.biocomputeobject.org/examples/HCV1a.json BCO expanded view example HCV1a.json]
=== Introduction ===
This document specifies the structure of BioCompute Objects. The specification is split into multiple parts linked to this top-level document and is maintained in a [https://github.com/biocompute-objects/BCO%20Specification GitHub repository] where contributions are welcome.
Read more: [[Introduction|Introduction to BioCompute Objects]]
=== BioCompute Domains ===


BCOs are represented in JSON (JavaScript Object Notation) formatted text, adhering to [https://json-schema.org/specification.html JSON schema draft-07]. The JSON format was chosen because it is both human and machine-readable/writable. For a detailed description of JSON see [http://www.json.org www.json.org].
[[Main Page|BCO]]s are represented in JSON (JavaScript Object Notation) formatted text, adhering to [https://json-schema.org/specification.html JSON schema draft-07]. The JSON format was chosen because it is both human and machine-readable/writable. For a detailed description of JSON see [http://www.json.org www.json.org].


BioCompute data types are defined as aggregates of the critical fields organized into the following domains: the provenance domain, the usability domain, the extension domain, the description domain, the execution domain, the parametric domain, the input and output domains, and the error domain. When a BCO is created with actual values that comply with the schema, it should be assigned a unique identifier, an [[Top-level|object_id]]. The object can then be assigned a unique digital [[Top-level#ETag_%E2%80%9Cetag%E2%80%9D|etag]].


Three of the domains in a BioCompute Object SHOULD become immutable upon assignment of the digital [[Top-level#ETag_%E2%80%9Cetag%E2%80%9D|etag]]:
The BioCompute Object becomes immutable upon assignment of the digital [[Top-level#ETag_%E2%80%9Cetag%E2%80%9D|etag]] and version.
 
1. the [[Parametric-domain|Parametric Domain]]
 
2. the [[Execution-domain|Execution Domain]] and


3. the [[Iodomain|I/O Domain]]
=== BCO expanded view example ===
 
*[[Bco-domains|BCO domains]]
 
=== Appendices ===
==== Appendix-I: BCO expanded view example ====
Complete example:
*[https://raw.githubusercontent.com/biocompute-objects/BCO_Specification/1.4.2/examples/HCV1a.json HCV1a.json]


==== 3.2 Appendix-II: External reference database list ====
=== External reference database list ===
CURIEs (short identifiers) like [taxonomy:31646] in BCOs can be expanded to complete identifiers.


Line 59: Line 45:
*[https://docs.biocomputeobject.org/external-references/ External references]


==== Title 21 CFR Part 11 ====
=== Title 21 CFR Part 11 ===
Code of Federal Regulations Title 21 Part 11: Electronic Records - Electronic Signatures


Line 68: Line 54:
Relevant document link: [https://www.fda.gov/regulatory-information/search-fda-guidance-documents/part-11-electronic-records-electronic-signatures-scope-and-application Part 11: Electronic Records]


=== Appendix IV - Compatibility ===
=== Compatibility ===
 
==== ISA for the experimental metadata ====
ISA is a metadata framework for managing an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies. Built around the Investigation (the project context), Study (a unit of research), and Assay (analytical measurements) concepts, ISA helps to provide rich descriptions of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable. The ISA Model and Serialization Specifications define an Abstract Model of the metadata framework that has been implemented in two format specifications, ISA-Tab and ISA-JSON (http://isa-tools.org/format/specification), both of which have supporting tools and services associated with them, including a programmable Python API (http://isa-tools.org), and a varied community of users and contributors (http://www.isacommons.org). ISA focuses on structuring experimental metadata; raw and derived data files, code, workflows, etc. are considered external files that are referenced. An example, along with its complementarity with other models and a computational workflow, is illustrated in this paper, which shows how to explicitly declare elements of experimental design, variables, and findings: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612
==== Appendix VI Acknowledgements ====
This document began development during the [https://hive.biochemistry.gwu.edu/htscsrs/workshop%202017 2017 HTS-CSRS workshop]. The discussion during the workshop led to the refinement and completion of this document. The workshop participants were a major part of the initial BCO community, and the comments and suggestions collected during the sessions were incorporated into this document. The people who participated in the 2017 workshop, and therefore made significant contributions are listed here: https://osf.io/h59uh/
== BioCompute Object Consortium members (BCOC) ==
FDA: Vahan Simonyan, Mark Walderhaug, Ruth Bandler, Eric Donaldson, Elaine Thompson, Alin Voskanian, Anton Golikov, Konstantinos Karagiannis, Elaine Johanson, Adrian Myers, Errol Strain, Khaled Bouri, Tong Weida, Wenming Xiao, Md Shamsuzzaman
GW: Raja Mazumder, Charles Hadley S. King IV, Amanda Bell, Jeet Vora, Krista M. Smith, Robel Kahsay
Documentation Community: Gil Alterovitz (Boston Children’s Hospital/Harvard Medical School, SMART/FHIR/HL7, GA4GH), Michael R. Crusoe (CWL), Marco Schito (C-Path), Konstantinos Krampis (CUNY), Alexander (Sasha) Wait Zaranek (Curoverse), John Quackenbush (DFCI/Harvard), Geet Duggal (DNAnexus), Singer Ma (DNAnexus), Yuching Lai (DDL), Warren Kibbe (Duke), Tony Burdett (EBI), Helen Parkinson (EBI), Stuart Young (Engility Corp), Anupama Joshi (Epinomics), Vineeta Agarwala (Flatiron Health), James Hirmas (GenomeNext), David Steinberg (UCSC), Veronica Miller (HIV Forum), Dan Taylor (Internet 2), Paul Duncan (Merck), Jianchao Yao (Merck & Co., Inc., Boston, MA USA), Marilyn Matz (Paradigm4), Ben Busby (NCBI), Eugene Yaschenko (NCBI), Zhining Wang (NCI), Hsinyi (Steve) Tsang (NCI), Durga Addepalli (NCI/Attain), Heidi Sofia (NIH), Scott Jackson (NIST), Paul Walsh (NSilico Life Science), Toby Bloom (NYGC), Hiroki Morizono (CNMC), Jeremy Goecks (Oregon Health and Science University), Srikanth Gottipati (Otsuka-US), Alex Poliakov (Paradigm4), Keith Nangle (Pistoia Alliance), Jonas S Almeida (Stony Brook Univ, SUNY), Dennis A. Dean, II (Seven Bridges Genomics), Dustin Holloway (Seven Bridges Genomics), Nisha Agarwal (Solvuu), Stian Soiland-Reyes (UNIMAN), Carole Goble (UNIMAN), Susanna-Assunta Sansone (University of Oxford), Philippe Rocca-Serra (University of Oxford), Phil Bourne (Univ. of Virginia), Joseph Nooraga (Fred Hutchinson Cancer Research Center)

Latest revision as of 14:19, 25 September 2024

BioCompute Cheat Sheet can be found here.

General user tutorial can be found here: User Tutorial.

API tutorial can be found here: BCO Portal Local Deployment.

This document is offered as support for IEEE-2791-2020: IEEE Standard for Bioinformatics Computations and Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication.

Go Back to BioCompute Objects.

Table of contents

BioCompute Domains

BCOs are represented in JSON (JavaScript Object Notation) formatted text, adhering to JSON schema draft-07. The JSON format was chosen because it is both human and machine-readable/writable. For a detailed description of JSON see www.json.org.

BioCompute data types are defined as aggregates of the critical fields organized into the following domains: the provenance domain, the usability domain, the extension domain, the description domain, the execution domain, the parametric domain, the input and output domains, and the error domain. When a BCO is created with actual values that comply with the schema, it should be assigned a unique identifier, an object_id. The object can then be assigned a unique digital etag.
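As a rough illustration of the domain layout described above, the following Python sketch reports which domains a BCO, loaded as a dictionary, contains. The top-level key names (for example `io_domain` for the input and output domains) are assumptions inferred from the domain names in the text, and the sample object and its `object_id` URL are hypothetical; check both against the published schema before relying on them.

```python
# Top-level key names are ASSUMED from the domain names in the text;
# confirm them against the published BCO schema before relying on them.
ASSUMED_DOMAIN_KEYS = (
    "provenance_domain",
    "usability_domain",
    "extension_domain",
    "description_domain",
    "execution_domain",
    "parametric_domain",
    "io_domain",
    "error_domain",
)


def summarize_domains(bco):
    """Return (present, absent) domain keys for a BCO loaded as a dict."""
    present = [key for key in ASSUMED_DOMAIN_KEYS if key in bco]
    absent = [key for key in ASSUMED_DOMAIN_KEYS if key not in bco]
    return present, absent


# Hypothetical, heavily truncated BCO used only for illustration.
draft_bco = {
    "object_id": "https://example.org/BCO_000001/1.0",
    "provenance_domain": {"name": "Example pipeline"},
    "usability_domain": ["Illustrates the domain layout."],
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
}

present, absent = summarize_domains(draft_bco)
print("present:", present)
print("absent:", absent)
```

This check only looks at top-level keys. Full conformance would require validating the object against the JSON schema itself, which this sketch does not attempt.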

The BioCompute Object becomes immutable upon assignment of the digital etag and version.

BCO expanded view example

Complete example: HCV1a.json

External reference database list

CURIEs (short identifiers) like [taxonomy:31646] in BCOs can be expanded to complete identifiers.
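One way to picture CURIE expansion is a lookup from prefix to base URL. The Python sketch below assumes a hypothetical prefix map; the `example.org` base URL is a placeholder, not the real resolver for the `taxonomy` prefix, which should be taken from the external reference list.

```python
# HYPOTHETICAL prefix map: the base URL is a placeholder, not the real
# resolver for the "taxonomy" prefix.
PREFIX_MAP = {
    "taxonomy": "https://example.org/taxonomy/",
}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand a CURIE such as '[taxonomy:31646]' into a complete identifier."""
    # Accept the bracketed form used in the text as well as the bare form.
    text = curie.strip().strip("[]")
    prefix, separator, local_id = text.partition(":")
    if not separator or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise ValueError(f"unknown CURIE prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("[taxonomy:31646]"))
```

Raising an error on an unknown prefix, rather than returning the CURIE unchanged, is a design choice made here so that unresolved identifiers are noticed early.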

Specifications:

Title 21 CFR Part 11

Code of Federal Regulations Title 21 Part 11: Electronic Records - Electronic Signatures

The BioCompute project is being developed with Title 21 CFR Part 11 compliance in mind. The digital signatures incorporated into the format will provide the basis for demonstrating the provenance and integrity of BioCompute Objects using NIST-proposed encryption algorithms. The execution domain and parametric domain (both of which can affect the result of a computation), together with the identity domain, will be used to create hash values and digital signature encryption keys, which can later be used for computer or human validation of transmitted objects.
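The idea of deriving a hash value from selected domains can be sketched in Python as follows. This is an illustration only: the choice of SHA-256, the JSON canonicalization (sorted keys, compact separators), the domain key names, and the sample values are all assumptions, not the algorithm used by the BioCompute project.

```python
import hashlib
import json


def domain_digest(bco, domains=("execution_domain", "parametric_domain")):
    """Hash the selected domains of a BCO dict (illustrative, not official)."""
    subset = {name: bco.get(name) for name in domains}
    # Sorting keys and fixing separators makes the serialization
    # deterministic, so equal content always yields an equal digest.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical domain contents used only for illustration.
original = {
    "execution_domain": {"script": "run.sh", "threads": 4},
    "parametric_domain": [{"param": "seed", "value": "14"}],
}
reordered = {
    "parametric_domain": [{"value": "14", "param": "seed"}],
    "execution_domain": {"threads": 4, "script": "run.sh"},
}
modified = {
    "execution_domain": {"script": "run.sh", "threads": 4},
    "parametric_domain": [{"param": "seed", "value": "15"}],
}

print(domain_digest(original) == domain_digest(reordered))
print(domain_digest(original) == domain_digest(modified))
```

A recipient who recomputes the digest over a transmitted object and compares it with the recorded value can detect any change to the hashed domains. Signing the digest is a separate step that this sketch leaves out.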

Discussions are now taking place to consider the relevance of BioCompute Objects in relation to Title 21 CFR Part 11. We encourage continuous input from BioCompute stakeholders on this subject, both now and as the concept matures and becomes more widely accepted by the scientific and regulatory communities.

Relevant document link: Part 11: Electronic Records

Compatibility

ISA for the experimental metadata

ISA is a metadata framework for managing an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies. Built around the Investigation (the project context), Study (a unit of research), and Assay (analytical measurements) concepts, ISA helps to provide rich descriptions of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable. The ISA Model and Serialization Specifications define an Abstract Model of the metadata framework that has been implemented in two format specifications, ISA-Tab and ISA-JSON (http://isa-tools.org/format/specification), both of which have supporting tools and services associated with them, including a programmable Python API (http://isa-tools.org), and a varied community of users and contributors (http://www.isacommons.org). ISA focuses on structuring experimental metadata; raw and derived data files, code, workflows, etc. are considered external files that are referenced. An example, along with its complementarity with other models and a computational workflow, is illustrated in this paper, which shows how to explicitly declare elements of experimental design, variables, and findings: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612