|
|
| == The BioCompute Standard == | | {{DISPLAYTITLE:<span style="position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);">{{FULLPAGENAME}}</span>}} |
| | __NOTOC__ |
| | <!-- BANNER ACROSS TOP OF PAGE --> |
| | <div id="ggw-topbanner" style="clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;"> |
| | <div style="margin:0.4em; text-align:center;"> |
| | <div style="font-size:160%; padding:.1em;">Welcome to BioCompute Objects Wiki,</div> |
| | <div style="font-size:100%;">The [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the BioCompute Objects project. This wiki system provides complementary information to the [https://www.biocomputeobject.org/ BioCompute portal] and is divided into the following main sections: General information for the [https://www.biocomputeobject.org/ BioCompute portal], [[User_guide|Quick Start and User Guide]], [[Faq|FAQ]], [[Sop|Curation SOP]], and [[About|About]] for the [https://www.biocomputeobject.org/ BioCompute portal].</div>You can also find the BioCompute White paper [[White paper|here]]. |
| | </div> |
| | </div> |
| | <div style="clear: both;"></div> |
|
| |
|
| Because data can be organized in many different ways, a major goal of the BioCompute project is to build and maintain a formal standard through recognized, accredited standards-setting organizations such as the Institute of Electrical and Electronics Engineers (IEEE) and the International Organization for Standardization (ISO). A formal, consensus-based standard adds predictability and stability to the way bioinformatic methods are communicated.
| | <div style="clear: both;"><br /> |
| | <div style="border-top: 1px solid #CCC; padding-top: 0.5em;"><div id="ggw_row3" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;"> |
| | <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;"> |
|
| |
|
| The standard, officially known as 2791-2020, has two parts: the standards document and the schema, which is maintained in an open source repository:
| |
|
| |
|
| * The current version of the standard can be found [here](https://standards.ieee.org/standard/2791-2020.html).
| | <div id="ggw_row2" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;"> |
| * The schema can be found [here](https://opensource.ieee.org/2791-object/ieee-2791-schema).
| | <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;"> |
| | <h3>[[User guide|BioCompute Object (BCO) User Guide]]</h3> |
| | <div style="border-top: 1px solid #CCC; padding-top: 0.5em;"> |
| | This document specifies the structure of BioCompute Objects. The specification is split into multiple parts linked to this top-level document and is maintained in a [https://github.com/biocompute-objects/BCO_Specification GitHub repository] where contributions are welcome. This document was created by the [[Main_Page#BioCompute_Object_Consortium_members_(BCOC)|BioCompute Object Consortium members (BCOC)]]. |
|
| |
|
| Since the base BioCompute schema is maintained as an open source repository, it can be cloned and integrated into an organization in unique ways, which allows organizations to build on this schema to create dependent standards for specific applications. This is similar to the different versions of WiFi based on usage, such as the 802.11a standard for faster speed but higher cost and shorter range, or 802.11b for a slower top speed but lower cost, all of which are built on the 802.11 base standard. The schema can also be extended further, for example to handle proprietary, internal content, while remaining compatible with the base standard. The open source schema also enables individuals or organizations to suggest changes to be incorporated into future versions of the standard.
| | It is offered as support for IEEE-2791-2020: [https://standards.ieee.org/ieee/2791/7337/ IEEE Standard for Bioinformatics Computations and Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication]. |
|
| |
|
| === Citation ===
| | Read more: [[Introduction|Introduction to BioCompute Objects]] |
| This standard was originally prepared by [The BioCompute Object working group](/BCO_Spec_V1.2.md#biocompute-object-consortium-members-bcoc) during preparation for the [2017 HTS Computational Standards for Regulatory Sciences Workshop](https://hive.biochemistry.gwu.edu/htscsrs/workshop_2017).
| |
|
| |
|
| To reference the BCO standards, please use the following
| | </div> |
| citation inclusive of the DOI:
| | </div> |
|
| |
|
| Simonyan, V., Goecks, J., & Mazumder, R. (2017). ***Biocompute Objects — A Step towards Evaluation and Validation of Biomedical Scientific Computations.*** PDA Journal of Pharmaceutical Science and Technology, 71(2), 136–146. doi: [10.5731/pdajpst.2016.006734](http://doi.org/10.5731/pdajpst.2016.006734)
| | <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;"> |
| | <h3>[[Faq|Frequently Asked Questions]]</h3> |
| | <div style="border-top: 1px solid #CCC; padding-top: 0.5em;"> |
| | The FAQ section contains a list of questions asked by users regarding using the portal, pipeline steps, and extensions as well as questions related to the prerequisite, knowledgebase recommendation, and saving and publishing BCOs. |
|
| |
|
| === Support, Community and Contributing ===
| | Read more: [[Faq|Frequently Asked Questions]] |
|
| |
|
| To suggest changes to [this repository](https://github.com/biocompute-objects/BCO_Specification) we welcome contributions as a [pull request](https://github.com/biocompute-objects/BCO_Specification/pulls) or [issue](https://github.com/biocompute-objects/BCO_Specification/issues) submission.
| |
|
| |
|
| BCO_Specification is licensed under the [BSD 3-Clause "New" or "Revised" License](https://github.com/biocompute-objects/BCO_Specification/blob/main/LICENSE.md)
| | </div> |
| | </div> |
|
| |
|
| A permissive license similar to the BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent.
| | <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;"> |
| | <h3>[[Sop|BCO Curation SOP]]</h3> |
| | <div style="border-top: 1px solid #CCC; padding-top: 0.5em;"> |
| | Intended audience: authors and developers |
|
| |
|
| === Mailing List ===
| | This section is intended to provide guidance on BCO™ creation, versioning, certification and authentication. |
|
| |
|
| As a subscriber to the BCO mailing list, you can post to it by sending a message to biocomputels@hermes.gwu.edu (using the email address that is subscribed). This list is semi-automated and will send your message for review.
| | Read more: [[Sop|BCO Curation SOP]] |
|
| |
|
| To subscribe or unsubscribe, please visit https://hermes.gwu.edu/cgi-bin/wa?A0=BIOCOMPUTELS and click `Subscribe` or `Unsubscribe` on the lower right. You can also unsubscribe from the list at any time by sending an email to listserv@hermes.gwu.edu, in which the body says: `unsubscribe biocomputes`
| |
|
| |
|
| This repository is in support of [2791-2020](https://standards.ieee.org/standard/2791-2020.html) - IEEE Approved Draft Standard for Bioinformatics Computations and Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication. Please also see our [OSF page](https://osf.io/h59uh/) or our [main page](https://biocomputeobject.org/)
| | </div> |
| | </div> |
| | </div> |
| | </div> |
|
| |
|
| == BioCompute Object (BCO) User Guide == | | <div id="ggw_row3" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;"> |
| | <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;"> |
| | <h3>[[BioCompute Conference and Workshop|Workshop]]</h3> |
| | <div style="border-top: 1px solid #CCC; padding-top: 0.5em;"> |
| | We are hosting an in-person workshop on May 10, 2024. Learn more [[BioCompute Conference and Workshop|here]]. |
| | For all previous workshop materials, click [https://hive.biochemistry.gwu.edu/publications#Multimedia here]. |
| | </div> |
| | </div> |
|
| |
|
| This document was created by the BioCompute Object Consortium members (BCOC).
| | <div id="ggw_row3" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;"> |
| | <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;"> |
| | <h3>[[About]]</h3> |
| | <div style="border-top: 1px solid #CCC; padding-top: 0.5em;"> |
| | A BioCompute Object (BCO) is an instance of the BioCompute standard and is a computational record of a bioinformatics pipeline. A BCO is not an analysis but is a record of which analyses were executed and in exactly which ways. In this way, a BCO acts as an interface for existing standards. A BCO contains all of the necessary information to repeat an entire pipeline from FASTQ to result and includes additional metadata to identify provenance and usage. |
|
| |
|
| It is offered as support for IEEE-2791-2020: IEEE Standard for Bioinformatics Computations and Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication.[https://standards.ieee.org/ieee/2791/7337/]
| | Read more: |
| | *[[About|What is BioCompute?]] |
| | *[[About|Wifi Analogy]] |
| | *[[About|BioCompute Description]] |
| | </div> |
| | </div> |
|
| |
|
| === Introduction === | | <div id="ggw_row3" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;"> |
| | <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;"> |
| | <h3>[[Publications|Publication]]</h3> |
| | <div style="border-top: 1px solid #CCC; padding-top: 0.5em;"> |
| | '''For Citation Purpose:''' Simonyan, V., Goecks, J., & Mazumder, R. (2017). Biocompute Objects — A Step towards Evaluation and Validation of Biomedical Scientific Computations. PDA Journal of Pharmaceutical Science and Technology, 71(2), 136–146. doi: 10.5731/pdajpst.2016.006734 |
|
| |
|
| This document specifies the structure of BioCompute Objects. The specification is split into multiple parts linked to this top-level document and is maintained in a [https://github.com/biocompute-objects/BCO%20Specification GitHub repository] where contributions are welcome.
| | See also full list of [[publications]] about BioCompute Object. |
|
| |
|
| Read more: '''[https://docs.biocomputeobject.org/introduction/ Introduction to BioCompute Objects]
| | === Other links === |
| '''
| | # [[CDISC]] |
| === BioCompute Domains === | | # [[Galaxy]] |
| | | # [[RO-Crate]] |
| BCOs are represented in JSON (JavaScript Object Notation) formatted text, adhering to [https://json-schema.org/specification.html JSON schema draft-07]. The JSON format was chosen because it is both human and machine-readable/writable. For a detailed description of JSON see [http://www.json.org www.json.org].
| | # [[CWL|Common Workflow Language (CWL)]] |
| | | # [https://fairsharing.org/4293 FAIRsharing] |
| BioCompute data types are defined as aggregates of the critical fields organized into the following domains: the provenance domain, the usability domain, the extension domain, the description domain, the execution domain, the parametric domain, the input and output domains, and the error domain. When a BCO is created with actual values that comply with the schema, it should be assigned a unique identifier, an '''[https://docs.biocomputeobject.org/top-level/#202-biocompute-object-identifier-object_id object_id]'''. The object can then be assigned a unique digital '''[https://docs.biocomputeobject.org/top-level/ etag]'''.
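The domain layout described above can be pictured as a single JSON object with one top-level key per domain. The following is a minimal sketch in Python, not an authoritative template: the key names are assumed from the domain list in this section, the helper <code>missing_domains</code> is hypothetical, and every value is a placeholder. The IEEE 2791-2020 schema remains the only normative definition.

```python
import json

# Top-level keys assumed from the domain list above; the authoritative
# names are defined by the IEEE 2791-2020 schema, not by this sketch.
DOMAIN_KEYS = [
    "provenance_domain",
    "usability_domain",
    "extension_domain",
    "description_domain",
    "execution_domain",
    "parametric_domain",
    "io_domain",
    "error_domain",
]


def missing_domains(bco: dict) -> list:
    """Return the keys from DOMAIN_KEYS that are absent from the object."""
    return [key for key in DOMAIN_KEYS if key not in bco]


# Hypothetical, heavily abbreviated object; all values are placeholders.
example_bco = {
    "object_id": "https://example.org/BCO_000001/1.0",
    "etag": "placeholder-etag",
    "provenance_domain": {"name": "Example pipeline"},
    "usability_domain": ["Illustrative record only."],
    "description_domain": {"pipeline_steps": []},
    "execution_domain": {},
    "parametric_domain": [],
    "io_domain": {},
}

if __name__ == "__main__":
    print(json.dumps(example_bco, indent=2))
    print("Domains not present:", missing_domains(example_bco))
```

A check like this only reports which domain keys are present; it does not replace validation against the published schema, and it makes no claim about which domains are required.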
| | </div> |
| | | </div> |
| Three of the domains in a BioCompute Object SHOULD become immutable upon assignment of the digital '''[https://docs.biocomputeobject.org/top-level/ etag]''':
| |
| | |
| '''1. the [https://docs.biocomputeobject.org/parametric-domain/ Parametric Domain]'''
| |
| | |
| '''2. the [https://docs.biocomputeobject.org/execution-domain/ Execution Domain] and'''
| |
| | |
| '''3. the [https://docs.biocomputeobject.org/io-domain/ I/O Domain]'''
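One way to notice that any of the three domains listed above has changed after an etag was assigned is to record a digest of their contents and compare it later. The sketch below is an assumption-laden illustration: the key names, the canonical JSON serialization, the choice of SHA-256, and the functions <code>domain_fingerprint</code> and <code>domains_unchanged</code> are all hypothetical, and this is not the etag algorithm defined by the standard.

```python
import hashlib
import json

# Assumed key names for the three domains listed above.
IMMUTABLE_KEYS = ("parametric_domain", "execution_domain", "io_domain")


def domain_fingerprint(bco: dict) -> str:
    """SHA-256 hex digest over a canonical JSON dump of the three domains."""
    subset = {key: bco.get(key) for key in IMMUTABLE_KEYS}
    # Sorted keys and fixed separators keep the serialization stable
    # regardless of the order in which keys were inserted.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def domains_unchanged(bco: dict, recorded: str) -> bool:
    """True when the three domains still match a previously recorded digest."""
    return domain_fingerprint(bco) == recorded


if __name__ == "__main__":
    # Placeholder object; the field contents are illustrative only.
    bco = {
        "provenance_domain": {"name": "Example pipeline"},
        "parametric_domain": [{"param": "seed", "value": "14", "step": "1"}],
        "execution_domain": {"script_driver": "shell"},
        "io_domain": {"input_subdomain": [], "output_subdomain": []},
    }
    recorded = domain_fingerprint(bco)
    bco["provenance_domain"]["name"] = "Renamed pipeline"
    print(domains_unchanged(bco, recorded))
    bco["parametric_domain"][0]["value"] = "15"
    print(domains_unchanged(bco, recorded))
```

Edits outside the three domains leave the digest untouched, while any edit inside them changes it.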
| |
| | |
| *[https://docs.biocomputeobject.org/bco-domains/ '''BCO domains''']
| |
| | |
| === Appendices ===
| |
| ==== Appendix-I: BCO expanded view example ====
| |
| Complete example:
| |
| *[https://docs.biocomputeobject.org/examples/HCV1a.json HCV1a.json]
| |
| ==== Appendix-II: External reference database list ====
| |
| CURIEs (short identifiers) like [taxonomy:31646] in BCOs can be expanded to complete identifiers.
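Expanding a CURIE amounts to replacing its prefix with a base URI and appending the local identifier. The sketch below assumes a prefix map and a function named <code>expand_curie</code>, both hypothetical; the <code>example.org</code> base URI is a placeholder, and real prefixes should come from the external references list linked below.

```python
# Hypothetical prefix map; real BCOs should take prefixes from the
# external references list, not from this sketch.
PREFIX_MAP = {
    "taxonomy": "https://example.org/taxonomy/",
}


def expand_curie(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand 'prefix:local_id' (optionally in square brackets) to a full URI."""
    text = curie.strip().strip("[]")
    prefix, sep, local_id = text.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown CURIE prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


if __name__ == "__main__":
    print(expand_curie("[taxonomy:31646]"))
```

Raising on unknown prefixes, rather than returning the input unchanged, is a design choice made here so that unmapped identifiers are not silently passed through.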
| |
| | |
| Specifications:
| |
| | |
| *'''[https://docs.biocomputeobject.org/external-references/ External references]'''
| |
| | |
| ==== Title 21 CFR Part 11 ====
| |
| Code of Federal Regulations Title 21 Part 11: Electronic Records - Electronic Signatures
| |
| | |
| The BioCompute project is being developed with Title 21 CFR Part 11 compliance in mind. The digital signatures incorporated into the format will provide the basis for verifying the provenance and integrity of BioCompute Objects using NIST-proposed encryption algorithms. The execution domain and parametric domain (which can affect the result of a computation) and the identity domain will be used to create hash values and digital signature encryption keys, which can later be used for computer or human validation of transmitted objects.
| |
| | |
| Discussions are now taking place to consider the relevance of BioCompute Objects in relation to Title 21 CFR Part 11. We encourage continuous input from BioCompute stakeholders on this subject, both now and as the concept matures and becomes more widely accepted by the scientific and regulatory communities.
| |
| | |
| Relevant document link: [https://www.fda.gov/regulatory-information/search-fda-guidance-documents/part-11-electronic-records-electronic-signatures-scope-and-application Part 11: Electronic Records]
| |
| | |
| === Appendix IV - Compatibility ===
| |
| ==== ISA for the experimental metadata ====
| |
| ISA is a metadata framework to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies. Built around the Investigation (the project context), Study (a unit of research), and Assay (analytical measurements) concepts, ISA helps to provide rich descriptions of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable. The ISA Model and Serialization Specifications define an Abstract Model of the metadata framework that has been implemented in two format specifications, ISA-Tab and ISA-JSON (http://isa-tools.org/format/specification), both of which have supporting tools and services associated with them, including a programmable Python API (http://isa-tools.org), and a varied community of users and contributors (http://www.isacommons.org). ISA focuses on structuring experimental metadata; raw and derived data files, codes, workflows, etc. are considered external files that are referenced. An example, along with its complementarity with other models and a computational workflow, is illustrated in this paper, which shows how to explicitly declare elements of experimental design, variables, and findings: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612
| |
| | |
| ==== Appendix VI Acknowledgements ====
| |
| This document began development during the [https://hive.biochemistry.gwu.edu/htscsrs/workshop%202017 2017 HTS-CSRS workshop]. The discussion during the workshop led to the refinement and completion of this document. The workshop participants were a major part of the initial BCO community, and the comments and suggestions collected during the sessions were incorporated into this document. The people who participated in the 2017 workshop, and therefore made significant contributions, are listed here: https://osf.io/h59uh/
| |
| | |
| == BioCompute Object Consortium members (BCOC) ==
| |
| | |
| FDA: Vahan Simonyan, Mark Walderhaug, Ruth Bandler, Eric Donaldson, Elaine Thompson, Alin Voskanian, Anton Golikov, Konstantinos Karagiannis, Elaine Johanson, Adrian Myers, Errol Strain, Khaled Bouri, Tong Weida, Wenming Xiao, Md Shamsuzzaman
| |
| | |
| GW: Raja Mazumder, Charles Hadley S. King IV, Amanda Bell, Jeet Vora, Krista M. Smith, Robel Kahsay
| |
| | |
| Documentation Community: Gil Alterovitz (Boston Children’s Hospital/Harvard Medical School, SMART/FHIR/HL7, GA4GH), Michael R. Crusoe (CWL), Marco Schito (C-Path), Konstantinos Krampis (CUNY), Alexander (Sasha) Wait Zaranek (Curoverse), John Quackenbush (DFCI/Harvard), Geet Duggal (DNAnexus), Singer Ma (DNAnexus), Yuching Lai (DDL), Warren Kibbe (Duke), Tony Burdett (EBI), Helen Parkinson (EBI), Stuart Young (Engility Corp), Anupama Joshi (Epinomics), Vineeta Agarwala (Flatiron Health), James Hirmas (GenomeNext), David Steinberg (UCSC), Veronica Miller (HIV Forum), Dan Taylor (Internet 2), Paul Duncan (Merck), Jianchao Yao (Merck & Co., Inc., Boston, MA USA), Marilyn Matz (Paradigm4), Ben Busby (NCBI), Eugene Yaschenko (NCBI), Zhining Wang (NCI), Hsinyi (Steve) Tsang (NCI), Durga Addepalli (NCI/Attain), Heidi Sofia (NIH), Scott Jackson (NIST), Paul Walsh (NSilico Life Science), Toby Bloom (NYGC), Hiroki Morizono (CNMC), Jeremy Goecks (Oregon Health and Science University), Srikanth Gottipati (Otsuka-US), Alex Poliakov (Paradigm4), Keith Nangle (Pistoia Alliance), Jonas S Almeida (Stony Brook Univ, SUNY), Dennis A. Dean, II (Seven Bridges Genomics), Dustin Holloway (Seven Bridges Genomics), Nisha Agarwal (Solvuu), Stian Soiland-Reyes (UNIMAN), Carole Goble (UNIMAN), Susanna-Assunta Sansone (University of Oxford), Philippe Rocca-Serra (University of Oxford), Phil Bourne (Univ. of Virginia), Joseph Nooraga (Fred Hutchinson Cancer Research Center)
| |