Abstract:
Biomedical materials research is increasingly data-driven, thanks to advances in machine learning. The use of biological sequencing to assess the biological functions of biomedical materials, however, still requires further optimization. Comprehensive analysis calls for an open, shared infrastructure that can store diverse scientific data from different research fields. This paper presents BioMGE, a case study in database construction built on the flexible, user-definable National Materials Data Management and Service (NMDMS) platform. BioMGE is designed to collect biomedical materials data and multiomics sequencing data. Using NMDMS’s dynamic container framework, users can tailor data submission schemas to their needs and store data from both the biomedical materials and multiomics research domains. To ensure data interoperability, the schema creation module is combined with data standards, and we propose a standard specification for biomedical materials data. With the dynamic container framework and this specification, we established data submission schemas for biomedical materials and multiomics data that cover material names, experimental design, grouping information for experimental materials, and high-throughput omics sequencing. Since 2019, BioMGE has accumulated 1,547,100 datasets of biomedical materials and multiomics data based on these schemas. To enable users to analyze these data, BioMGE provides a data export interface. For instance, the BioMGE-viewer module offers one-dimensional, two-dimensional, and three-dimensional visualizations of omics data. The one-dimensional visualization displays gene information in tabular form. The two-dimensional visualization shows the topologically associating domains of chromatin as a heatmap. The three-dimensional visualization renders chromatin structure in three dimensions, helping users explore the relationship between gene function and gene structure. What sets BioMGE apart is that it was constructed directly by researchers rather than by database designers. Researchers in various fields can therefore design personalized data schemas that match the characteristics of their research, even without programming expertise. This approach maximizes the interoperability and usability of NMDMS data. BioMGE has the potential to foster collaborative research across domains and the joint analysis of biomedical materials and biological sequencing data. It offers fresh insights for the advancement of cell therapy and introduces a new approach and platform for data sharing in cross-field research.