BioMGE: A database for biomedical material and multiomics data collection and analysis

龚海燕; 张晓彤; 张司臣; 李铭鸿; 赵赫; 王婧宇; 王秀梅; 陈阳

doi:10.13374/j.issn2095-9389.2022.11.21.003

Abstract

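Summary: With the development of machine learning, scientific research on biomedical materials is becoming increasingly data-driven, and testing the biological functions of biomedical materials with biological sequencing technology calls for further optimization of those materials. An open, shared infrastructure that stores heterogeneous scientific data from different research fields is therefore the cornerstone of joint, cross-disciplinary analysis. This paper introduces BioMGE, a database construction case built on the flexible, user-customizable NMDMS platform (National Materials Data Management and Service Platform) for collecting biomedical material and multiomics sequencing data. The dynamic container framework of NMDMS lets users define personalized data submission schemas and store data from both the biomedical materials and the multiomics research fields. Since 2019, BioMGE has collected 1,547,100 biomedical material and multiomics datasets. BioMGE provides a data export interface so that users can export data directly for analysis. Taking omics data visualization as an example, the BioMGE-viewer module visualizes chromatin structure data from one dimension to three dimensions. The database offers a new idea and platform for data sharing in other cross-field research.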

Abstract: Driven by advances in machine learning, scientific research on biomedical materials is becoming increasingly data-driven. The use of biological sequencing technology to assess the biological functions of biomedical materials still requires further optimization. Joint analysis across disciplines therefore depends on an open, shared infrastructure that can store heterogeneous scientific data from different research fields. This paper presents BioMGE, a case study in database construction built on the flexible, user-defined NMDMS platform (National Materials Data Management and Service Platform). BioMGE is designed to collect biomedical material and multiomics sequencing data. Using NMDMS's dynamic container framework, users can define their own data submission schemas and store data from both the biomedical materials and the multiomics research domains. To ensure data interoperability, the schema creation module is combined with data standards, and we also propose a standard specification for biomedical materials data. With the dynamic container framework and these standard specifications, we established data submission schemas for biomedical material and multiomics data that cover material names, experimental design, grouping information for experimental materials, and high-throughput omics sequencing. Since 2019, BioMGE has collected 1,547,100 biomedical material and multiomics datasets based on these schemas. To let users analyze these data, BioMGE provides a data export interface. As an example of omics data visualization, the BioMGE-viewer module offers one-dimensional, two-dimensional, and three-dimensional views. The one-dimensional view displays gene information in tabular form. The two-dimensional view shows the topologically associating domains of chromatin as a heatmap. The three-dimensional view renders the chromatin structure in three dimensions, helping users explore the relationship between gene function and gene structure. What sets BioMGE apart is that it was constructed directly by researchers rather than by database designers: researchers in various fields, even without programming expertise, can design personalized data schemas that match the characteristics of their research. This approach maximizes the interoperability and usability of NMDMS data. BioMGE has the potential to foster collaborative research across domains and the joint analysis of biomedical material and biological sequencing data. It offers fresh insights for the advancement of cell therapy and provides a new idea and platform for data sharing in other cross-field research.
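The idea of a user-defined submission schema with a data export step can be illustrated with a minimal Python sketch. This is not the NMDMS or BioMGE implementation or API: the `SchemaField` and `Schema` classes, the `export_records` function, the field names, and the sample records are all hypothetical, chosen only to mirror the categories the abstract mentions (material name, grouping information, sequencing type).

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SchemaField:
    """One user-defined field in a submission schema (hypothetical)."""
    name: str
    py_type: type
    required: bool = True


@dataclass
class Schema:
    """A user-defined data submission schema (hypothetical, not the NMDMS API)."""
    name: str
    fields: list[SchemaField] = field(default_factory=list)

    def validate(self, record: dict[str, Any]) -> list[str]:
        """Return a list of problems; an empty list means the record conforms."""
        errors = []
        for f in self.fields:
            if f.name not in record:
                if f.required:
                    errors.append(f"missing required field: {f.name}")
                continue
            value = record[f.name]
            if not isinstance(value, f.py_type):
                errors.append(
                    f"{f.name}: expected {f.py_type.__name__}, "
                    f"got {type(value).__name__}"
                )
        return errors


def export_records(schema: Schema, records: list[dict[str, Any]]) -> str:
    """Export only the records that conform to the schema, as JSON text."""
    valid = [r for r in records if not schema.validate(r)]
    return json.dumps(
        {"schema": schema.name, "records": valid},
        ensure_ascii=False,
        indent=2,
    )


# Hypothetical schema and records, for illustration only.
schema = Schema(
    name="biomaterial_experiment",
    fields=[
        SchemaField("material_name", str),
        SchemaField("group", str),
        SchemaField("sequencing_type", str, required=False),
    ],
)
good_record = {"material_name": "collagen scaffold", "group": "control"}
bad_record = {"material_name": 42}

if __name__ == "__main__":
    print(schema.validate(bad_record))
    print(export_records(schema, [good_record, bad_record]))
```

Keeping the schema as plain data rather than code is what would let researchers without programming experience define fields through a form; validation and export then operate on any schema in the same way.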
