Omics Metadata Management Software

Core functionality resides in three tables: Specimen Information, Sample Processing, and Sequence MetaInformation. Their fields have embedded automation that supports efficient metadata entry and storage, and intuitive entity relationships among the tables facilitate data sharing and analysis. The tables are accessed via the MetaData portal.
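As a rough illustration of how three related tables can support sharing and analysis, the sketch below builds a small in-memory SQLite database with Python's standard `sqlite3` module. All table names, column names and sample values here are hypothetical and chosen only for illustration; they are not taken from the actual OMMS schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual OMMS database layout.
SCHEMA = """
CREATE TABLE specimen_info (
    specimen_id TEXT PRIMARY KEY,
    host        TEXT NOT NULL,
    tissue      TEXT
);
CREATE TABLE sample_processing (
    sample_id        TEXT PRIMARY KEY,
    specimen_id      TEXT NOT NULL REFERENCES specimen_info(specimen_id),
    isolation_method TEXT
);
CREATE TABLE sequence_metainfo (
    sequence_id TEXT PRIMARY KEY,
    sample_id   TEXT NOT NULL REFERENCES sample_processing(sample_id),
    platform    TEXT,
    data_file   TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding one linked record per table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the table links
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO specimen_info VALUES (?, ?, ?)",
                 ("SPC-0001", "mouse", "gut"))
    conn.execute("INSERT INTO sample_processing VALUES (?, ?, ?)",
                 ("SMP-0001", "SPC-0001", "column-based DNA isolation"))
    conn.execute("INSERT INTO sequence_metainfo VALUES (?, ?, ?, ?)",
                 ("SEQ-0001", "SMP-0001", "Illumina", "reads_R1.fastq"))
    conn.commit()
    return conn


def files_for_host(conn, host):
    """Follow specimen -> sample -> sequence links to list data files."""
    rows = conn.execute(
        """SELECT q.data_file
           FROM sequence_metainfo AS q
           JOIN sample_processing AS p ON p.sample_id = q.sample_id
           JOIN specimen_info AS s ON s.specimen_id = p.specimen_id
           WHERE s.host = ?
           ORDER BY q.data_file""",
        (host,),
    ).fetchall()
    return [row[0] for row in rows]
```

With foreign keys enabled, a sample row cannot reference a specimen that does not exist, so a single join can trace any data file back to its host.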

Our software, the Omics Metadata Management Software (OMMS), supports scalable, automated management of information associated with specimens (e.g., host tissue, microhabitat), sample preparation (e.g., methods for nucleic acid isolation and handling) and sequencing (e.g., sequencing platform and provider, read characteristics, library preparation method, data file names) in next-generation sequencing projects in which a host (animal or plant) is central to the experimental design. Examples of such projects include animal and plant microbiome studies (e.g., http://hmpdacc.org/). This package was developed in Linux environments [CentOS release 5.8 (Final) and Red Hat Enterprise Linux Workstation release 6.5 (Santiago)] for the RAPid Threat ORganism Recognition (RAPTOR) Grand Challenge at Sandia National Laboratories. The OMMS can be integrated with state-of-the-art bioinformatics utilities, such as BLAST and programs in the Tuxedo Suite.

This application enables user-centric management of specimen, sample and sequence-production metainformation. The main functionality of our software resides in three tables: “Specimen Info,” “Sample Processing” and “Sequence Metainfo.” Each of these tables facilitates advanced experimental curation by generating unique identifiers and directory names for the information users enter, and provides a web-based platform for entering and storing project-specific information in consistent, persistent, defined data structures. At its core, the OMMS:

  • Instantiates a Linux-based file system and SQL relational database structures;
  • Generates spreadsheets within and across each of the tables;
  • Supports uploading/downloading of metadata, input and output;
  • Integrates with state-of-the-art bioinformatics tools, allowing users to define thresholds and tailor the output formats of the integrated utilities;
  • Stores results files (i.e., standard output and error from integrated programs);
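The identifier, directory and results-file behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OMMS implementation: the helper names `make_record_dir` and `run_and_store`, the identifier format and the file names `stdout.txt` and `stderr.txt` are all hypothetical, and a trivial Python one-liner stands in for an integrated tool such as BLAST.

```python
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path


def make_record_dir(root, table, label):
    """Create a uniquely named directory for one metadata record.

    The "<table>-<label>-<random hex>" format is an assumption made
    for this sketch.
    """
    record_id = f"{table}-{label}-{uuid.uuid4().hex[:8]}"
    record_dir = Path(root) / table / record_id
    record_dir.mkdir(parents=True, exist_ok=False)
    return record_id, record_dir


def run_and_store(cmd, record_dir):
    """Run a command and save its standard output and error as files."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    (record_dir / "stdout.txt").write_text(proc.stdout)
    (record_dir / "stderr.txt").write_text(proc.stderr)
    return proc.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        rid, rdir = make_record_dir(root, "sequence_metainfo", "run1")
        # Stand-in for an integrated bioinformatics program.
        code = run_and_store([sys.executable, "-c", "print('ok')"], rdir)
        print(rid, code, (rdir / "stdout.txt").read_text().strip())
```

Keeping the identifier and the directory name identical means a results file on disk can be matched to its database record without a separate lookup.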