We integrate genomic, transcriptomic, and metabolomic data from multiple plant species to decipher the biosynthesis networks of specialized metabolites. Plants produce a plethora of these compounds. Understanding the complex connections between biosynthesis pathways requires the analysis of large data sets.
The first step in many projects is the generation of genome or transcriptome assemblies, which requires substantial computational resources. Long-read assemblies of large plant genomes in particular require a large amount of memory and a large number of CPUs. The preprocessing of nanopore sequencing reads needs powerful GPUs to convert the electrical signal into sequence information (basecalling). Genome sequences are then subjected to gene prediction, which is performed by advanced tools with large numbers of dependencies. A contained environment such as a virtual machine is crucial for performing this step effectively. Static genome sequences are complemented by RNA-seq data sets that reveal the activity of genes. Processing large public RNA-seq data sets requires a fast internet connection and a large amount of disk space. The de.NBI cloud is well suited for this purpose, and the option to have additional hard disks assigned to a virtual machine is especially helpful. The integration of different omics data and the incorporation of phylogenetic analyses require customized solutions. Virtual machines running in the de.NBI cloud are an excellent environment for developing novel workflows and tools, because everything is performed in a defined environment.
Because students work on many of our projects, it is crucial that starting and using a virtual machine in the de.NBI cloud is very convenient. Remote access provides flexibility for mobile working. It is also important that virtual machines can be replaced if installations break. Since everyone works in their own contained environment, there are no conflicts caused by blocked compute nodes or long queues on a compute cluster.
The availability of resources and the ease of scaling them motivate us to engage in numerous big-data plant science projects. Most group members work in the de.NBI cloud on a daily basis. We could not conduct most of our research projects without the de.NBI cloud.