Phased Genome Assembly

The comprehensive characterization of the genomic variants carried by an individual is a fundamental step towards personalized medicine. The starting point for this analysis is a de novo assembly that not only resolves the sequence of the human genome, but also indicates which variants were inherited from the same parent (determining this is called “phasing”). We developed a computational pipeline, termed PGAS for “Phased Genome Assembly using Strand-seq”, that can phase and assemble a human genome in about one day in the de.NBI Cloud.

The field of computational genomics is advancing at a fast pace, driven mostly by technological progress in the wet lab. This, combined with ambitious research projects that sequence more and more individuals to uncover human genetic variation at the population level, demands a flexible strategy for adapting bioinformatic pipelines to changing analysis requirements. The de.NBI Cloud offers an excellent environment for field-testing new analysis modules, experimenting with tool parameter settings and scrutinizing software deployment procedures. The customizable de.NBI Cloud cluster setups simplify efficient resource allocation and enable highly parallel workflows for data-intensive tasks such as phased genome assembly.

Our PGAS pipeline is implemented using established frameworks, mostly from the Python ecosystem, and can be deployed on a variety of computational infrastructures. The unrestricted access to de.NBI Cloud clusters is tremendously useful when we evaluate new tools for integration into our pipeline that have complex setup routines or require additional technologies such as containers. The outstanding support during all project phases and the straightforward process for requesting or extending a de.NBI Cloud application make the de.NBI Cloud an ideal platform for data-driven bioinformatic research.

Peter Ebert, Heinrich Heine University Düsseldorf, Institute for Medical Biometry and Bioinformatics
