GHGA is setting up a federated data network in Germany that provides harmonized services to enable data sharing in collaboration with international partners such as the European Genome-phenome Archive (EGA). The two de.NBI Cloud sites in Heidelberg and Tübingen serve as the initial GHGA nodes by providing the required efficient and secure infrastructure.
During the first year of GHGA, we focused on planning, recruitment, and start-up activities. Teams were formed for the ELSI, outreach, data stewardship, software development, bioinformatics, and metadata task areas. Because GHGA is an infrastructure project, software development and implementation take a central position: the software team must quickly build the tools that the other task areas depend on.
The GHGA software development team is building a suite of microservices that will be distributed between the central and local hubs. This distributed architecture enables healthier long-term growth for GHGA but comes at the cost of greater initial complexity. The de.NBI Cloud has been a valuable resource for the GHGA software development team, allowing it to test the concept in a sandbox project and to use state-of-the-art cloud and container tools such as OpenStack and Kubernetes.
Because the two major GHGA sites, Heidelberg and Tübingen, run local de.NBI Cloud instances, they will become the project's first data hubs. In addition, the direct connection to these clouds will enable GHGA to democratize big data: researchers throughout Germany will be able to analyze very large datasets on state-of-the-art compute resources (CPU and GPU) without needing local infrastructure.