Tutorials

Tutorials provide a platform for more intensive scientific exchange among researchers interested in a particular topic and serve as a meeting point for the community. They complement the depth-oriented technical sessions by giving participants broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.

TUTORIALS LIST

Tutorial on BioC++ - solving daily bioinformatic tasks with C++ efficiently (BIOSTEC)
Instructors: René Rahn and Marcel Ehrhardt

Tutorial on BioC++ - solving daily bioinformatic tasks with C++ efficiently

Instructors

René Rahn, Max Planck Institute for Molecular Genetics, Germany

Brief Bio

I have been a senior developer and chief architect of SeqAn at the Freie Universität Berlin since 2011. I organized the annual SeqAn user and developer meetings and have given several lectures in the bachelor's and master's courses in bioinformatics at the Freie Universität Berlin. Since 2016 I have been working in the German ELIXIR node de.NBI/ELIXIR-DE, where I mainly extend and maintain the infrastructure of the software library and its applications and work on integrating our tools into workflow systems such as KNIME. I am a member of the de.NBI special interest group for training and education, where, among other things, we developed guidelines for creating training courses and training materials. In December 2019 I was certified as a Software Carpentry Instructor. The proposed tutorial is part of a new curriculum that we are designing to teach C++ to bioinformaticians. The curriculum follows the recommendations and guidelines of the Carpentries as well as the guidelines for teaching C++ provided by SG20 of the ISO C++ Standard Committee.

Marcel Ehrhardt, Free University Berlin, Germany

Brief Bio

Senior C++ developer and software architect of SeqAn at the Freie Universität Berlin since 2015. PhD student at the Freie Universität Berlin with a special interest in high-performance computing, algorithms, (succinct) data structures, C++, software development and build infrastructures. Currently employed on a project funded by de.NBI/CIBI. Master's degree in computer science with a focus on theoretical computer science from the Freie Universität Berlin. Over 4 years of experience as a teaching assistant.

Abstract

In this half-day tutorial we teach how to use modern C++ and modern C++ libraries to rapidly develop tools and scripts that operate on and manipulate large-scale sequencing data.

Motivation:

The high variability and heterogeneity often observed in genomic data is challenging for many standard tools, for example for read alignment and variant calling. These tools are often wrapped in complicated pre- and postprocessing data curation steps in order to obtain higher-quality results. However, these additional steps impose a high maintenance and performance burden on the established workflow and often do not scale to larger data sets. C++ is seldom considered as the language of choice for these small processing steps, although it is the main language used in high-performance computing. We are going to show that programming in modern C++ can be as easy as using other modern high-level languages.

Keywords

BioC++, modern C++, bioinformatics, SeqAn, indexing, FileIO

Aims and Learning Objectives

Students will develop
• skills in developing an application using the C++ programming language
• skills in using modern C++ libraries (e.g. SeqAn, SDSL) to query large sequence databases
• knowledge and understanding of modern C++ features, such as ranges and concepts
• knowledge and understanding of modern and efficient data structures and algorithms crucial for large-scale genomic sequence analysis
• knowledge and understanding of how to develop and sustain high-quality software

Target Audience

This tutorial is best suited for computational biologists and bioinformaticians with a research focus on sequence analysis (e.g. genomics, metagenomics, proteomics, read alignment, variant detection).

Prerequisite Knowledge of Audience

Basic knowledge of sequencing experiments and the data involved is required. We expect attendees to have intermediate programming experience in any high-level programming language, e.g. Python, Java or C++. Some basic C++ knowledge is helpful but not mandatory to complete the course successfully.

This tutorial targets beginner and intermediate C++ developers who want to learn more about modern C++ features like ranges and concepts.

Detailed Outline

Introduction to modern C++ [talk: 30 min]
Initial app and parsing sequencing data [hands-on: 60 min]
Filtering and data manipulation part I [hands-on: 60 min]
Filtering and data manipulation part II [hands-on: 90 min]
Wrap-up [talk: 30 min]

Secretariat Contacts
e-mail: biostec.secretariat@insticc.org