High-throughput sequencing has revolutionized the way we understand biology. With state-of-the-art technologies, one can nowadays re-sequence a human genome for less than $1,000. However, several key technical problems remain. First, current technologies cannot sequence a long DNA molecule as a whole; they can only read it out as many relatively short chunks, which makes data analysis considerably more complicated. Second, the convoluted interaction between biology, experimental protocols and sequencing is sometimes very hard to disentangle, and requires careful experimental design. Third, the sheer amount of data forces researchers to design and use novel tools whose algorithmic behaviour is sometimes hard to grasp. In summary, the bioinformatic interpretation of the data remains complicated, computationally expensive, and full of technical pitfalls that are not always apparent at first sight.
Paolo Ribeca (The Pirbright Institute, UK and Centro Nacional de Análisis Genómico, Spain)
English
This course will provide a basic introduction to correctly analyzing high-throughput sequencing data. Several biological applications, experimental setups and sequencing protocols will be considered. Hands-on sessions will form an essential part of the course.
Dr. Paolo Ribeca started working on the development of algorithms for the analysis of high-throughput sequencing data in 2008. He is the main architect of the GEM suite of bioinformatics programs (http://gemlibrary.sourceforge.net).