Algorithms for Genomic Data Analysis

The course introduces computer science students (and students from related areas) to the storage, processing, and analysis of next-generation sequencing data and modern genomic technologies. It covers in detail the main algorithmic solutions to problems posed by such data. We will also discuss the data management and sharing challenges raised by these massive datasets. The course also aims to improve students' understanding of biological problems and to foster interdisciplinary thinking.

 

Prerequisites

Previous attendance of the Bioinformatics course [MSBME-118/12] is highly desirable.

Topics

– Next-generation sequencing technologies / genome-wide arrays

– Short-read aligners (suffix trees, Burrows-Wheeler transform)

– Genomic data formats, storage and indexing methods

– SNP and variant detection (hidden Markov models, pileup methods)

– Detection of DNA-Protein interactions (peak calling)

– Transcript detection and quantification (de Bruijn graphs, digital differential expression tests) 
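
As an illustration of the kind of algorithm listed above, here is a minimal Python sketch of the Burrows-Wheeler transform, which underlies several short-read aligners. It is a naive teaching version that sorts all rotations of the text; practical aligners build the transform from a suffix array and add an index on top of it. The function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for this example, not part of the course material.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel  # the sentinel marks the end of the original text
    # Build every cyclic rotation and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by rebuilding the sorted rotation table."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transformed column to every row, then re-sort.
        table = sorted(c + row for c, row in zip(transformed, table))
    # The row ending in the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("GATTACA")
    print(encoded, inverse_bwt(encoded))
```

The transform tends to group identical characters together, which helps compression, and it can be inverted without storing the original text. Both properties are what make it attractive as the basis for a compact index over a reference genome.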

References

Richard Durbin, Sean R. Eddy, Anders Krogh and Graeme Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
Neil Jones and Pavel Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.