Laboratory seminar: "Clustering Billions of Reads for DNA Data Storage". Lecturer: K. Makarychev

Event ended

On October 3, 2017, K. Makarychev will give a lecture titled "Clustering Billions of Reads for DNA Data Storage".

Address: Faculty of Computer Science, HSE, 3 Kochnovsky Proezd.
Language: English
Time: 18:10-19:30
Hall: 205

If you have questions, please contact the laboratory manager, Ekaterina Vavilova, at evavilova@hse.ru.

Abstract

I will explain how to quickly cluster billions of strings by their similarity, measured by edit distance. We will discuss what makes the problem hard and then explore known theoretical techniques, such as Locality-Sensitive Hashing (LSH), metric embeddings, and sketching, that can be used to cluster Big Data. Finally, I will show how we combine these techniques with some new ingredients to cluster billions of DNA strands.
I will also briefly mention how string clustering is used in the Microsoft DNA Storage project, which develops technology for storing data on synthesized DNA strands.
The talk is based on my joint work with a team of researchers from Microsoft Research and the University of Washington. The paper will appear at NIPS 2017.

About

Laboratory of Theoretical Computer Science