
Laboratory seminar: "Clustering Billions of Reads for DNA Data Storage". Lecturer: K. Makarychev

Event ended
On October 3, 2017, K. Makarychev will give a lecture titled "Clustering Billions of Reads for DNA Data Storage".

Address: HSE Faculty of Computer Science, Kochnovsky proezd, 3.
Language: English
Time: 18:10-19:30
Hall: 205

If you have questions, please contact the laboratory manager, Ekaterina Vavilova, at evavilova@hse.ru.

Abstract

I will explain how to quickly cluster billions of strings by similarity, measured by edit distance. We will discuss what makes the problem hard and then explore known theoretical techniques, such as Locality Sensitive Hashing (LSH), metric embeddings, and sketching, that can be employed for clustering Big Data. Finally, I will show how we use these techniques, along with some new ingredients, to cluster billions of DNA strands.
I will also briefly mention how string clustering is used in the Microsoft DNA Storage project – the project that develops technology for storing data on synthesized DNA strands.
The talk is based on my joint work with a team of researchers from Microsoft Research and the University of Washington. The paper will appear at NIPS 2017.
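
For readers unfamiliar with the ideas named in the abstract, the following is a minimal illustrative sketch of hashing-based string clustering, not the algorithm presented in the talk or the paper. It assumes a MinHash-style signature over q-grams as a simple stand-in for LSH: strings are bucketed by signature, and only strings that share a bucket are compared by exact edit distance. The function names (`edit_distance`, `qgram_signature`, `cluster_reads`) and the parameters `threshold`, `q`, and `rounds` are hypothetical choices made for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def qgram_signature(s, q, seed):
    """Smallest seeded hash over the q-grams of s (a MinHash-style signature)."""
    grams = {s[i:i + q] for i in range(len(s) - q + 1)} or {s}
    return min(hashlib.md5(f"{seed}:{g}".encode()).hexdigest() for g in grams)


def cluster_reads(reads, threshold=3, q=3, rounds=8):
    """Group reads so that merged pairs are within `threshold` edits.

    Assumption for this sketch: each round buckets reads by one signature,
    and exact edit distance is computed only inside a bucket.
    """
    parent = list(range(len(reads)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for seed in range(rounds):
        buckets = defaultdict(list)
        for idx, read in enumerate(reads):
            buckets[qgram_signature(read, q, seed)].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                ri, rj = find(i), find(j)
                if ri != rj and edit_distance(reads[i], reads[j]) <= threshold:
                    parent[ri] = rj  # merge the two clusters

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())
```

Calling `cluster_reads(reads)` on a list of strings returns lists of indices into `reads`, one list per cluster. Similar strings share most of their q-grams, so they tend to collide in at least one round, while dissimilar strings are rarely compared at all; this is what avoids the quadratic number of edit-distance computations. Pairs that never collide are missed, so the result is approximate.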