
Seminar of the Laboratory of Theoretical Computer Science: "Clustering Billions of Reads for DNA Data Storage". Speaker: K. Makarychev

The event has ended
At the next seminar of the Laboratory of Theoretical Computer Science, on Tuesday, October 3, Konstantin Makarychev will give the talk "Clustering Billions of Reads for DNA Data Storage".
Time: 18:10 - 19:30
Venue: Kochnovsky Proezd, 3, room 205
To request an entry pass: evavilova@hse.ru

Abstract

I will tell the audience how to quickly cluster billions of strings based on their similarity (edit distance). We will discuss what makes the problem hard and then explore known (theoretical/mathematical) techniques like Locality Sensitive Hashing (LSH), metric embeddings, and sketching that can be employed for clustering Big Data. Finally, I will show how we use these techniques along with some new ingredients to cluster billions of DNA strands.
I will also briefly mention how string clustering is used in the Microsoft DNA Storage project – the project that develops technology for storing data on synthesized DNA strands.
The talk is based on my joint work with a team of researchers from Microsoft Research and the University of Washington. The paper will appear at NIPS 2017.
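
For readers unfamiliar with the techniques named in the abstract, the following is a minimal illustrative sketch of one of them, Locality Sensitive Hashing, applied to strings. It is not the method from the talk or the paper. It assumes a common textbook approach: MinHash signatures over q-grams, split into bands, so that similar strings are likely to share a bucket, followed by an exact edit-distance check on the candidate pairs. All function names and parameters (`qgrams`, `minhash_signature`, `candidate_pairs`, `num_hashes`, `band_size`) are hypothetical and chosen for illustration only.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(s, q=3):
    """Return the set of length-q substrings of s (or {s} if s is shorter than q)."""
    grams = {s[i:i + q] for i in range(len(s) - q + 1)}
    return grams or {s}


def _h(seed, gram):
    """Deterministic 64-bit hash of a q-gram, parameterized by an integer seed."""
    digest = hashlib.sha1(f"{seed}:{gram}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(s, num_hashes=16, q=3):
    """MinHash signature: for each seed, the minimum hash value over the q-grams of s."""
    grams = qgrams(s, q)
    return tuple(min(_h(seed, g) for g in grams) for seed in range(num_hashes))


def candidate_pairs(strings, num_hashes=16, band_size=4, q=3):
    """Return index pairs (i, j), i < j, whose signatures agree on at least one band."""
    buckets = defaultdict(list)
    for idx, s in enumerate(strings):
        sig = minhash_signature(s, num_hashes, q)
        for start in range(0, num_hashes, band_size):
            # The band position is part of the key, so bands only match in place.
            buckets[(start, sig[start:start + band_size])].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Usage: only candidate pairs are checked with the (expensive) exact distance.
reads = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"]
close = {(i, j) for i, j in candidate_pairs(reads)
         if edit_distance(reads[i], reads[j]) <= 2}
```

The point of the design is that the quadratic edit-distance computation runs only on pairs that collide in some bucket, not on all pairs. The threshold of 2 and the band settings are arbitrary choices for this sketch.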