Hello to all,
As you might know, I have spent the last few years working on my PhD, and the time has finally come for it to end. I will soon be defending my thesis, titled “From sequences to knowledge, improving and learning from sequence alignments”. The defence will be in English.

The defence will take place on December 2nd at 1:30 PM, on the Jussieu campus of Sorbonne Université, in the Durand auditorium of the Esclangon building (cf. map below). You can download a calendar event here.
If you cannot come in person, no problem! You can also attend the defence remotely via Microsoft Teams (link here).

The defence will be followed by drinks and food in the Atrium building of the university (cf. map below), and then by a party at a nearby bar: The Baker Street Pub (details below). If you cannot make it to the defence, I hope you can at least make it to the bar!

Map of the campus

To help me organize this event, please fill out this form, which is also available at the bottom of the page.

Jury

I will defend my thesis in front of my PhD jury, whose members are listed below:

Brona Brejova, Associate Professor, Reviewer
Macha Nikolski, Group Leader, Reviewer
Élodie Laine, Associate Professor, Examiner
Olivier Gascuel, Research Director, Examiner
Jean-Philippe Vert, Research Director, Examiner
Paul Medvedev, Associate Professor, Invited Member
Rayan Chikhi, Group Leader, PhD Advisor

Schedule

The tentative schedule of the defence is as follows:

1:30 PM to 2:15 PM: PhD Presentation
2:15 PM to 3:00 ~ 3:30 PM: Jury Questions
3:30 PM to 4:00 PM: Jury Deliberation
4:00 PM to 4:15 PM: Jury Verdict
4:15 PM to 6:00 PM: Food and Drinks
6:00 PM to ?: Party 🎉

The after party

The party will be held at the Baker Street Pub, at 9 rue des Boulangers, which is very close to the university. We will probably head over there around 6 PM. I have booked a private-ish space in the bar, where we will be able to play our own music. However, the space is underground and there will be no internet, so I need to put together a playlist in advance. Here is a link to this shared playlist so you can add the songs you want to listen to:

My thesis

If you wish to read my manuscript, it is available as a website (thesis.lucblassel.com) or as a PDF document. Good luck!

Abstract

In this thesis we study two important problems in computational biology, one pertaining to primary analysis of sequencing data, and the second pertaining to secondary analysis of sequences to obtain biological insights using machine learning. Sequence alignment is one of the most powerful and important tools in the field of computational biology. Read alignment is often the first step in many analyses like structural variant detection, genome assembly or variant calling. Long-read sequencing technologies have improved the quality of results across all these analyses. They remain, however, plagued by sequencing errors and pose algorithmic challenges to alignment. A prevalent technique to reduce the detrimental effects of these errors is homopolymer compression, which targets the most common type of long-read sequencing error. We present a more general framework than homopolymer compression, which we call mapping-friendly sequence reductions (MSR). We then show that some of these MSRs improve the accuracy of read alignments across whole human, Drosophila and E. coli genomes.

Improvements in sequence alignment methods are crucial for downstream analyses. For instance, multiple sequence alignments are indispensable when studying resistance in viruses. With the ever-growing quantity of annotated, high-quality multiple sequence alignments, it has become possible and useful to study drug resistance in viruses with machine learning methods. We used a very large multiple sequence alignment of British HIV sequences to train multiple classifiers to discriminate between treatment-naive and treatment-experienced sequences. By studying important classifier features we identified resistance-associated mutations. We then removed known drug-resistance-associated signal from the data before training, while retaining classifying power, and identified 6 novel resistance-associated mutations. Further study indicated that these were most likely accessory in nature and linked to known resistance mutations.
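
For the curious, the homopolymer compression mentioned in the abstract can be illustrated in a few lines of Python. This is only a minimal sketch of the general idea (collapsing each run of identical consecutive nucleotides into a single one), not the code used in the thesis; the function name `homopolymer_compress` is made up for this example.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical consecutive characters into one.

    Hypothetical helper for illustration only, not the thesis implementation.
    """
    # groupby yields one (character, run) pair per run of equal characters,
    # so keeping only the character drops the repeats.
    return "".join(base for base, _run in groupby(seq))


if __name__ == "__main__":
    print(homopolymer_compress("AAACCGTTTTA"))  # ACGTA
```

Applying the same transformation to both the reads and the reference before aligning is what makes errors in the length of such runs disappear from the comparison.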

Form