The seminars take place on Tuesdays at 2 pm in the CGFB conference room (unless otherwise specified).
October 22 2019 - Guillaume Bernard (Sorbonne University, MNHN, Paris)
Next-generation phylogenomics: alignment-free approaches, sequence similarity networks and more.
Since the 2000s, with the development of Next Generation Sequencing (NGS), biologists have been able to sequence microbes directly from the environment. The ‘microbial dark matter’ represented in metagenomes could contain a huge amount of new information about Earth’s microbial diversity and the origins of life, as well as solutions to medical problems or to adaptation to climate change. NGS brought a deluge of sequence data, revealing a huge diversity of uncharacterized organisms that have not been cultivated in the laboratory. Alternatives to the classical phylogenetic methods based on multiple sequence alignment (MSA), such as sequence similarity networks (SSN) and alignment-free (AF) methods, have been increasingly used in evolutionary analyses to cope with the growing amount of data. These approaches are faster and more scalable than their MSA-based counterparts, and can be applied to a broader range of data (sequencing reads, whole genomes, etc.). I will start with a brief introduction to the AF approaches, followed by an overview of the different methods available. Next, I will show the network-based methods and their applications. Finally, I will present a novel approach combining the SSN and AF methods to quickly identify genes/proteins of interest in metagenomic data and to infer proxies of phylogenies, robust to long branch attraction, when the data are too large or too divergent to perform an MSA.
Seminars to come
December 10 2019 - Laura Cantini (IBENS, ENS, Paris)
February 11 2020 - Anais Baudot (Aix-Marseille Université)
October 8 2019 - Laurent Brehelin (LIRMM, Montpellier)
Probing transcriptional regulation with statistical models
Gene expression in eukaryotes is orchestrated by distinct regulatory mechanisms to ensure a wide variety of cell types and functions. While these regulations involve actors as different as transcription factors (TFs), histone marks, and chromatin structure, the DNA sequence itself is invariably involved in the different processes. Hence, a key challenge in regulatory genomics is to decipher the links between gene regulation and DNA sequence. In this talk, I will present two attempts to address this problem, both based on statistical machine learning and feature selection approaches.
In the first work, we probe sequence-level instructions for gene expression and develop a method to explain mRNA levels based solely on nucleotide features. Our method positions nucleotide composition as a critical component of gene expression. In the second, we study the TF combinations involved in the binding of a target TF in a particular cell type. We show that TF combinations differ between promoters and enhancers, but are similar for promoters of mRNAs, lncRNAs and pri-miRNAs.