Year 2019-2020

The seminars take place on Tuesdays at 2 pm in the CGFB conference room, unless otherwise specified.

October 2019


8/10 Laurent Brehelin (LIRMM, Montpellier)

Probing transcriptional regulation with statistical models

Gene expression in eukaryotes is orchestrated by distinct regulatory mechanisms that give rise to a wide variety of cell types and functions. While these mechanisms involve actors as different as transcription factors (TFs), histone marks, and chromatin structure, the DNA sequence itself is invariably involved in each of these processes. Hence, a key challenge in regulatory genomics is to decipher the links between gene regulation and DNA sequence. In this talk, I will present two approaches to this problem based on statistical machine learning and feature selection.

In the first work [1], we probe sequence-level instructions for gene expression and develop a method that explains mRNA levels based solely on nucleotide features. This method positions nucleotide composition as a critical component of gene expression. In the second [2], we study the TF combinations involved in the binding of a target TF in a particular cell type. We show that these combinations differ between promoters and enhancers but are similar for the promoters of mRNAs, lncRNAs, and pri-miRNAs.



22/10 Guillaume Bernard (Sorbonne University, MNHN, Paris)

Next-generation phylogenomics: alignment-free approaches, sequence similarity networks and more.

Since the 2000s, the development of Next-Generation Sequencing (NGS) has enabled biologists to sequence microbes directly from the environment. The ‘microbial dark matter’ represented in metagenomes could hold a huge amount of new information about Earth’s microbial diversity and the origins of life, as well as solutions to medical problems or to adaptation to climate change. NGS has produced a deluge of sequence data revealing a huge diversity of uncharacterized organisms that have never been cultivated in the laboratory. To cope with this ever-growing amount of data, alternatives to classical phylogenetic methods based on multiple sequence alignment (MSA), such as sequence similarity networks (SSNs) and alignment-free (AF) methods, are increasingly used in evolutionary analyses. These approaches are faster and more scalable than their MSA-based counterparts, and they can be applied to a broader range of data (sequencing reads, whole genomes, etc.). I will start with a brief introduction to AF approaches, followed by an overview of the available methods. Next, I will present network-based methods and their applications. Finally, I will present a novel approach that combines SSNs and AF methods to quickly identify genes/proteins of interest in metagenomic data and to infer proxies of phylogenies, robust to long-branch attraction, when the data are too large or too divergent to perform an MSA.


December 2019


10/12 Laura Cantini (IBENS, ENS, Paris)



Year 2018-2019



20/11 Clovis Galliez (Université de Grenoble), "Making sense of the metagenomics mixture: identifying bacterial hosts from phage sequences and binning billions of contigs"

29/11 Antonio Marco (University of Essex, UK), "On sex, mothers and microRNA"


8/01 Warren Francis (University of Southern Denmark), "Comparative genomics and the nature of placozoan species"


19/02 Magali Champion (Université Paris Descartes), "AMARETTO: Multi-omics data fusion for cancer data"


26/03 Julien Chiquet (AgroParisTech), "A collection of Poisson lognormal models for multivariate analysis of count data"


7/05 Florian Thibord (UPMC Université Paris 6), "Alignment of miRseq data"

21/05 Eduardo Rocha (Institut Pasteur), "Horizontal gene transfer: from acquisition to functional innovation"