The seminars take place on Tuesdays at 2 pm in the CGFB conference room (unless otherwise specified)

Next seminar:

Tuesday 31 May 2022 - Kateryna Makova (The Makova Lab at Penn State)

Location: IBGC conference room at 2 pm

Title: Explaining and predicting disease occurrence from omics data


With advances in current technologies, we can now generate genomics, microbiomics, metabolomics, and epigenomics data, which may help us understand the patterns of disease occurrence. As an example, I will present various omics datasets related to childhood obesity. I will also discuss applications of Functional Data Analysis, a statistical method, to the analysis of these omics data.

A native of Ukraine, Kateryna Makova received her PhD from Texas Tech University, where she studied the genetic consequences of the Chernobyl Nuclear Power Plant accident. She then completed her postdoctoral studies at the University of Chicago, where she investigated differences in mutation rates between males and females, among other topics.

She has been a Professor in the Department of Biology at the Pennsylvania State University since 2003. Her laboratory conducts research in evolutionary and medical genomics. Current topics of interest include sex chromosome evolution, evolution of non-B DNA, mitochondrial DNA evolution, regional variation in mutation rates, and childhood obesity. The research in Dr Makova’s laboratory is highly interdisciplinary and collaborative. The group collaborates with statisticians, computer scientists, and biochemists.

Upcoming seminars


Past seminars

April 12 2022 - Samuel Chaffron (TBD)

Location: Centre de Génomique Fonctionnelle (conference room, 1st floor)

Title: Plankton network models reveal mechanisms shaping microbial community assembly in the global ocean


Marine plankton form complex communities of interacting organisms at the base of the food web, which sustain oceanic biogeochemical cycles and help regulate climate. Understanding the mechanisms controlling their assembly and sustaining their activities is a major challenge in microbial ecology. Though global surveys are starting to reveal the ecological drivers underlying planktonic community structure and predicted climate change responses, it is unclear how community-scale species interactions are constrained and how they will be affected by climate change. By leveraging Tara Oceans meta-omics data, plankton community network models were integrated with niche modelling to reveal biome-specific plankton community responses to environmental change and to forecast the most affected lineages within each community. To go beyond statistical models, genome-resolved community networks make it possible to model and predict metabolic cross-feeding within prokaryotic assemblages. This revealed a higher potential for interactions within predicted communities and pointed towards specific metabolic cross-feeding interactions shaping plankton microbial communities. Integrated ecological and metabolic models provide a useful framework to assess community structure and organismal interactions, and reveal important mechanisms shaping natural microbial communities in our changing ocean.

March 22 2022 - Laurent Tichit, IMM, Université d’Aix-Marseille

Location: Centre de Génomique Fonctionnelle (conference room, 1st floor)

Title: Random Walk with Restart on Multilayer Biological Networks


Random walk with restart (RWR) is the state-of-the-art guilt-by-association approach. It explores the network vicinity of gene/protein seeds to study their functions, based on the premise that nodes related to similar functions tend to lie close to each other in the networks.
We extended the RWR algorithm to multilayer networks. The walk can now explore different layers of physical and functional interactions between genes and proteins. It can also jump to a network containing different sets of edges and nodes, such as phenotype similarities between diseases. We devised a leave-one-out cross-validation strategy to evaluate the algorithm's ability to predict disease-associated genes. Finally, we applied the algorithm to predict candidate genes for two genetic diseases: the Wiedemann-Rautenstrauch progeroid syndrome and the SHORT syndrome.
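
As an illustration of the basic idea described in this abstract, here is a minimal sketch of the classical single-layer random walk with restart, not the multilayer extension presented in the talk. The toy network, the node names, the function name and the restart probability of 0.7 are all assumptions chosen for the example, and the sketch assumes every node has at least one neighbour.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adjacency: dict mapping each node to the list of its neighbours
               (undirected, unweighted; every node needs >= 1 neighbour).
    seeds:     nodes where the walker starts and restarts.
    Returns a dict mapping each node to its steady-state probability.
    """
    nodes = list(adjacency)
    seed_set = set(seeds)
    # Initial distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: part of the probability mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: the rest is split evenly among each node's neighbours.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a simple chain A - B - C - D, seeded at A.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_network, ["A"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Nodes closer to the seed receive higher scores, which is the guilt-by-association ranking used to prioritize candidate genes.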

March 1 2022 - Emilien Peltier, Unité de recherche Œnologie, ISVV / INRAE

Location: Centre de Génomique Fonctionnelle (conference room, 1st floor)

Title: Inheritance, phenotypic variation, and missing heritability hidden behind the mitochondrial genome


In the yeast species S. cerevisiae, mitochondrial transmission is biparental. Crossing two parental strains generates a heteroplasmic cell containing both parental mitochondrial genomes (mtDNA); after a few vegetative divisions, only one mitochondrial genome is retained, leading to homoplasmy. The transmitted mtDNA can therefore come from either parent or be a recombinant of the two parental genomes. Potential bias in transmission toward specific mitotypes, and the level of recombination, have never been explored at the species level. This aspect was investigated by selecting twenty parental strains representative of the S. cerevisiae population and analyzing mtDNA transmission in their 190 possible pairwise crosses. A bulk whole-genome sequencing approach was used to follow mtDNA transmission in large progenies, unveiling transmission deviations at the whole-genome level or at the gene level. The pairwise experimental design made it possible to determine whether these deviations were cross- or strain-specific and to rank mitotypes or haplotypes according to their fitness. Intron content was found to be a major factor affecting mtDNA transmission through the intron homing mechanism. Intron homing efficiency was analyzed by intron and by cross, revealing strong interactions between introns. Altogether, these results shed light on the factors affecting mtDNA transmission at the species level in S. cerevisiae.

January 18 2022 - Clémence Frioux, research scientist, INRIA, Pleiade team

Location: Centre de Génomique Fonctionnelle (conference room, 1st floor)

Title: Discrete modelling of metabolism: from individual organisms to microbial ecosystems


Genome-scale metabolic networks gather the functional potential associated with the genome of an organism. Combined with mathematical modelling, they make it possible to predict the metabolic response of a species in its environment. There have been many developments in tools and software dedicated to the reconstruction and analysis of such models. The focus of this talk is the discrete modelling of metabolic producibility using a Boolean approach, and its applications. I will illustrate how such an approach complements quantitative methods by providing insights into the metabolism of organisms, even from automatically reconstructed metabolic networks. I will show how such approaches can scale up to large communities of organisms, and take advantage of metagenomic data in order to screen the metabolic potential of the microbiota and identify key members among them.
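
To make the notion of Boolean producibility in this abstract concrete, here is a minimal sketch under stated assumptions: a metabolite counts as producible if it is a nutrient (seed) or the product of a reaction whose reactants are all producible. The function name, the abstract metabolites A, B, C and the two toy "organisms" are hypothetical, and this is not the speaker's software.

```python
def producible_scope(reactions, seeds):
    """Return the set of metabolites reachable from the seeds.

    reactions: list of (reactants, products) pairs, each a list of names.
    seeds:     metabolites available in the environment.
    A reaction fires once all its reactants are in the scope; the loop
    repeats until no reaction adds anything new (a fixed point).
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= scope and not set(products) <= scope:
                scope |= set(products)
                changed = True
    return scope


# Hypothetical toy community: organism 1 converts A to B, organism 2 converts B to C.
organism_1 = [(["A"], ["B"])]
organism_2 = [(["B"], ["C"])]
nutrients = {"A"}

alone = producible_scope(organism_2, nutrients)
together = producible_scope(organism_1 + organism_2, nutrients)
print(sorted(alone), sorted(together))
```

In this toy case organism 2 produces nothing new on its own, while pooling the reactions of both organisms makes C producible, which is the kind of community-level complementarity the abstract refers to.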

8 June 2021 - Stéphane Cauet, INRAE - CNRGV

Abstract:

Improving genome assembly and comparison using optical maps

Advances in genome characterization technologies now make it possible to produce high-quality assembled sequences quickly and at reduced cost. However, plant genomes remain harder to tackle than those of animals or microorganisms. They are highly complex, combining high percentages of repeated elements, variation in ploidy levels and, in some species, very large sizes. In addition, plants show strong intraspecific variability, which means that one reference genome per species is insufficient to study diversity and to understand complex biological processes such as resistance to biotic or abiotic stresses. Yet linking these traits to the genomic regions that govern them is essential for understanding the genetic determinants of phenotypes of interest.
To this end, the Centre National de Ressources Génomiques Végétales (CNRGV), alongside its activity as a biological resource centre, offers innovative tools and strategies for plant genomics research. The CNRGV uses its expertise in the extraction and handling of high-molecular-weight DNA to develop methods for characterizing genomes and capturing genomic regions of interest using the properties of the CRISPR/Cas9 system (CATCH and pull-CATCH methods). In this presentation, I will briefly introduce the CNRGV and its services, then present current trends in building a reference genome for a plant species through various projects in which we collaborate. I will focus in particular on the principle and value of the Saphyr® optical mapping system (Bionano Genomics) for improving sequence assembly (scaffolding) and comparing genotypes (structural variation analysis).

16 March 2021 - Misbah Razzac, Inserm, Vintage-U1219 team

Abstract:

An artificial neural network approach integrating plasma proteomics and genetic data identifies PLXNA4 as a new susceptibility locus for pulmonary embolism.

Pulmonary embolism (PE) is a severe and potentially fatal condition characterized by the presence of a blood clot (or thrombus) in the pulmonary artery. It is often the consequence of the migration of a thrombus from a deep vein to the lung. Together with deep vein thrombosis (DVT), pulmonary embolism forms the so-called venous thromboembolism, the third most common cardiovascular disease, and its prevalence strongly increases with age. While pulmonary embolism is observed in ~40% of patients with deep vein thrombosis, there are currently few biomarkers that can help predict which patients with deep vein thrombosis are at risk of pulmonary embolism. To fill this need, we trained artificial neural networks (ANN) with two hidden layers on 376 antibodies and 19 biological traits measured in the plasma of 1388 DVT patients, with or without PE, from the MARTHA study. We used the LIME algorithm to obtain a linear approximation of the resulting ANN prediction model. As MARTHA patients had been genotyped with DNA arrays, a genome-wide association study (GWAS) was conducted on the LIME estimate. Detected single nucleotide polymorphisms (SNPs) were tested for association with PE risk in MARTHA. The main findings were replicated in the EOVT study, composed of 143 PE patients and 196 DVT-only patients. The derived ANN model for PE achieved an accuracy of 0.89 and 0.79 in our training and testing sets, respectively. A GWAS on the LIME approximation identified a strong statistical association peak (p = 5.3 × 10^-7) at the PLXNA4 locus, with lead SNP rs1424597, at which the minor A allele was further shown to be associated with an increased risk of PE (OR = 1.49 [1.12 – 1.98], p = 6.1 × 10^-3). Further association analysis in EOVT revealed that, in the combined MARTHA and EOVT samples, the rs1424597-A allele was associated with increased PE risk (OR = 1.74 [1.27 – 2.38], p = 5.42 × 10^-4) in patients over 37 years of age but not in younger patients (OR = 0.96 [0.65 – 1.41], p = 0.848).

2 March 2021 - Corentin Dechaud (IGFL), Genomique Evolutive des Poissons team

Abstract:

Assessing the role of transposable elements in the control of sexual genes in teleost fish

In teleost fish, sexual reproduction modes and sexual gene regulatory networks are highly variable. Sex can be determined either environmentally or genetically and is controlled by different genes depending on the species investigated. Sexual development and maintenance also appear variable in this clade. Transposable elements are possible genetic determinants at the origin of this diversity. They are endogenous DNA sequences able to insert themselves into genomes and thereby multiply. Even if they are often neutral or deleterious for their host, transposable elements can also be a source of evolutionary innovations, thanks to the regulatory sequences, such as transcription factor binding sites, that they carry and spread in genomes. Their diversity in fish genomes constitutes a reservoir of numerous ready-to-use regulatory sequences that could be involved in the fast evolution of some gene regulatory networks. To test this hypothesis, we used RNA sequencing data from male and female gonads from teleost fish species of the genus Oryzias. We looked for transposable element families enriched in the vicinity of sex-biased genes. In doing so, we were able to detect several candidate families over-represented in the 5’ untranslated region of testis-biased genes. We focused on one of these TE families and showed that it harbors binding sites for transcription factors involved in sexual function. This work brings new insights into the role of transposable elements in the fast evolution of gene regulatory networks and paves the way for future functional studies.

February 9, 2021 - Samuel Chaffron (University of Nantes)

8 December 2020 - Jean Delmotte, University of Montpellier (IHPE, UMR 5244), France

Abstract:

Phylogeography, genetic diversity and connectivity of Ostreid herpesvirus-1 population in France

Recurrent mortalities have been affecting juvenile Pacific oysters (Crassostrea gigas) for more than 30 years. Among the pathogens involved, the predominant role of Ostreid herpesvirus 1 (OsHV-1) in the mortality syndrome called “Pacific Oyster Mortality Syndrome” (POMS) has recently been demonstrated. The OsHV-1 epidemics in oyster farming areas have made this virus a major threat to the oyster industry. However, genomic epidemiology in certain regions at risk, in particular in France, remains limited. We report 21 OsHV-1 genomes generated using high-throughput sequencing during mass mortality episodes in 3 regions of the French coast. Using a new bioinformatics methodology and adopting a three-set genomic variation analysis strategy, we reveal the connectivity of the OsHV-1 viral population in France. The main source is probably the Marennes-Oléron area; the viruses are then introduced into other shellfish-growing areas, probably following the transfer of oyster spat. The spatial heterogeneity of OsHV-1 transmission calls into question the surveillance of malacoherpesviruses and disease control measures.

24 November 2020 - Tiffany Delhomme (IRB Barcelona, Spain)

Abstract:

Drawing the landscape of genetic mutations from “hard” data analysis with needlestack and hyperstack.

Modern genomics has great potential in applied cancer sciences, and is currently delivering on its promises, notably through the development of Next Generation Sequencing (NGS). This technique enables the identification of DNA sequences for hundreds or even thousands of individuals at reasonable cost and within reasonable experiment times, and has recently been applied to the identification of single-cell genomes. However, it is prone to errors of two types: from sequencing, and from amplification in the case of single-cell sequencing. In the first part, I will present needlestack, a new bioinformatics tool that can efficiently detect somatic mutations from NGS data, and I will present some of its applications in early cancer detection. In the second part, I will introduce hyperstack, an adaptation of needlestack for the reconstruction of mutational profiles from single-cell data. Hyperstack integrates a machine learning step in order to estimate the amplification errors, in addition to the sequencing errors detected by needlestack. Finally, I will present an application of hyperstack to low-coverage single-cell DNA data.

11 February 2020 - Anais Baudot (Marseille Medical Genetics Institute, Aix-Marseille Université)

Mining networks to study rare and common diseases

Abstract:

Networks are scaling-up the analysis of gene and protein functions, thereby offering new avenues to study the diseases in which these macromolecules are involved. I will discuss the exploration of -omics networks containing thousands of physical and functional interactions between genes and proteins. In particular, we now focus on multiplex networks, i.e., networks composed of layers containing the same nodes but different interaction categories, such as protein-protein interactions, molecular complexes or correlations of expression.

We develop algorithms (e.g., community detection, random walks) to explore these large and complex biological networks, integrate information (e.g., expression), and mine the functional knowledge they contain. I will show how we use these tools to study rare and common genetic diseases, in particular premature aging diseases and disease-disease comorbidity relationships.

Associated publications

  • The DREAM Module Identification Challenge Consortium, Choobdar S, Ahsen ME, Crawford J, Tomasoni M, Fang T, et al. Assessment of network module identification across complex diseases. Nature Methods. 2019 Sep;16(9):843–52.
  • Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, et al. Random Walk with Restart on Multiplex and Heterogeneous Biological Networks. Bioinformatics. 2018 Jul 18;
  • Ibáñez K, Boullosa C, Tabarés-Seisdedos R, Baudot A, Valencia A. Molecular Evidence for the Inverse Comorbidity between Central Nervous System Disorders and Cancers Detected by Transcriptomic Meta-analyses. Horwitz MS, editor. PLoS Genetics. 2014 Feb 20;10(2):e1004173.


4 February 2020 - Alexandra Calteau (LABGeM bioinformatics team of the UMR 8030 Genomics Metabolics, research structure of Genoscope)

MicroScope: an integrated platform for the annotation and exploration of microbial gene functions through genomic, pangenomic and metabolic comparative analysis
Abstract:
Large-scale genome sequencing and the increasingly massive use of high-throughput approaches produce a vast amount of new information that completely transforms our understanding of thousands of microbial species. However, despite the development of powerful bioinformatics approaches, full interpretation of the content of these genomes remains a difficult task. Launched in 2005, the MicroScope platform has been under continuous development and provides analysis for prokaryotic genome projects, together with metabolic network reconstruction and post-genomic experiments, allowing users to improve the understanding of gene functions. Recently, new tools and pipelines have been developed to perform comparative analyses on hundreds of genomes based on pangenome graphs.
To date, MicroScope contains data for >12 300 microbial genomes, part of which are manually curated and maintained by microbiologists (>4700 personal accounts in January 2020). The platform enables collaborative work in a rich comparative genomic context and improves community-based curation efforts.


22 October 2019 - Guillaume Bernard (Sorbonne University, MNHN, Paris)

Next-generation phylogenomics: alignment-free approaches, sequence similarity networks and more.


8 October 2019 - Laurent Brehelin (LIRMM, Montpellier)

Probing transcriptional regulation with statistical models


21 May 2019 - Eduardo Rocha (Institut Pasteur)

Horizontal gene transfer: from acquisition to functional innovation


7 May 2019 - Florian Thibord (UPMC Université Paris 6)

Alignment of miRseq data


26 March 2019 - Julien Chiquet (AgroParisTech)

A collection of Poisson lognormal models for multivariate analysis of count data


19 February 2019 - Magali Champion (Université Paris Descartes)

AMARETTO: Multi-omics data fusion for cancer data


8 January 2019 - Warren Francis (University of Southern Denmark)

Comparative genomics and the nature of placozoan species


29 November 2018 - Antonio Marco (University of Essex, UK), TBA

On sex, mothers and microRNA


11 November 2018 - Clovis Galliez (Université de Grenoble)

Making sense of the metagenomics mixture: identifying bacterial hosts from phage sequences and binning billions of contigs