Overview

Thanks to high-throughput biotechnologies, we are witnessing a massive accumulation of genomic, transcriptomic, proteomic and metabolomic data. Molecular biology is nowadays turning into a "data-rich" science. This unprecedented richness of data allows us to have, at the same time, both a "bird’s eye" and a "magnified" view of biological samples from a wide range of medically relevant disorders. Yet raw data are meaningless unless we can extract the interesting and useful "needles" of information from the big "haystack" of available data. "The answer is there, blowing in the bulk" (thanks Zimmerman 😃). However, this huge amount of raw data is of little use without groundbreaking bioinformatics and biostatistics methodologies able to disentangle, process, analyze and interpret it.

In my research, I would like to strike a balance between two ways of doing science. On the one hand, I would like to focus on the design, development and application of novel statistical and machine learning approaches for analyzing heterogeneous and high-dimensional biomedical data. On the other hand, I would like to package such approaches into modular, scalable and user-friendly software tools, in order to make them available to the scientific community at large (have a look at my Software page).

Bioinformatics vs Computational Biology

Although Bioinformatics and Computational Biology are often considered synonyms, I believe they are conceptually distinct and worth distinguishing.

When I design new methodologies, I am carrying out a Computer Science activity (Bioinformatics): I design algorithms to solve not just a single specific problem, but a whole class of similar problems in Medicine or Molecular Biology.

When I use my own software (or that of others) to answer biomedical questions, I am doing science (Computational Biology), drawing inferences in the field of Molecular Biology or Medicine.

Although the difference between Bioinformatics and Computational Biology may sound simple and clear-cut, I find that tackling complex computational problems without losing sight of the medical and biological point of view is both challenging and captivating!!

Research Interests

During my PhD, I focused on the analysis and construction of complex biomolecular networks and on the design and implementation of output-structured learning algorithms for gene/protein function prediction and for the discovery of novel associations between human genes and abnormal phenotypes. More generally, my research lines can be grouped as follows:

  1. BIOMOLECULAR NETWORKS
    analysis and construction of complex biomolecular networks using databases such as STRING, BioGRID and UniProt;
  2. FLAT LEARNING METHODS
    application of both supervised and semi-supervised learning methods to make "flat" predictions, i.e. predictions that do not consider the hierarchical structure of the label space;
  3. OUTPUT STRUCTURED LEARNING
    design and software implementation of hierarchical ensemble methods (HEMs) for the prediction of protein function (an open fundamental challenge in molecular biology) and for the prediction of human gene-abnormal phenotype associations (a crucial step towards discovering novel genes associated with Mendelian diseases). The HEMs' predictions are consistent with the structure of the ontology, because they always obey the "true path rule", the logical and biological rule that governs the internal coherence of bio-ontologies such as the Gene Ontology (GO) and the Human Phenotype Ontology (HPO): if a gene is annotated to a term, it is also annotated to all the ancestors of that term.
  4. NEXT GENERATION SEQUENCING
    application of computational tools for the analysis of genetic variants in sequencing data, such as copy number variants, single nucleotide variants and INDELs, and for the analysis of next-generation ChIP-Seq and RNA-Seq data.
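The true path rule mentioned in point 3 can be illustrated with a minimal sketch. Assuming flat scores in [0, 1] for each ontology term and a DAG given as a mapping from each term to its parents, the simple top-down correction below caps every term's score at the lowest corrected score among its parents, so that no term ever scores higher than any of its ancestors. This is only an illustration of the rule in the spirit of top-down hierarchical correction, not the HEMs implementation available on my Software page; the function name, term names and scores are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def enforce_true_path_rule(scores, parents):
    """Hypothetical helper: correct flat scores so no term exceeds its parents.

    scores:  dict term -> flat prediction score in [0, 1]
    parents: dict term -> list of parent terms (a DAG; roots map to [])
    """
    corrected = {}
    # static_order() yields each term only after all of its parents,
    # so every parent is already corrected when its children are visited.
    for term in TopologicalSorter(parents).static_order():
        par = parents.get(term, [])
        if par:
            # Cap the child's score at the weakest corrected parent score.
            corrected[term] = min(scores[term], min(corrected[p] for p in par))
        else:
            # Roots keep their flat score.
            corrected[term] = scores[term]
    return corrected


# Toy DAG with hypothetical terms: C has two parents, A and B.
parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}
flat = {"root": 0.9, "A": 0.7, "B": 0.4, "C": 0.8}
result = enforce_true_path_rule(flat, parents)
print(result)
```

In this toy example the flat score of C (0.8) violates the rule, since C would be more likely than its parent B (0.4); the correction lowers it to 0.4, while the other terms are left unchanged.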

Interdisciplinary Research

The pie chart displayed below groups my publications by subject area. It shows that my research interests are evenly spread over the areas of Computer Science, Mathematics and Biomolecular Science (Biochemistry, Genetics and Molecular Biology). The chart was created with Plotly, whereas the partition of publications by subject area is taken from Scopus (Documents by subject area).