Overview

Thanks to high-throughput biotechnologies, we are witnessing a massive accumulation of genomic, transcriptomic, proteomic and metabolomic data, and molecular biology is turning into a "data-rich" science. This wealth of data gives us both a "bird’s eye" and a "magnified" view of biological samples across a wide range of medically relevant disorders. "The answer is there, blowing in the bulk" (thanks Zimmerman 😃). However, this unprecedented richness of raw data is meaningless unless we can extract the interesting and useful "needles" of information from the big "haystack" of available data, which calls for innovative bioinformatics and biostatistics methodologies able to disentangle, process, analyze and interpret it.

In my work, I strike a balance between two ways of doing science. On the one hand, I focus on developing and applying statistical and machine learning approaches to analyze high-throughput, heterogeneous biomedical data. On the other hand, I package these approaches into modular, scalable and user-friendly tools to share them with the scientific community.

Skills by Interest Area

  1. Next Generation Sequencing / Multi-Omics Data
    • Developing tailored analytics solutions for NGS datasets (WES, bulk RNA-Seq, scRNA-Seq, ChIP-Seq, CUT&RUN, CRISPR screens)
    • Developing scalable and reproducible pipelines for the analysis of NGS datasets
    • Performing variant calling analysis to detect point and structural mutations
    • Performing clustering analysis to stratify patients on the basis of gene/protein expression profiles
    • Performing differential gene/protein expression and pathway enrichment analysis
    • Performing differential binding/accessibility analysis with ChIP-Seq/ATAC-Seq data
    • Quantifying chromatin signal metaprofiles
    • Configuring and browsing genomic data programmatically via UCSC track hubs
    • Developing Shiny web applications to interactively explore multi-omics data
  2. Machine Learning
    • Developing ensemble methods for the functional annotation of gene products
    • Implementing workflows for running machine learning experiments (hold-out, k-fold cross-validation)
    • Applying machine learning methods for the prediction of human gene-abnormal phenotype associations and for the prediction of protein function
    • Implementing ensemble methods in an R package available on CRAN (HEMDAG)
    • Implementing a Perl module for handling bio-ontologies and gene annotation files (obogaf-parser)
    • Developing an integrated R/Perl pipeline to build datasets for machine learning experiments (godata-pipe)
  3. Biomolecular Networks
    • Wrangling and analysis of biomolecular networks
    • Wrangling and analysis of heterogeneous ontology and interaction graphs (protein-coding and non-coding genes)