Thanks to high-throughput biotechnologies, we are witnessing a massive accumulation of genomic, transcriptomic, proteomic and metabolomic data: molecular biology is nowadays turning into a "data-rich" science. These data give us, at the same time, both a "bird's-eye" and a "magnified" view of a biological sample across a wide range of medically relevant disorders. "The answer is there, blowing in the bulk" (thanks Zimmerman 😃). Yet this unprecedented richness of raw data is meaningless unless we can extract the interesting and useful "needles" of information from the big "haystack" of available data. In other words, this huge amount of raw data is of little use without groundbreaking bioinformatics and biostatistics methodologies able to disentangle, process, analyze and interpret it.
In my work, I strike a balance between two ways of doing science. On the one hand, I focus on developing and applying statistical and machine learning approaches for analyzing high-throughput, heterogeneous biomedical data. On the other hand, I package these approaches into modular, scalable and user-friendly tools, so that they can be shared with the scientific community.