PhD Candidate: Gregory Tong
Traditional approaches to genome annotation used a minimum open reading frame (ORF) length cutoff of 100 codons, which excluded small ORFs (smORFs) from gene annotations. Recent studies have shown that these unannotated smORFs are actively translated, coding for a novel class of proteins called microproteins. Advances in genomics, proteomics, and bioinformatics have given researchers the tools to identify these smORFs and demonstrate that they can encode functional proteins, extending our understanding of the genome’s protein-coding capacity. The small number of microproteins that have been fully characterized are involved in a wide range of biological processes and disease pathologies. To improve our understanding of human health and disease, it is important to comprehensively identify all microproteins that have a biological role. One of the primary methods for detecting microprotein-coding smORFs is ribosome profiling (Ribo-seq), which measures translation in vivo through deep sequencing of ribosome-protected mRNA fragments. As a technique for detecting ORF translation, Ribo-seq has been critical to achieving a full view of the proteome. Challenges remain in using Ribo-seq for microprotein discovery, however, as the lack of standardization and reproducibility in smORF identification has become a bottleneck for microprotein characterization efforts. My dissertation presents research that explores how to optimally combine Ribo-seq with bioinformatic tools to confidently identify microprotein-coding smORFs. Building on these results, I applied a computational workflow to identify microproteins expressed in the context of pancreatic ductal adenocarcinoma (PDAC) and aging.