Interrogating annotated protein coding regions for hitherto undetected translation

Fedorova, Alla
University College Cork
Ribosome profiling (Ribo-seq) is a technique that captures ribosome-protected mRNA fragments and sequences them. This powerful method enables the discovery of unannotated proteoforms and translated open reading frames (ORFs), even those hidden within annotated protein coding regions. Here we used Ribo-seq data together with comparative genomics analysis to discover non-AUG-initiated proteoforms that arise from alternative translation start sites in-frame with annotated starts. Production of such non-AUG proteoforms falls into two scenarios. In the first, non-AUG proteoforms are generated as alternative proteoforms in addition to the annotated AUG-initiated ones; such proteoforms are called PANTs (Proteoforms with Alternative N-termini). In the second, a non-AUG codon is used exclusively as the translation start for the main protein product of the mRNA. In addition to discovering non-AUG proteoforms, we rebuilt and upgraded RiboGalaxy, an instance of the Galaxy platform for processing Ribo-seq data. This update enables prediction of novel translated ORFs from raw Ribo-seq reads using only an internet browser, with no need for locally installed software, which makes working with Ribo-seq data more accessible to the scientific community. Chapter 1 is an introductory chapter that describes PANTs; in particular, it covers the different sources of PANTs, their functions and the methods for their discovery. Chapter 2 covers the development of a pipeline for detecting non-AUG N-terminally extended proteoforms in the human genome, which combines a phylogenetic approach with a Ribo-seq-based approach. It also reports the discovery of novel non-AUG N-terminal extensions using this pipeline and an attempt to characterise the functionality of those non-AUG N-termini. 
Chapter 3 describes the phenomenon of exclusive non-AUG initiation, in which only a non-AUG-initiated proteoform is generated from an mRNA, unlike the case of PANTs, in which both non-AUG and AUG proteoforms are generated from the same mRNA. Reported proteoforms were analysed and novel candidates were predicted using Ribo-seq data. Chapter 4 reports the development of an update of RiboGalaxy, an interactive, user-friendly online platform for processing Ribo-seq data. It covers all the steps from preprocessing of raw reads and quality control to transcriptomic and genomic alignments, which can then be visualised and analysed in Trips-viz and GWIPS-viz, the transcriptomic and genomic browsers for ribosome profiling data; together these three tools comprise the resource. The platform enables the preparation of ribosome profiling data for subsequent detection of translated ORFs in Trips-viz. The update includes migrating the backend to a configuration manager (Ansible), updating the tools, their dependencies and reference indices, and adding new tools that prepare files for easy upload to GWIPS-viz and Trips-viz.
Ribosome profiling, Ribo-seq, NonAUG initiation, Non-canonical translation, Evolution
