Representing translation: concepts, methods, and resources for protein coding loci representation and ribosome profiling data analysis
Date
2024
Authors
Tierney, Jack
Publisher
University College Cork
Abstract
The advent of the ribosome profiling (Ribo-Seq) sequencing technique has led to the genome-wide characterisation of translatomic complexity across the tree of life. Our understanding of the extent to which translation occurs outside annotated protein coding regions has been greatly expanded, as has our understanding of the role of translation dynamics in gene expression. As more Ribo-Seq data is made available, and consequently more translated regions are identified, the need for resources and methods for annotating, representing and analysing this translation grows. Current representation schemas all but ignore translational complexity, and where it is accounted for, the reality of the gene's expression is often further obscured. The Ribo-Seq data used to uncover this translational complexity is often generated for independent investigations before being deposited in public repositories. The reuse of this data is essential for the annotation of translational complexity, yet the need for elaborate data processing and the lack of standardised metadata hinder this reuse at scale.
In this thesis, we introduce a set of interoperable frameworks and resources for representing eukaryotic translational complexity and for increasing the accessibility of high-quality Ribo-Seq data, in order to facilitate further investigations into translational complexity.
First, we introduce the concept of a Ribosome Decision Graph (RDG) for the accurate representation of translatomic complexity. RDGs address the shortcomings of current annotation frameworks by representing translation events as branching points in a directed acyclic graph, enabling the representation of the set of potential paths a ribosome may take as it traverses the mRNA. To facilitate the adoption of this concept, we then introduce a comprehensive Python package that enables researchers to construct, visualise and analyse these graphs.
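To make the graph idea concrete, here is a minimal, hypothetical sketch in plain Python (standard library only). It is not the API of the thesis's Python package. It assumes a toy model in which a scanning ribosome reaching each candidate start codon either initiates or continues scanning (leaky scanning), and after a stop codon either terminates or, optionally, reinitiates at the next downstream start codon. The node labels, the functions `build_decision_graph`, `enumerate_paths` and `translated_orfs`, and the (start, stop) coordinates are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical node type: (event, nucleotide position). Events used here:
# "cap" (ribosome loads at the 5' end), "decide" (scanning ribosome meets a
# candidate start codon), "start" (initiation), "stop" (termination), "end".
Node = Tuple[str, int]
Graph = Dict[Node, List[Node]]


def build_decision_graph(orfs: List[Tuple[int, int]], transcript_length: int,
                         allow_reinitiation: bool = True) -> Graph:
    """Build a toy ribosome decision DAG from (start, stop) ORF coordinates."""
    orfs = sorted(orfs)
    graph: Graph = defaultdict(list)
    end: Node = ("end", transcript_length)

    def add_edge(u: Node, v: Node) -> None:
        if v not in graph[u]:  # ORFs may share a stop codon; avoid duplicate edges
            graph[u].append(v)

    def next_decision(pos: int) -> Node:
        # First candidate start codon at or after pos, otherwise the 3' end.
        for start, _ in orfs:
            if start >= pos:
                return ("decide", start)
        return end

    add_edge(("cap", 0), next_decision(0))
    for start, stop in orfs:
        add_edge(("decide", start), ("start", start))          # initiate here
        add_edge(("decide", start), next_decision(start + 1))  # leaky scanning
        add_edge(("start", start), ("stop", stop))             # elongate to the stop codon
        add_edge(("stop", stop), end)                          # terminate
        if allow_reinitiation:
            nxt = next_decision(stop + 3)                      # resume scanning past the stop
            if nxt != end:
                add_edge(("stop", stop), nxt)
    return dict(graph)


def enumerate_paths(graph: Graph, node: Node = ("cap", 0)) -> List[List[Node]]:
    """Depth-first enumeration of every cap-to-end path through the DAG."""
    if node[0] == "end":
        return [[node]]
    return [[node] + tail
            for child in graph.get(node, [])
            for tail in enumerate_paths(graph, child)]


def translated_orfs(path: List[Node]) -> Tuple[int, ...]:
    """Start positions of the ORFs translated along one path."""
    return tuple(pos for event, pos in path if event == "start")


# Toy transcript: an upstream ORF (uORF) followed by a main ORF.
graph = build_decision_graph([(10, 40), (60, 250)], transcript_length=300)
paths = enumerate_paths(graph)
print(len(paths))  # 5
for path in paths:
    print(translated_orfs(path))
```

Each path corresponds to one possible ribosome trajectory (for example, translating only the uORF, reinitiating after it on the main ORF, or leaking past the uORF). Because every decision node branches, the number of paths can grow exponentially with the number of candidate start codons, so a practical tool would probably score or prune paths rather than list them all.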
To address the barriers to large-scale reanalysis of public Ribo-Seq data, we introduce the RiboSeq Data Portal. This resource makes available 14,840 Ribo-Seq samples from 969 studies across 96 different species/strains. It serves as a database of standardised metadata for each of these samples and also offers pre-processed data in various formats. The portal's integration with the existing resources maintained by RiboSeq.Org enhances the exploration of Ribo-Seq data within the browser.
We also introduce RiboMetric, a comprehensive command line application for the assessment of Ribo-Seq dataset properties. RiboMetric generates detailed sample reports as well as a set of metrics that quantify important Ribo-Seq data properties, including periodicity, uniformity and read length distribution. In combination with the data available in the RiboSeq Data Portal, these metrics enable large-scale comparisons of datasets, enhancing the reproducibility of translatomic research.
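RiboMetric's exact metric definitions are not given here, so the following is only an illustrative sketch of how three of the named properties could be quantified. The function names and formulas are assumptions, not RiboMetric's implementation: periodicity is taken as the fraction of P-sites in the most populated reading frame (about 1/3 means no frame preference, 1.0 means every read in one frame), and uniformity as the normalised Shannon entropy of P-site counts across equal-width bins of the coding sequence (1.0 means perfectly even coverage). P-site offsets are assumed to be already computed relative to the CDS start.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def read_length_distribution(lengths: Iterable[int]) -> Dict[int, float]:
    """Proportion of reads at each read length."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def frame_periodicity(p_site_offsets: Iterable[int]) -> float:
    """Fraction of P-sites in the most populated frame (1/3 = none, 1.0 = perfect)."""
    frames = Counter(offset % 3 for offset in p_site_offsets)
    total = sum(frames.values())
    return max(frames.values()) / total if total else 0.0


def positional_uniformity(p_site_offsets: Iterable[int], cds_length: int,
                          n_bins: int = 10) -> float:
    """Normalised Shannon entropy of P-site counts across equal-width CDS bins.

    n_bins must be at least 2 so that the normalising term log(n_bins) is non-zero.
    """
    bins = [0] * n_bins
    for offset in p_site_offsets:
        if 0 <= offset < cds_length:  # ignore P-sites outside the CDS
            bins[offset * n_bins // cds_length] += 1
    total = sum(bins)
    if total == 0:
        return 0.0
    entropy = -sum((b / total) * math.log(b / total) for b in bins if b > 0)
    return entropy / math.log(n_bins)


# Toy data: one P-site per codon across a 300 nt CDS, all in frame 0.
offsets = list(range(0, 300, 3))
print(frame_periodicity(offsets))                      # 1.0
print(round(positional_uniformity(offsets, 300), 6))   # 1.0
print(read_length_distribution([28, 28, 29, 30]))      # {28: 0.5, 29: 0.25, 30: 0.25}
```

Because each score is a single number on a fixed scale, scores like these can be tabulated across many samples; the bin count and the entropy-based uniformity are design choices made for this sketch.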
Collectively, these advances provide robust foundations for future research into translatomic complexity, in particular through the reanalysis of Ribo-Seq data. RDGs provide new perspectives on how translatomic complexity can be represented and analysed, while the RiboSeq Data Portal and RiboMetric make large-scale reanalysis of Ribo-Seq datasets with desired properties more accessible to the entire translatomic research community. This work lays the groundwork for more comprehensive and reproducible investigations into translatomic complexity, potentially revolutionising our understanding of translation dynamics in gene regulation.
Keywords
Translation, Molecular biology, Bioinformatics, Ribosome profiling, Ribo-Seq, Genomics, Genome annotation
Citation
Tierney, J. 2024. Representing translation: concepts, methods, and resources for protein coding loci representation and ribosome profiling data analysis. PhD Thesis, University College Cork.