Cancer proteogenomics – identifying coding, long non-coding RNAs relevant in cancer

Date
2024
Authors
Zaheed, Oza
Publisher
University College Cork
Abstract
Long non-coding RNAs (lncRNAs) are RNA molecules exceeding 500 nucleotides that were traditionally considered non-functional and untranslated (Mattick et al., 2023). However, recent advances in next-generation sequencing have revealed pivotal roles for lncRNAs in cellular processes, especially in disease contexts such as cancer. My work proposes that the non-coding transcriptome significantly impacts cancer cell biology, influencing aggressiveness and offering potential applications in patient diagnosis, disease stratification, and treatment. Cancer cells, characterised by altered metabolism and adaptations to cellular stress, may tip the balance between non-coding and coding functions. This shift could result in the production of peptides from RNAs previously annotated as non-coding. Emerging evidence suggests that some of these 'non-coding' RNAs, particularly lncRNAs, may in fact encode proteins, often originating from small open reading frames (sORFs). Examples of peptides generated from sORFs within non-coding transcripts span species from plants to humans, and many exhibit critical biological functions. My research focuses on identifying bifunctional RNAs that exhibit dual non-coding and coding functions in cancer. Leveraging RNA sequencing (RNA-seq) and ribosome profiling (Ribo-seq) datasets, I aim to challenge the conventional classification of 'non-coding' RNAs. My hypothesis posits that a single RNA molecule may serve both non-coding and coding roles, influenced by contextual cues and cellular conditions. To explore this hypothesis, I propose a multi-omic approach, aiming to identify potential peptides with pivotal roles in oncogenesis and cancer progression. This approach uncovered a total of 400 lncRNAs that were differentially expressed in malignant breast cancer cell lines. From this list of differentially expressed lncRNAs, 64 candidate translated open reading frames were identified.
By combining sequence analysis with proteomic evidence and curation of translated open reading frames, a scoring system was created to rank these candidates by their likelihood of producing a detectable microprotein. A total of nine parameters were assessed and grouped into evidence-based categories, including translation, sequence information, and proteomics. This scoring system identified the ten candidates most likely to produce a stable, detectable microprotein, with the three best-scoring candidates arising from the lncRNAs ENSG00000253477, LINC02163, and MRPS30-DT. This integrated methodology holds promise for uncovering hidden coding potential within lncRNAs, offering novel insights into therapeutic targets and biomarkers. By examining the interplay between non-coding and coding functions in cancer, this research seeks to reshape our understanding of lncRNAs and their applications in cancer diagnosis and treatment.
Keywords
Microproteins, Small open reading frames, Translation, lncRNA, Breast cancer, Ribosome profiling, Protein structure prediction
Citation
Zaheed Maheswaran, O. B. 2024. Cancer proteogenomics – identifying coding, long non-coding RNAs relevant in cancer. PhD Thesis, University College Cork.