Seq-ing improved gene expression estimates from microarrays using machine learning

dc.contributor.author: Korir, Paul K.
dc.contributor.author: Geeleher, Paul
dc.contributor.author: Seoighe, Cathal
dc.date.accessioned: 2016-01-14T12:59:06Z
dc.date.available: 2016-01-14T12:59:06Z
dc.date.issued: 2015-09-04
dc.description.abstract: BACKGROUND: Quantifying gene expression by RNA-Seq has several advantages over microarrays, including greater dynamic range and gene expression estimates on an absolute, rather than a relative scale. Nevertheless, microarrays remain in widespread use, demonstrated by the ever-growing numbers of samples deposited in public repositories.
RESULTS: We propose a novel approach to microarray analysis that attains many of the advantages of RNA-Seq. This method, called Machine Learning of Transcript Expression (MaLTE), leverages samples for which both microarray and RNA-Seq data are available, using a Random Forest to learn the relationship between the fluorescence intensity of sets of microarray probes and RNA-Seq transcript expression estimates. We trained MaLTE on data from the Genotype-Tissue Expression (GTEx) project, consisting of Affymetrix gene arrays and RNA-Seq from over 700 samples across a broad range of human tissues.
CONCLUSION: This approach can be used to accurately estimate absolute expression levels from microarray data, at both gene and transcript level, which has not previously been possible. This methodology will facilitate re-analysis of archived microarray data and broaden the utility of the vast quantities of data still being generated. [en]
dc.description.status: Peer reviewed [en]
dc.description.version: Published Version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.articleid: 286
dc.identifier.citation: KORIR, P. K., GEELEHER, P. & SEOIGHE, C. 2015. Seq-ing improved gene expression estimates from microarrays using machine learning. BMC Bioinformatics, 16:286, 1-11. http://dx.doi.org/10.1186/s12859-015-0712-z [en]
dc.identifier.doi: 10.1186/s12859-015-0712-z
dc.identifier.endpage: 11 [en]
dc.identifier.issn: 1471-2105
dc.identifier.issued: 1 [en]
dc.identifier.journaltitle: BMC Bioinformatics [en]
dc.identifier.startpage: 1 [en]
dc.identifier.uri: https://hdl.handle.net/10468/2183
dc.identifier.volume: 16 [en]
dc.language.iso: en [en]
dc.publisher: BioMed Central Ltd. [en]
dc.rights: © 2015 Korir et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0 [en]
dc.subject: RNA-Seq [en]
dc.subject: Microarray [en]
dc.subject: Machine learning [en]
dc.subject: Statistical learning [en]
dc.subject: Artificial intelligence [en]
dc.subject: Bioassay [en]
dc.subject: Decision trees [en]
dc.subject: Fluorescence intensities [en]
dc.subject: Microarray analysis [en]
dc.subject: Microarray data [en]
dc.subject: Public repositories [en]
dc.subject: Tissue expression [en]
dc.subject: Transcript level [en]
dc.subject: Gene expression [en]
dc.subject: Tissue expression levels [en]
dc.title: Seq-ing improved gene expression estimates from microarrays using machine learning [en]
dc.type: Article (peer-reviewed) [en]
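The abstract describes MaLTE only at a high level: a Random Forest learns, from samples profiled on both platforms, how the fluorescence intensities of a set of microarray probes relate to an RNA-Seq expression estimate for a transcript. As a rough illustration of that idea only, the toy pure-Python sketch below fits bagged regression trees for a single hypothetical transcript. It is not the authors' MaLTE code, and everything in it is an assumption chosen for illustration: the names `ToyRandomForest` and `make_synthetic`, the synthetic "probe intensities" (only probes 0 and 1 carry signal), and the hyperparameters (25 trees, depth 8, 2 candidate probes per split). In practice a library implementation such as scikit-learn's `RandomForestRegressor` would be the usual choice.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean: the impurity a regression tree minimises."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _build_tree(X, y, depth, max_features, rng):
    """Grow one regression tree. A leaf is a float; an internal node is (probe, threshold, left, right)."""
    leaf = sum(y) / len(y)
    if depth == 0 or len(set(y)) == 1:
        return leaf
    best = None  # (summed child SSE, probe index, threshold)
    # Only a random subset of probes is considered at each split.
    k = min(max_features, len(X[0]))
    for f in rng.sample(range(len(X[0])), k):
        cuts = sorted({row[f] for row in X})
        for lo, hi in zip(cuts, cuts[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # every sampled probe is constant within this node
        return leaf
    _, f, thr = best
    left = [(row, t) for row, t in zip(X, y) if row[f] <= thr]
    right = [(row, t) for row, t in zip(X, y) if row[f] > thr]
    return (
        f,
        thr,
        _build_tree([r for r, _ in left], [t for _, t in left], depth - 1, max_features, rng),
        _build_tree([r for r, _ in right], [t for _, t in right], depth - 1, max_features, rng),
    )


def _predict_tree(node, row):
    """Follow the splits down to a leaf and return that leaf's mean."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


class ToyRandomForest:
    """Hypothetical toy forest: bagged regression trees with per-split probe subsampling."""

    def __init__(self, n_trees=25, max_depth=8, max_features=2, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.max_features = max_features
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of arrays
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          self.max_depth, self.max_features, rng))
        return self

    def predict(self, X):
        # The forest prediction is the average of the individual tree predictions.
        return [sum(_predict_tree(t, row) for t in self.trees) / len(self.trees) for row in X]


def make_synthetic(n_samples, n_probes=4, seed=1):
    """Hypothetical data: probe intensities per array and an 'RNA-Seq' target.
    Only probes 0 and 1 carry signal; the remaining probes are pure noise."""
    rng = random.Random(seed)
    X = [[rng.uniform(4.0, 12.0) for _ in range(n_probes)] for _ in range(n_samples)]
    y = [0.5 * (row[0] + row[1]) + rng.gauss(0.0, 0.2) for row in X]
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_synthetic(60, seed=1)
    X_test, y_test = make_synthetic(30, seed=2)
    preds = ToyRandomForest().fit(X_train, y_train).predict(X_test)
    mae = sum(abs(p - t) for p, t in zip(preds, y_test)) / len(y_test)
    print(f"held-out mean absolute error: {mae:.3f}")
```

Run as a script, the sketch prints the held-out error on arrays it did not train on. Drawing a fresh random subset of probes at every split, on top of bootstrap resampling of the training arrays, is what distinguishes a Random Forest from plain bagged trees.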
Files
Original bundle
Name: PKK_Seq-ingPV2015.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format
Description: Published Version
Name: PKK_SeqPV2015_add_ file 1.docx
Size: 13.15 KB
Format: Microsoft Word XML
Description: Additional file 1
License bundle
Name: license.txt
Size: 2.71 KB
Format: Item-specific license agreed upon to submission
Description: