Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data

dc.check.date: 2018-06-30
dc.check.info: Access to this article is restricted until 12 months after publication by request of the publisher. (en)
dc.contributor.author: McParland, D.
dc.contributor.author: Phillips, Catherine M.
dc.contributor.author: Brennan, L.
dc.contributor.author: Roche, H. M.
dc.contributor.author: Gormley, I. C.
dc.contributor.funder: Sixth Framework Programme (en)
dc.contributor.funder: Science Foundation Ireland (en)
dc.date.accessioned: 2018-05-25T10:24:36Z
dc.date.available: 2018-05-25T10:24:36Z
dc.date.issued: 2017-06-30
dc.date.updated: 2018-05-25T08:57:19Z
dc.description.abstract: The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high-dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm, and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. (en)
dc.description.sponsorship: Sixth Framework Programme (LIPGENE Grant Number: FOOD-CT-2003-505944); Science Foundation Ireland (SFI/14/JPI_HDHL/B3075) (en)
dc.description.status: Peer reviewed (en)
dc.description.version: Accepted Version (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.citation: McParland, D., Phillips, C. M., Brennan, L., Roche, H. M. and Gormley, I. C. (2017) 'Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data', Statistics in Medicine, 36(28), pp. 4548-4569. doi:10.1002/sim.7371 (en)
dc.identifier.doi: 10.1002/sim.7371
dc.identifier.endpage: 4569 (en)
dc.identifier.issn: 0277-6715
dc.identifier.issn: 1097-0258
dc.identifier.issued: 28 (en)
dc.identifier.journaltitle: Statistics in Medicine (en)
dc.identifier.startpage: 4548 (en)
dc.identifier.uri: https://hdl.handle.net/10468/6192
dc.identifier.volume: 36 (en)
dc.language.iso: en (en)
dc.publisher: John Wiley & Sons, Inc. (en)
dc.relation.project: info:eu-repo/grantAgreement/SFI/SFI Research Frontiers Programme (RFP)/09/RFP/MTH2367/IE/Model-based Statistical Methods for Mixed-Mode Metabolomic Data./ (en)
dc.relation.project: info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ (en)
dc.relation.project: info:eu-repo/grantAgreement/SFI/SFI Principal Investigator Programme (PI)/11/PI/1119/IE/Dietary fatty acids: impact on inflammasome driven adipose inflammation and insulin resistance _ novel therapeutic targets/ (en)
dc.rights: © 2017, John Wiley & Sons, Ltd. This is the peer reviewed version of the following article: McParland, D., Phillips, C. M., Brennan, L., Roche, H. M. and Gormley, I. C. (2017) 'Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data', Statistics in Medicine, 36(28), pp. 4548-4569. doi:10.1002/sim.7371, which has been published in final form at https://doi.org/10.1002/sim.7371. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. (en)
dc.subject: Clustering (en)
dc.subject: Mixed data (en)
dc.subject: Phenotypic data (en)
dc.subject: SNP data (en)
dc.subject: Metabolic syndrome (en)
dc.title: Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data (en)
dc.type: Article (peer-reviewed) (en)
Files

Original bundle
Name: McParlandEtAl.pdf
Size: 385.05 KB
Format: Adobe Portable Document Format
Description: Accepted Version

Name: sim7371-sup-0001-supplementary.pdf
Size: 630.82 KB
Format: Adobe Portable Document Format
Description: Supporting Information

License bundle
Name: license.txt
Size: 2.71 KB
Format: Item-specific license agreed upon to submission
Description: