Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data

dc.contributor.author McParland, D.
dc.contributor.author Phillips, Catherine M.
dc.contributor.author Brennan, L.
dc.contributor.author Roche, H. M.
dc.contributor.author Gormley, I. C.
dc.date.accessioned 2018-05-25T10:24:36Z
dc.date.available 2018-05-25T10:24:36Z
dc.date.issued 2017-06-30
dc.identifier.citation McParland, D., Phillips, C. M., Brennan, L., Roche, H. M. and Gormley, I. C. (2017) 'Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data', Statistics in Medicine, 36(28), pp. 4548-4569. doi:10.1002/sim.7371 en
dc.identifier.volume 36 en
dc.identifier.issued 28 en
dc.identifier.startpage 4548 en
dc.identifier.endpage 4569 en
dc.identifier.issn 0277-6715
dc.identifier.issn 1097-0258
dc.identifier.uri http://hdl.handle.net/10468/6192
dc.identifier.doi 10.1002/sim.7371
dc.description.abstract The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high-dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. en
dc.description.sponsorship Sixth Framework Programme (LIPGENE Grant Number: FOOD-CT-2003-505944); Science Foundation Ireland (SFI/14/JPI_HDHL/B3075) en
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.publisher John Wiley & Sons, Inc. en
dc.rights © 2017, John Wiley & Sons, Ltd. This is the peer reviewed version of the following article: McParland, D., Phillips, C. M., Brennan, L., Roche, H. M. and Gormley, I. C. (2017) 'Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data', Statistics in Medicine, 36(28), pp. 4548-4569. doi:10.1002/sim.7371, which has been published in final form at https://doi.org/10.1002/sim.7371. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. en
dc.subject Clustering en
dc.subject Mixed data en
dc.subject Phenotypic data en
dc.subject SNP data en
dc.subject Metabolic syndrome en
dc.title Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data en
dc.type Article (peer-reviewed) en
dc.internal.authorcontactother Catherine Phillips, Epidemiology & Public Health, University College Cork, Cork, Ireland. +353-21-490-3000 Email: c.phillips@ucc.ie en
dc.internal.availability Full text available en
dc.check.info Access to this article is restricted until 12 months after publication by request of the publisher. en
dc.check.date 2018-06-30
dc.date.updated 2018-05-25T08:57:19Z
dc.description.version Accepted Version en
dc.internal.rssid 421662010
dc.internal.wokid WOS:000415869400018
dc.contributor.funder Sixth Framework Programme en
dc.contributor.funder Science Foundation Ireland en
dc.description.status Peer reviewed en
dc.identifier.journaltitle Statistics in Medicine en
dc.internal.copyrightchecked Yes en
dc.internal.licenseacceptance Yes en
dc.internal.IRISemailaddress c.phillips@ucc.ie en
dc.relation.project info:eu-repo/grantAgreement/SFI/SFI Research Frontiers Programme (RFP)/09/RFP/MTH2367/IE/Model-based Statistical Methods for Mixed-Mode Metabolomic Data./ en
dc.relation.project info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ en
dc.relation.project info:eu-repo/grantAgreement/SFI/SFI Principal Investigator Programme (PI)/11/PI/1119/IE/Dietary fatty acids: impact on inflammasome driven adipose inflammation and insulin resistance _ novel therapeutic targets/ en

