Application of mixed-effects modelling and supervised classification techniques to public health data

Yang, Shuai
University College Cork
This thesis consists of two parts. In PART A, we describe the application of mixed-effects modelling to 24-hour blood pressure (BP) data. Blood pressure follows a 24-hour circadian rhythm, and an exaggerated morning surge in BP is an independent risk factor for cardiovascular disease. The data analysed in this project are from the Mitchelstown study. The morning systolic blood pressure (SBP) pattern between 4:00 am and 12:00 pm (noon) was modelled using a piecewise linear mixed-effects model. Based on the likelihood function, the optimal breakpoint was at 7:30 am. The morning surge was characterised by the slope after the breakpoint. The average slope between 7:30 am and 12:00 pm was estimated at 2.47 mmHg/30 min (95% CI: 2.35-2.59 mmHg/30 min). The empirical Bayes estimates of subject-specific slopes were compared by age, gender, smoking, BMI, hypertension and diabetes. There were no significant differences in subject-specific morning surge between groups. Additionally, the relationship between chronic kidney disease (CKD) and the morning surge was explored using multivariable logistic regression, adjusting for age, gender, smoking, BMI, hypertension and diabetes. The association between the morning surge and CKD was not statistically significant.
In PART B, supervised classification techniques are applied to SEYLE data to explore factors associated with drop-out from the SEYLE study. The SEYLE study measured the mental health and wellbeing of adolescents with a baseline assessment and follow-up assessments at 3 and 12 months. Participant adherence is important when drawing inferences from longitudinal data. However, drop-out in longitudinal studies is inevitable, especially among adolescents. The primary objective of this project is to identify students with a high probability of drop-out in the SEYLE study using the Irish cohort.
Multivariable logistic regression and decision trees (classification tree (CT), conditional inference tree, and evolutionary tree) were developed on a training data set. Factors considered included sociodemographic characteristics, risk behaviours, lifestyle, general health, relationships and support, negative life events and psychiatric symptoms. Model performance was assessed on a test data set. Logistic regression analysis revealed that students aged 15/16, or with a chronic disease, a normal anxiety level, high levels of hyperactivity, or a lack of regular physical activity, were significantly more likely to drop out of the SEYLE study. CT was regarded as the best tree and identified four subgroups based on age, anxiety and depression. Adolescents aged 15/16 without anxiety but with depression were classified as 'drop-out' in this CT model. The choice between logistic regression and CT depends on the objective of the user. Logistic regression was the best at discriminating drop-out. However, CT is a simpler model and was marginally better at predicting drop-out.
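The closing distinction between discriminating and predicting drop-out can be illustrated with a minimal sketch. The abstract does not name the performance measures used, so this example assumes discrimination is measured by the area under the ROC curve (AUC) and prediction by classification accuracy at a 0.5 probability threshold. The labels and scores are synthetic, not SEYLE data, and the names `model_a` and `model_b` are hypothetical stand-ins for a model with smooth probabilities and a tree with a few leaf probabilities.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly when a score at or above
    the threshold is labelled as drop-out (1)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(predicted, labels)) / len(labels)


# Synthetic outcomes (assumption): 1 = dropped out, 0 = retained.
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical model A: ranks every drop-out above every retained student,
# but all probabilities fall below the 0.5 cut-off.
model_a = [0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]

# Hypothetical model B: only two distinct leaf probabilities, as a small
# tree would produce.
model_b = [0.7, 0.7, 0.2, 0.7, 0.2, 0.2, 0.2, 0.2]

for name, scores in (("model_a", model_a), ("model_b", model_b)):
    print(name, round(auc(labels, scores), 3), accuracy(labels, scores))
```

On these made-up scores, the first model has the higher AUC while the second has the higher accuracy, so which one counts as "best" depends on the measure that matches the user's objective.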
Logistic regression, Classification tree (CT), Conditional inference tree, Evolutionary tree, Mixed-effects modelling, Supervised classification techniques, Public health data
Yang, S. 2019. Application of mixed-effects modelling and supervised classification techniques to public health data. MRes Thesis, University College Cork.