Cluster sampling in large oral health surveys – issues and implications for design and analysis

SheerinAJ_PhD2019.pdf(2.12 MB)
Full Text E-thesis
Sheerin, Anthony Joseph
University College Cork
The use of survey sampling has become routine in almost all aspects of life. If the chosen sample is representative of the population, inferences about the population can be made from the sample. Chapter 1 reviews commonly used survey sampling techniques and discusses the variance estimation methods these techniques require. Particular attention is given to cluster sampling, as the data underlying this thesis was gathered in this manner. The results of the literature review in Chapter 1 set the direction of the analysis in Chapter 2. Chapter 2 compares the readily available cluster variance estimation methods, namely Taylor Series Linearisation (TSL) and the delete-one jackknife (JK1), on simulated finite populations of data with known characteristics, to ascertain whether there are situations in which the estimates differ. Multiple situations are examined systematically, including skewed distributions, large sampling fractions and small sample sizes, occurring both in isolation and simultaneously. The two methods provided identical estimates when the sampling fraction was small and the number of observations, particularly at the second stage, was large. However, when these conditions were violated, the estimates diverged. Moreover, design effects less than one are seen in this chapter, which is unusual in a cluster sampling setting. Chapter 3 applies the above variance estimation techniques to a national oral health dataset in which there are regions with design effects less than one. As the data was collected using cluster sampling, the presence of design effects less than one is extremely unusual and warranted an analysis to try to identify the possible causes.
Both cluster analysis and linear model methods are used to identify variables which may indicate the presence of a design effect less than one; the number of clusters sampled (n) is the variable most frequently identified across the different models. Chapter 3 suggests that reducing the number of clusters sampled will reduce the design effect of the data. Chapter 4 looks at the effect of reducing the number of clusters sampled (n) on the standard error (SE) and design effect (DE) estimates of the survey data analysed in Chapter 3. Variance estimates are produced using a jackknife resampling approach, looking at all combinations when n = 1, 2, …, 10 clusters are dropped in turn. The results show that a small number of clusters per community care area (CCA) can be dropped, with no impact on the bias of the estimate and only a very small increase in the SE of the estimate.
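As an illustration of the TSL and JK1 comparison described above, the following is a minimal sketch only, not the thesis's implementation. It assumes a single stratum, equally weighted clusters, no finite population correction, and that every element of each sampled cluster is observed; none of these details are stated in the abstract. The function names and the toy data are hypothetical. The design effect is approximated here as the cluster-based variance divided by the variance a simple random sample of the same size would give.

```python
from statistics import variance


def ratio_mean(clusters):
    """Estimate the mean as (sum of cluster totals) / (sum of cluster sizes)."""
    total = sum(sum(c) for c in clusters)
    count = sum(len(c) for c in clusters)
    return total / count


def tsl_variance(clusters):
    """Taylor Series Linearisation variance of the ratio mean.

    Assumes one stratum, equal weights and no finite population correction.
    """
    n = len(clusters)
    size = sum(len(c) for c in clusters)
    est = ratio_mean(clusters)
    # Linearised value per cluster; these sum to zero by construction.
    z = [(sum(c) - est * len(c)) / size for c in clusters]
    return n / (n - 1) * sum(v * v for v in z)


def jk1_variance(clusters):
    """Delete-one jackknife variance, centred on the full-sample estimate."""
    n = len(clusters)
    est = ratio_mean(clusters)
    # Recompute the estimate with each cluster left out in turn.
    reps = [ratio_mean(clusters[:i] + clusters[i + 1:]) for i in range(n)]
    return (n - 1) / n * sum((r - est) ** 2 for r in reps)


def design_effect(clusters, var_fn):
    """Cluster-based variance relative to a simple-random-sample variance."""
    values = [y for c in clusters for y in c]
    srs_var = variance(values) / len(values)
    return var_fn(clusters) / srs_var


# Hypothetical toy data: each inner list holds the observations in one cluster.
equal_sizes = [[0, 10], [1, 10], [0, 9]]
unequal_sizes = [[2, 3, 4], [10], [5, 6], [1, 1, 2, 9]]

for name, data in [("equal", equal_sizes), ("unequal", unequal_sizes)]:
    print(name, tsl_variance(data), jk1_variance(data),
          design_effect(data, tsl_variance))
```

Under these assumptions the two estimators agree when all clusters are the same size, because the estimate is then a simple average of cluster means, and they can differ when cluster sizes vary. The equal-size toy data also gives a design effect below one, since its clusters are internally varied but have similar means.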
Variance estimation, Cluster sampling
Sheerin, A. J. 2019. Cluster sampling in large oral health surveys – issues and implications for design and analysis. PhD Thesis, University College Cork.