Classification of socially generated medical data

dc.availability.bitstream: embargoed
dc.check.chapterOfThesis: Chapter 9 is under review and not yet published. [en]
dc.check.date: 2023-12-31
dc.contributor.advisor: Sorensen, Humphrey [en]
dc.contributor.advisor: O'Riordan, Adrian [en]
dc.contributor.author: Alnashwan, Rana
dc.contributor.funder: Princess Nourah Bint Abdulrahman University [en]
dc.date.accessioned: 2020-04-21T12:16:54Z
dc.date.available: 2020-04-21T12:16:54Z
dc.date.issued: 2019-09
dc.date.submitted: 2019-09
dc.description.abstract: The growth of online health communities, particularly those involving socially generated content, can provide considerable value for society. Participants can gain knowledge of medical information or interact with peers on medical forum platforms. However, the sheer volume of information generated, and the 'noise' that accompanies large data volumes, can create difficulties for information consumers. We propose a solution to this problem: applying high-level analytics to the data (primarily sentiment analysis, but also content and topic analysis) for accurate classification. We believe that such analysis can be of significant value to data users, for example by identifying a particular aspect of an information space, determining the themes that predominate in a large dataset, and allowing people to summarize topics within a big dataset. In this thesis, we apply machine learning strategies to identify sentiments expressed in online medical forums that discuss Lyme Disease. As part of this process, we identify a complete and relevant set of categories that can be used to characterize Lyme Disease discourse. We present a feature-based model that employs supervised learning algorithms, and we assess the feasibility and accuracy of this sentiment classification model. We further evaluate the model by assessing its ability to adapt to an online medical forum discussing Lupus, a disease with similar characteristics. The experimental results demonstrate the effectiveness of our approach. In many sentiment analysis applications, labelled training datasets are expensive to obtain, whereas unlabelled datasets are readily available. We therefore present an adaptation of a well-known semi-supervised learning technique, in which co-training is implemented by combining labelled and unlabelled data. Our results suggest that the model can learn even with limited labelled data.
In addition, we investigate complementary analytic techniques (content and topic analysis) to make the best use of the data for various consumer groups. Within the work described in this thesis, some particular research issues are addressed, specifically when applied to socially generated medical/health datasets:
• When applying binary sentiment analysis to short-form text data (e.g. Twitter), could meta-level features improve classification performance?
• When applying more complex multi-class sentiment analysis to the classification of long-form, content-rich text data, would meta-level features be a useful addition to more conventional features?
• Can this multi-class analysis approach be generalised to other medical/health domains?
• How would alternative classification strategies benefit different groups of information consumers? [en]
dc.description.status: Not peer reviewed [en]
dc.description.version: Accepted Version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.citation: Alnashwan, R. 2019. Classification of socially generated medical data. PhD Thesis, University College Cork. [en]
dc.identifier.endpage: 162 [en]
dc.identifier.uri: https://hdl.handle.net/10468/9842
dc.language.iso: en [en]
dc.publisher: University College Cork [en]
dc.rights: © 2019, Rana Alnashwan. [en]
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/ [en]
dc.subject: Multi-class sentiment classification [en]
dc.subject: Feature extraction [en]
dc.subject: Machine learning [en]
dc.subject: Content analysis [en]
dc.subject: Topic analysis [en]
dc.subject: Online health community [en]
dc.title: Classification of socially generated medical data [en]
dc.type: Doctoral thesis [en]
dc.type.qualificationlevel: Doctoral [en]
dc.type.qualificationname: PhD - Doctor of Philosophy [en]
Files
Original bundle
Name: Classification-of-Socially-Generated-Medical-Data-Modified.pdf
Size: 2.49 MB
Format: Adobe Portable Document Format
Description: E-thesis