Classification of socially generated medical data

Alnashwan, Rana

dc.availability.bitstream	embargoed
dc.check.chapterOfThesis	Chapter 9 is under review and not yet published.	en
dc.check.date	2023-12-31
dc.contributor.advisor	Sorensen, Humphrey	en
dc.contributor.advisor	O'Riordan, Adrian	en
dc.contributor.author	Alnashwan, Rana
dc.contributor.funder	Princess Nourah Bint Abdulrahman University	en
dc.date.accessioned	2020-04-21T12:16:54Z
dc.date.available	2020-04-21T12:16:54Z
dc.date.issued	2019-09
dc.date.submitted	2019-09
dc.description.abstract	The growth of online health communities, particularly those involving socially generated content, can provide considerable value for society. Participants can gain knowledge of medical information or interact with peers on medical forum platforms. However, the sheer volume of information so generated – and the consequent ‘noise’ associated with large data volumes – can create difficulties for information consumers. We propose a solution to this problem by applying high-level analytics to the data – primarily sentiment analysis, but also content and topic analysis – for accurate classification. We believe that such analysis can be of significant value to data users, for example by identifying a particular aspect of an information space, determining the themes that predominate in a large dataset, and allowing people to summarize topics within a big dataset. In this thesis, we apply machine learning strategies to identify sentiments expressed in online medical forums that discuss Lyme Disease. As part of this process, we distinguish a complete and relevant set of categories that can be used to characterize Lyme Disease discourse. We present a feature-based model that employs supervised learning algorithms and assess the feasibility and accuracy of this sentiment classification model. We further evaluate our model by assessing its ability to adapt to an online medical forum discussing a disease with similar characteristics, Lupus. The experimental results demonstrate the effectiveness of our approach. In many sentiment analysis applications, labelled training datasets are expensive to obtain, whereas unlabelled datasets are readily available. Therefore, we present an adaptation of a well-known semi-supervised learning technique, in which co-training is implemented by combining labelled and unlabelled data. Our results suggest that the model can learn even with limited labelled data.
In addition, we investigate complementary analytic techniques – content and topic analysis – to make the best use of the data for various consumer groups. Within the work described in this thesis, several research questions are addressed, specifically as they apply to socially generated medical/health datasets: • When applying binary sentiment analysis to short-form text data (e.g. Twitter), could meta-level features improve classification performance? • When applying more complex multi-class sentiment analysis to the classification of long-form, content-rich text data, would meta-level features be a useful addition to more conventional features? • Can this multi-class analysis approach be generalised to other medical/health domains? • How would alternative classification strategies benefit different groups of information consumers?	en
dc.description.status	Not peer reviewed	en
dc.description.version	Accepted Version	en
dc.format.mimetype	application/pdf	en
dc.identifier.citation	Alnashwan, R. 2019. Classification of socially generated medical data. PhD Thesis, University College Cork.	en
dc.identifier.endpage	162	en
dc.identifier.uri	https://hdl.handle.net/10468/9842
dc.language.iso	en	en
dc.publisher	University College Cork	en
dc.rights	© 2019, Rana Alnashwan.	en
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/	en
dc.subject	Multi-class sentiment classification	en
dc.subject	Feature extraction	en
dc.subject	Machine learning	en
dc.subject	Content analysis	en
dc.subject	Topic analysis	en
dc.subject	Online health community	en
dc.title	Classification of socially generated medical data	en
dc.type	Doctoral thesis	en
dc.type.qualificationlevel	Doctoral	en
dc.type.qualificationname	PhD - Doctor of Philosophy	en

Files

Original bundle

Name: Classification-of-Socially-Generated-Medical-Data-Modified.pdf
Size: 2.49 MB
Format: Adobe Portable Document Format
Description: E-thesis

License bundle

Name: license.txt
Size: 5.2 KB
Format: Item-specific license agreed upon to submission

Collections

Research Theses
College of Science, Engineering and Food Science - Doctoral Theses
Computer Science - Doctoral Theses