Restriction lift date: 2023-12-31
Classification of socially generated medical data
Date
2019-09
Authors
Alnashwan, Rana
Publisher
University College Cork
Abstract
The growth of online health communities, particularly those involving socially
generated content, can provide considerable value for society. Participants can
gain knowledge of medical information or interact with peers on medical forum
platforms. However, the sheer volume of information so generated – and the
consequent ‘noise’ associated with large data volumes – can create difficulties
for information consumers. We propose a solution to this problem by applying
high-level analytics to the data – primarily sentiment analysis, but also content
and topic analysis – for accurate classification. We believe that such analysis can
be of significant value to data users, for example by identifying a particular aspect of an
information space, determining the themes that predominate in a large dataset,
and allowing people to summarize topics within a big dataset.
In this thesis, we apply machine learning strategies to identify sentiments expressed
in online medical forums that discuss Lyme Disease. As part of this
process, we distinguish a complete and relevant set of categories that can be used
to characterize Lyme Disease discourse. We present a feature-based model that
employs supervised learning algorithms and assess the feasibility and accuracy of
this sentiment classification model. We further evaluate our model by assessing
its ability to adapt to an online medical forum discussing a disease with similar
characteristics, Lupus. The experimental results demonstrate the effectiveness of
our approach.
In many sentiment analysis applications, the labelled training datasets are
expensive to obtain, whereas unlabelled datasets are readily available. Therefore,
we present an adaptation of a well-known semi-supervised learning technique,
in which co-training is implemented by combining labelled and unlabelled data.
Our results suggest that the model can learn even with limited labelled data. In
addition, we investigate complementary analytic techniques – content and topic
analysis – to make the best use of the data for various consumer groups.
This thesis addresses the following research questions, specifically as they apply
to socially generated medical/health datasets:
• When applying binary sentiment analysis to short-form text data (e.g.
Twitter), could meta-level features improve classification performance?
• When applying more complex multi-class sentiment analysis to classification
of long-form content-rich text data, would meta-level features be a useful addition to more conventional features?
• Can this multi-class analysis approach be generalised to other medical/health
domains?
• How would alternative classification strategies benefit different groups of
information consumers?
Keywords
Multi-class sentiment classification, Feature extraction, Machine learning, Content analysis, Topic analysis, Online health community
Citation
Alnashwan, R. 2019. Classification of socially generated medical data. PhD Thesis, University College Cork.