Computer Science - Doctoral Theses
Browsing Computer Science - Doctoral Theses by Title
Now showing 1 - 20 of 82
Item A cryptographic approach to location privacy(University College Cork, 2023) Eshun, Samuel N.; Palmieri, PaoloThe rapid expansion of location-based services (LBS) has driven an escalating demand for personalised and context-aware applications, enriching user experiences across health, weather, and navigation sectors. These services offer valuable insights into various applications using large-scale datasets of individual mobility traces. However, alongside the benefits come considerable privacy concerns, as the data could inadvertently reveal sensitive information about users' movements and behaviours. This thesis delves into multiple facets of privacy within LBS, concentrating on the de-anonymisation of mobility data and the subsequent privacy risks it poses. In response to these concerns, the thesis presents privacy-preserving indoor localisation techniques, later improved with an efficient cryptographic protocol to protect users and service providers against privacy breaches. The first part of this thesis focuses on de-anonymisation attacks on mobility data. We propose a novel de-anonymisation model that employs hidden Markov models (HMM) to create user mobility profiles based on spatiotemporal trajectories. The performance of this model is assessed using real-world mobility datasets from two different cities, Shanghai and Rome. Our attack techniques significantly improve over existing de-anonymisation techniques, successfully re-identifying up to 85% and 90% of anonymised users in the respective datasets. However, despite the model's effectiveness, limitations exist, such as the model's dependence on the availability of a training dataset. Future work could explore unsupervised machine learning algorithms to address these limitations or utilise more sophisticated techniques like recurrent neural networks (RNNs) to model the evolution of user mobility behaviour over time. The second part of the thesis addresses privacy concerns in indoor Wi-Fi localisation. We propose a privacy-preserving protocol that uses partial homomorphic encryption (Paillier's cryptosystem) to guarantee user location privacy while allowing computation in the encrypted domain. This approach ensures that most of the computational overhead on the user side is delegated to the server while hiding the user's exact location. By leveraging the Spatial Bloom filter {data structure}, complemented by homomorphic encryption, the service provider can learn about the user's presence in predefined areas without revealing the user's exact location or these predefined areas to the user. The third part of the thesis introduces an efficient and privacy-preserving cryptographic protocol that incorporates a more realistic security assumption in the form of a malicious adversary, one of the most important improvements to guarantee privacy for both the service provider and the user, unlike the semi-honest adversary in the previous protocol. Our protocol employs additive homomorphic encryption (DGK encryption) to preserve the privacy of the user's location fingerprint while allowing the service provider to compute over the encrypted fingerprint. In addition, garbled circuits protect the service provider's reference database against malicious users while delivering location output to the user. Finally, spatial Bloom filters further enhance the protocol by allowing the service provider to learn the user's vicinity in predefined areas of interest without revealing the exact location to the user or these predefined areas. 
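As a rough illustration of the spatial Bloom filter data structure mentioned above, the following Python sketch shows the general idea of a filter whose cells store the index of a predefined area of interest rather than a single membership bit. The class name, hashing scheme and parameters are illustrative assumptions; this is not the construction or the encrypted protocol used in the thesis.

```python
import hashlib

class SpatialBloomFilter:
    """Minimal sketch of a spatial Bloom filter: each cell stores the index of the
    predefined area (1..n) written to it, keeping the higher index on collision; a
    query returns the minimum cell value over all hash positions (0 = no area)."""

    def __init__(self, size=1024, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.cells = [0] * size

    def _positions(self, element):
        # Derive num_hashes cell positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{element}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, element, area_index):
        for pos in self._positions(element):
            self.cells[pos] = max(self.cells[pos], area_index)

    def query(self, element):
        # 0 means the element is (probably) in no area of interest,
        # otherwise the (probable) index of the area it belongs to.
        return min(self.cells[pos] for pos in self._positions(element))

# Toy usage: areas are sets of location identifiers (invented example data).
sbf = SpatialBloomFilter()
for loc in ["cell-17", "cell-18"]:
    sbf.add(loc, area_index=1)   # area 1: e.g. a monitored ward
sbf.add("cell-42", area_index=2) # area 2: e.g. a restricted laboratory

print(sbf.query("cell-42"))      # expected: 2 (probably in area 2)
print(sbf.query("cell-99"))      # expected: 0 (probably in no monitored area)
```

In the protocols described above, the filter would be queried over encrypted values rather than in the clear as done here.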
Compared to similar protocols, our proposed solution demonstrates a significant reduction in computational costs on the user side and a 99.99% reduction in online communication costs, making it more efficient and practicable in the Internet of Things environments. Furthermore, our protocol is the first to provide security against malicious users, whereas other protocols are limited to honest-but-curious adversaries. For future work, we recommend strengthening the protection against actively corrupt service providers or cloud services by implementing additional cryptographic techniques, such as the ABY framework, for efficient mixed-protocol multiparty computation. Moreover, exploring other protocols or cryptographic primitives that improve efficiency, security, and privacy is encouraged, possibly through the combination of different techniques to optimise the protocol's current efficiency or reduce the size of the garbled circuit. By examining the challenges posed by de-anonymisation attacks and developing innovative solutions, this thesis offers a comprehensive approach to enhancing privacy and security in location-based services. By investigating privacy-preserving indoor localisation techniques and developing efficient protocols, we strive to protect the interests of both users and service providers. As the demand for personalised and context-aware applications continues to grow, this research contributes significantly to the ongoing conversation surrounding privacy and data protection in the digital age. The importance of addressing privacy concerns in location-based services cannot be overstated. As technological advancements progress and LBS permeate various aspects of our lives, ensuring the confidentiality of user data becomes paramount. The solutions presented in this thesis, including privacy-preserving indoor localisation techniques and efficient cryptographic protocols, are crucial steps towards achieving a balance between the benefits offered by these services and the privacy requirements of users and service providers. As location-based services evolve, new challenges and privacy risks will undoubtedly emerge. Therefore, the work presented in this thesis should be considered part of an ongoing effort to develop and refine techniques that preserve user privacy while maintaining the functionality and efficiency of LBS. The exploration of alternative cryptographic primitives, the improvement of existing protocols, and the development of new privacy-preserving methods will be essential to ensuring the continued growth and success of location-based services in a secure and privacy-conscious manner. In conclusion, this thesis comprehensively examines privacy concerns in LBS, focusing on de-anonymising mobility data and developing privacy-preserving indoor localisation techniques. The efficient cryptographic protocols proposed to offer robust protection for both users and service providers, paving the way for a more secure and privacy-oriented future for LBS.Item Active learning in recommender systems: an unbiased and beyond-accuracy perspective(University College Cork, 2020-12-05) Carraro, Diego; Bridge, Derek G.; O'Sullivan, Barry; Science Foundation Ireland; European Regional Development FundThe items that a Recommender System (RS) suggests to its users are typically ones that it thinks the user will like and want to consume. 
An RS that is good at its job is of interest not only to its customers but also to service providers, so they can secure long-term customers and increase revenue. Thus, there is a challenge in building better recommender systems. One way to build a better RS is to improve the quality of the data on which the RS model is trained. An RS can use Active Learning (AL) to proactively acquire such data, with the goal of improving its model. The idea of AL for RS is to explicitly query the users, asking them to rate items which have not been rated yet. The items that a user will be asked to rate are known as the query items. Query items are different from recommendations. For example, the former may be items that the AL strategy predicts the user has already consumed, whereas the latter are ones that the RS predicts the user will like. In AL, query items are selected `intelligently' by an Active Learning strategy. Different AL strategies take different approaches to identify the query items. As with the evaluation of RSs, preliminary evaluation of AL strategies must be done offline. An offline evaluation can help to narrow the number of promising strategies that need to be evaluated in subsequent costly user trials and online experiments. Where the literature describes the offline evaluation of AL, the evaluation is typically quite narrow and incomplete: mostly, the focus is cold-start users; the impact of newly-acquired ratings on recommendation quality is usually measured only for those users who supplied those ratings; and impact is measured in terms of prediction accuracy or recommendation relevance. Furthermore, the traditional AL evaluation does not take into account the bias problem. As brought to light by recent RS literature, this is a problem that affects the offline evaluation of RS; it arises when a biased dataset is used to perform the evaluation. We argue that it is a problem that affects offline evaluation of AL strategies too. The main focus of this dissertation is on the design and evaluation of AL strategies for RSs. We first design novel methods (designated WTD and WTD_H) that `intervene' on a biased dataset to generate a new dataset with unbiased-like properties. Compared to the most similar approach proposed in the literature, we give empirical evidence, using two publicly-available datasets, that WTD and WTD_H are more effective at debiasing the evaluation of different recommender system models. We then propose a new framework for offline evaluation of AL for RS, which we believe facilitates a more authentic picture of the performances of the AL strategies under evaluation. In particular, our framework uses WTD or WTD_H to mitigate the bias, but it also assesses the impact of AL in a more comprehensive way than the traditional evaluation used in the literature. Our framework is more comprehensive in at least two ways. First, it segments users in more ways than is conventional and analyses the impact of AL on the different segments. Second, in the same way that RS evaluation has changed from a narrow focus on prediction accuracy and recommendation relevance to a wider consideration of so-called `beyond-accuracy' criteria (such as diversity, serendipity and novelty), our framework extends the evaluation of AL strategies to also cover `beyond-accuracy' criteria. Experimental results on two datasets show the effectiveness of our new framework. Finally, we propose some new AL strategies of our own. 
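Before those strategies are described, the sketch below illustrates in generic terms what an active learning strategy does when it selects query items: it ranks a user's unrated items by the disagreement of a small ensemble of bootstrap rating predictors. The toy ratings matrix, the per-item-mean predictor and the variance criterion are assumptions made for illustration; they are not the strategies proposed in this dissertation.

```python
import random
from statistics import pvariance

# Toy ratings matrix: users x items, None = unrated (invented data).
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": None, "i4": None},
    "u2": {"i1": 4, "i2": None, "i3": 2, "i4": None},
    "u3": {"i1": None, "i2": 4, "i3": 1, "i4": 5},
}

def bootstrap_predictor(seed):
    """A deliberately simple predictor: per-item mean rating computed on a
    bootstrap sample of the observed (user, item, rating) triples."""
    observed = [(u, i, r) for u, items in ratings.items()
                for i, r in items.items() if r is not None]
    rng = random.Random(seed)
    sample = [rng.choice(observed) for _ in observed]
    sums, counts = {}, {}
    for _, item, r in sample:
        sums[item] = sums.get(item, 0) + r
        counts[item] = counts.get(item, 0) + 1
    global_mean = sum(r for _, _, r in sample) / len(sample)
    return lambda item: sums[item] / counts[item] if item in counts else global_mean

def select_queries(user, k=2, n_predictors=20):
    """Rank the user's unrated items by ensemble disagreement (variance)."""
    predictors = [bootstrap_predictor(s) for s in range(n_predictors)]
    unrated = [i for i, r in ratings[user].items() if r is None]
    disagreement = {i: pvariance([p(i) for p in predictors]) for i in unrated}
    return sorted(unrated, key=disagreement.get, reverse=True)[:k]

print(select_queries("u1"))  # the items this user would be asked to rate
```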
In particular, our new AL strategies, instead of focusing exclusively on prediction accuracy and recommendation relevance, are designed to also enhance `beyond-accuracy' criteria. We evaluate the new strategies using our more comprehensive evaluation framework.
Item Analysis and detection of security vulnerabilities in contemporary software(University College Cork, 2017) Pieczul, Olgierd; Foley, Simon; International Business Machines Corporation
Contemporary application systems are implemented using an assortment of high-level programming languages, software frameworks, and third party components. While this may help to lower development time and cost, the result is a complex system of interoperating parts whose behavior is difficult to fully and properly comprehend. This difficulty of comprehension often manifests itself in the form of program coding errors that are not directly related to security requirements but can have an impact on the security of the system. The thesis of this dissertation is that many security vulnerabilities in contemporary software may be attributed to unintended behavior due to unexpected execution paths resulting from the accidental misuse of the software components. Unlike many typical programmer errors such as missed boundary checks or user input validation, these software bugs are not easy to detect and avoid. While typical secure coding best practices, such as code reviews, dynamic and static analysis, offer little protection against such vulnerabilities, we argue that runtime verification of software execution against a specified expected behavior can help to identify unexpected behavior in the software. The dissertation explores how building software systems using components may lead to the emergence of unexpected software behavior that results in security vulnerabilities. The thesis is supported by a study of the evolution of a popular software product over a period of twelve years. While anomaly detection techniques could be applied to verify software execution at runtime, there are several practical challenges in using them in large-scale contemporary software. A model of expected application execution paths, and a methodology that can be used to build it during the software development cycle, are proposed. The dissertation explores its effectiveness in detecting exploits on vulnerabilities enabled by software errors in a popular, enterprise software product.
Item An analysis of collaborative filtering datasets(University College Cork, 2018) Griffith, Josephine; Sorensen, Humphrey; National University of Ireland, Galway
The work described in this thesis pertains to the area of Collaborative Filtering and focuses on collaborative filtering datasets and specially-defined portions of the datasets called views. The high level goal of the work is to better understand how different characteristics of datasets affect the performance of collaborative filtering techniques. Datasets, and views, are compared across a number of different experiments: some relating to techniques and accuracy and others relating to ideas of performance prediction.
Item An analytics-based decomposition approach to large-scale bilevel optimisation(University College Cork, 2018) Fajemisin, Adejuyigbe; Prestwich, Steven David; Climent, Laura; Science Foundation Ireland
Bilevel optimisation problems contain several decision makers, each with different objectives and constraints, arranged in a hierarchical structure.
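To make the hierarchical structure concrete, here is a small, purely illustrative single-leader, single-follower example solved by brute-force nested search; the objective functions and grids are invented and bear no relation to the problems studied in the thesis.

```python
# Toy bilevel problem (illustrative only):
#   leader  : maximise  F(x, y) = -(x - 3)**2 - y   over x in a grid,
#   follower: minimise  f(x, y) = (y - x)**2 + y    over y in a grid,
# where the leader evaluates its objective at the follower's optimal reply.

def follower_best_response(x, grid):
    return min(grid, key=lambda y: (y - x) ** 2 + y)

def solve_leader(grid):
    best_x, best_val = None, float("-inf")
    for x in grid:
        y = follower_best_response(x, grid)   # one inner optimisation per leader move
        val = -(x - 3) ** 2 - y
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

grid = [i / 10 for i in range(0, 61)]         # 0.0, 0.1, ..., 6.0
x_star, F_star = solve_leader(grid)
print(x_star, F_star)
```

Even in this tiny example, every candidate leader decision triggers a full lower-level optimisation, which is the source of the computational expense discussed below.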
One type of bilevel problem is the single-leader, multiple-follower problem, which has been used in applications like toll-setting, resource management, conflict resolution, and many others. This hierarchical structure allows for the reformulation of the forest harvesting problem as a multiple-follower bilevel problem. In the forest harvesting problem of Chapter 4, trees are cut into different log types, some of which are more valuable than others. Because harvesting machines are designed to prioritise the production of these higher-value log types, over-production and waste of the high-value logs, as well as unfulfilled demand for the low-value logs, are seen. Additionally, the discrepancy between the amounts of log types expected pre-harvest and the actual amounts seen post-harvest leads to the inefficient harvesting of the forest. Despite the many approaches for solving multiple-follower problems, they are either not applicable in cases in which the follower problems are not traditional optimisation problems, or do not scale up appropriately. An example of this case occurs with the forest harvesting problem, where the follower problems are dynamic programming problems. Another example is the case where the follower problems are black-box functions. In such cases, replacing the follower problems with reformulations or optimality conditions is not applicable. Evolutionary algorithms can be used, but these are computationally-intensive schemes which do not scale up effectively. For this reason, an analytics-based approach, which is better able to sample the solution space, is needed. The thesis defended throughout this dissertation is that an analytics-based decomposition approach can be used to solve large-scale multiple-follower bilevel problems more efficiently than the other approaches. Specifically, the contributions of this thesis are: (i) a new class of multiple-follower bilevel problems is proposed; (ii) a novel analytics-based decomposition approach for solving this class of large-scale bilevel multiple-follower problems is given; (iii) the forest harvesting problem is reformulated as a bilevel optimisation problem to take into account the operation of the harvester, and (iv) a reactive harvesting approach is developed to mitigate the effects of the uncertainty in the data.
Item An approach to robustness in stable marriage and stable roommates problems(University College Cork, 2019) Genc, Begum; O'Sullivan, Barry; Siala, Mohamed; Science Foundation Ireland
This dissertation focuses on a novel concept of robustness within the context of matching problems. Our robustness notion for the stable matching framework is motivated by the unforeseen events that may occur after a matching is computed. We define the notion of (a,b)-supermatches as a measure of robustness of a matching. An (a,b)-supermatch characterizes a stable matching such that if any combination of 'a' pairs want to leave the matching, there exists an alternative matching in which those 'a' pairs are assigned new partners, and in order to obtain the new assignment at most 'b' other pairs are broken. We first formally define the notion of (a,b)-supermatches by using one of the most famous matching problems, namely the Stable Marriage problem (SM), as the platform. We name the problem of finding an (a,b)-supermatch to the SM as the Robust Stable Marriage problem (RSM). Subsequently, we prove that RSM is NP-hard, and the decision problem for the case where a=1 (i.e.
deciding if there exists a (1,b)-supermatch) is NP-complete. We also develop a constraint programming model and a number of meta-heuristic approaches to find a (1,b)-supermatch that minimizes the value of 'b' for the RSM. Following the results on the RSM, we extend the notion of (a,b)-supermatches to the Stable Roommates problem (SR), namely, the Robust Stable Roommates problem (RSR). We show that the NP-hardness is also valid for the RSR, and we also define a polynomial-time procedure for the RSR to decide if a given stable matching is a (1,b)-supermatch. Similarly, we provide a number of meta-heuristic models to solve the optimization problem for finding a (1,b)-supermatch that minimizes the value of 'b'. We conclude this dissertation by providing some empirical results on the robustness of different datasets of RSM and RSR instances.
Item Asynchronous distributed clustering algorithms for wireless mesh network(University College Cork, 2021-04-10) Qiao, Cheng; Brown, Kenneth; O'Sullivan, Barry; Science Foundation Ireland; European Regional Development Fund
Wireless Mesh Networks are becoming increasingly important in many applications. In many cases, data is acquired by devices that are distributed in space, but effective actions require a global view of that data across all devices. Transmitting all data to the centre allows strong data analytics algorithms to be applied, but consumes battery power for the nodes, and may cause data overload. To avoid this, distributed methods try to learn within the network, allowing each agent to learn a global picture and take appropriate actions. For distributed clustering in particular, existing methods share simple cluster descriptions until the network stabilises. The approaches tend to require either synchronous behaviour or many cycles, and omit important information about the clusters. In this thesis, we develop asynchronous methods that share richer cluster models, and we show that they are more effective in learning the global data patterns. Our underlying method describes the shape and density of each cluster, as well as its centroid and size. We combine cluster models by re-sampling from received models, and then re-clustering the new data sets. We then extend the approach to allow clustering methods that do not require the final number of clusters as input. After that, we consider the case in which there might be sub-groups of agents that are receiving different patterns of data. Finally, we extend our approaches to scenarios where each agent has no idea about whether there is a single pattern or multiple patterns. We demonstrate that the approaches can regenerate clusters that are similar to the distributions that were used to create the test data. When the number of clusters is not available, the learned number of clusters is close to the ground truth. The proposed algorithms can learn how data points are clustered even when there are multiple patterns in the network. When the number of patterns (single or multiple) is not known in advance, the proposed methods, Optimised KDE and DBSCAN, perform well in detecting multiple patterns.
Although they perform worse in detecting the single pattern, they can still learn how data points are clustered.
Item Automatic detection of learner-style for adaptive eLearning(University College Cork, 2013) Mehigan, Tracey J.; Pitt, Ian; Connolly, Tracey; Irish Research Council for Science Engineering and Technology; Digital Hub, Ireland
The advent of modern wireless technologies has seen a shift in focus towards the design and development of educational systems for deployment through mobile devices. The use of mobile phones, tablets and Personal Digital Assistants (PDAs) is steadily growing across the educational sector as a whole. Mobile learning (mLearning) systems developed for deployment on such devices hold great significance for the future of education. However, mLearning systems must be built around the particular learner’s needs based on both their motivation to learn and subsequent learning outcomes. This thesis investigates how biometric technologies, in particular accelerometer and eye-tracking technologies, could effectively be employed within the development of mobile learning systems to facilitate the needs of individual learners. The creation of personalised learning environments must enable the achievement of improved learning outcomes for users, particularly at an individual level. Therefore, consideration is given to individual learning-style differences within the electronic learning (eLearning) space. The overall area of eLearning is considered and areas such as biometric technology and educational psychology are explored for the development of personalised educational systems. This thesis explains the basis of the author’s hypotheses and presents the results of several studies carried out throughout the PhD research period. These results show that both accelerometer and eye-tracking technologies can be employed as a Human Computer Interaction (HCI) method in the detection of student learning-styles to facilitate the provision of automatically adapted eLearning spaces. Finally, the author provides recommendations for developers in the creation of adaptive mobile learning systems through the employment of biometric technology as a user interaction tool within mLearning applications. Further research paths are identified and a roadmap for future research in this area is defined.
Item Autonomous system control in unknown operating conditions(University College Cork, 2021-08-24) Sohège, Yves; Provan, Gregory; Tabirca, Marius-Sabin; Science Foundation Ireland
Autonomous systems have become an interconnected part of everyday life with the recent increases in computational power available for both onboard computers and offline data processing. The race by car manufacturers for level 5 (full) autonomy in self-driving cars is well underway and new flying taxi service startups are emerging every week, attracting billions in investments. Two main research communities, Optimal Control and Reinforcement Learning, stand out in the field of autonomous systems, each with a vastly different perspective on the control problem. Controllers from the optimal control community are based on models and can be rigorously analyzed to ensure the stability of the system is maintained under certain operating conditions. Learning-based control strategies are often referred to as model-free and typically involve training a neural network to generate the required control actions through direct interactions with the system. This greatly reduces the design effort required to control complex systems.
One common problem both learning- and model-based control solutions face is the dependency on a priori knowledge about the system and operating conditions, such as possible internal component failures and external environmental disturbances. It is not possible to consider every possible operating scenario an autonomous system can encounter in the real world at design time. Models and simulators are approximations of reality and can only be created for known operating conditions. Autonomous system control in unknown operating conditions, where no a priori knowledge exists, is still an open problem for both communities and no control methods currently exist for such situations. Multiple model adaptive control is a modular control framework that divides the control problem into supervisory and low-level control, which allows for the combination of existing learning- and model-based control methods to overcome the disadvantages of using only one of these. The contributions of this thesis consist of five novel supervisory control architectures, which have been empirically shown to improve a system’s robustness to unknown operating conditions, and a novel low-level controller tuning algorithm that can reduce the number of required controllers compared to traditional tuning approaches. The presented methods apply to any autonomous system that can be controlled using model-based controllers and can be integrated alongside existing fault-tolerant control systems to improve robustness to unknown operating conditions. This impacts autonomous system designers by providing novel control mechanisms to improve a system’s robustness to unknown operating conditions.
Item Baran: a service-oriented cloud-based user monitoring and data analysis framework(University College Cork, 2019) Hashemi, Mohammad; Herbert, John; Sreenan, Cormac J.; Higher Education Authority; European Regional Development Fund
Most humans now live in an age of connected digital devices where their interactions with these devices are recorded in a number of ways, by different organisations, under various levels of user control. A user is often unaware of how much data they create, of where the data resides, and of how their data is used. It is challenging for a user to inspect all this personal data, to control the storage and use of the data, and to exploit the data for their benefit in a safe way. These are the challenges that this thesis addresses. The focus is on the user’s digital trail of information – i.e. the record of user interactions and associated context information from the digital devices associated with the user. The digital devices associated with a user, and with which they interact, might include smartphones, laptops, smart watches, fitness bands, and smart household appliances. The thesis develops a conceptual model and extensible data structure, called the UDI (User Digital Imprint), which accommodates a variety of digital data from digital devices along with information derived from that data. A software framework, Baran, is developed that implements the UDI and provides services that support the gathering, management and analysis of the user data. The Baran framework allows the user to inspect and analyse their data and supports (under user control) the sharing of the data in order that assistive services for the user can be provided by 3rd parties. The framework is cloud-based and service-oriented, enabling the framework to make use of external services (e.g.
machine learning services) and to provide services for external entities (e.g. supplying some subset of user data to a smart coffee maker). Thus, Baran can be extended with both user services and external services, where the sharing of user data with external entities is directly under user control on a per-use basis. By gathering, analysing and making available comprehensive information about a user’s digital interactions, Baran also enables study of aspects of User Experience (UX), as might be of interest to UX researchers or product designers. Three case studies are presented to demonstrate aspects of Baran and how it can be used in different scenarios. Rather than a limited-scope, specific solution (such as recent functionality introduced by Google and Apple), Baran provides a general, extensible framework that covers a range of user digital interactions on a variety of devices, and enables a wide range of applications, where all data sharing is under explicit user control.
Item Bayesian bilevel optimization(University College Cork, 2023) Dogan, Vedat; Prestwich, Steve; Wilson, Nic; Science Foundation Ireland
In this dissertation, we focus on improving bilevel optimization through several approaches developed during research. Bilevel optimization problems consist of upper-level and lower-level optimization problems connected hierarchically. Upper-level and lower-level problems are also referred to as the leader and the follower problems in the literature. The leader must solve a constrained optimization problem in which some decisions are made by the follower. Both have their own objectives and constraints. The follower’s problem appears as a constraint of the leader. Such problems are used to model practical applications in real life where authority is realizable only if the corresponding follower objective is optimum. There are several practical applications in the fields of engineering, environmental economics, the defence industry, transportation and logistics that have nested structures suited to this type of modelling. As the complexity and the size of such problems have increased over the years, the design of efficient algorithms for bilevel optimization problems has become a critical issue and an important research topic. The nested structure of bilevel optimization problems comes with several interesting challenges for algorithm design. Naturally, the problem requires solving an optimization problem at the lower level for each upper-level solution. That makes the problem computationally expensive. Also, an upper-level solution is considered feasible only if the corresponding lower-level solutions are the global optimum of its objective. Therefore, inaccurate lower-level solutions might lead to an objective value that appears better than the true optimum at the upper level. This creates a further challenge for solution selection strategies and, naturally, for performance, including the computational effort. There are several approaches proposed in the literature for solving bilevel problems. Most focus on special cases; for example, a large set of exact methods has been introduced to solve small linear bilevel optimization problems. Another approach is to replace the lower-level problem with its Karush-Kuhn-Tucker conditions, reducing the bilevel problem to a single-level optimization problem. A popular approach is to use nested evolutionary search to explore both parts of the problem, but this is computationally expensive and does not scale well.
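The Bayesian treatment of the upper-level search described below relies on an acquisition function to decide which upper-level decision to evaluate (and hence which lower-level problem to solve) next. As a generic single-objective illustration (the dissertation itself uses a multi-objective acquisition function), expected improvement under a Gaussian surrogate posterior can be computed as follows; the candidate points and posterior values here are invented.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected improvement for maximisation, given the surrogate's posterior
    mean `mu` and standard deviation `sigma` at each candidate point."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    improvement = mu - f_best - xi
    z = np.divide(improvement, sigma, out=np.zeros_like(sigma), where=sigma > 0)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    # Where the surrogate is certain (sigma == 0), EI is just the positive part.
    return np.where(sigma > 0, ei, np.maximum(improvement, 0.0))

# Posterior summaries at five candidate upper-level decisions (toy numbers).
mu    = [0.2, 0.5, 0.45, 0.9, 0.1]
sigma = [0.3, 0.1, 0.4, 0.05, 0.0]
print(expected_improvement(mu, sigma, f_best=0.6))
# The candidate with the largest EI would be evaluated next, i.e. its
# lower-level problem would be solved and the true upper-level value observed.
```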
The work presented in this dissertation is directed at improving the bilevel optimization process in terms of accuracy and required function evaluations by developing and implementing a black-box approach to the upper-level problem. First, we focused on extracting knowledge from previous iterations and conditioned lower-level decisions to improve the upper-level search. Then, we attempt to improve the upper-level search by using a multi-objective acquisition function in the Bayesian optimization process, drawing on the multi-objective optimization literature to apply different search strategies at the upper level. Afterwards, multi-objective bilevel optimization problems are investigated and a Bayesian approach is developed to approximate the Pareto-optimal front of the problem. Both single- and multi-objective optimization problems are investigated in this dissertation. The experiments are conducted on a comprehensive suite of mathematical test problems that are available in the literature as well as some real-world problems. The performance is compared with existing methods. It is observed that the proposed approaches achieve a promising balance between accuracy and computational expense, and they are therefore suitable for application to several real-life problems in different fields.
Item Beyond gangsta: hip-hop, web culture and racial masking in the musical work of Tyler, The Creator(University College Cork, 2021-09-14) Marques, Gustavo Souva; Rollefson, J. Griffith; Kulezic-Wilson, Danijela; Alves da Silva, Rubens
Tyler, The Creator (Tyler Gregory Okonma) is an African American rapper, music producer and entrepreneur who has been vigorously challenging tropes of black American masculinity. From chattel slavery to blackface minstrelsy, the African diasporic experience in the West is marked by a series of stigmas, contradictions and dichotomies evidenced in the challenge of being black in a white world. This duplicity, denounced and analyzed by scholars such as W.E.B. DuBois and Frantz Fanon, informs the theoretical frame of this dissertation but also reflects Tyler’s investments in subverting American racial ideology in his controversial audio-visual performances. Through his participation in, and appropriation of, skateboarding and web culture, Tyler denies common associations and stereotypes of blackness related to gangsterism, which allowed him to become an internet phenomenon at the age of 19 with his music video “Yonkers” (2011). This study analyzes Tyler’s systematic move beyond gangsta through close media analysis and through ethnographic work in and around the black suburbia of Tyler’s upbringing in Southern California.
Item Chain-based recommendations(2019-10-14) Rana, Arpit; Bridge, Derek G.; Provan, Gregory; Science Foundation Ireland
Recommender systems are discovery tools. Typically, they infer a user's preferences from her behaviour and make personalized suggestions. They are one response to the overwhelming choices that the Web affords its users. Recent studies have shown that a user of a recommender system is more likely to be satisfied by the recommendations if the system provides explanations that allow the user to understand their rationale, and if the system allows the user to provide feedback on the recommendations to improve the next round of recommendations so that they take account of the user's ephemeral needs.
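As a generic illustration of coupling recommendations with explanations over unstructured, tag-based item descriptions, the sketch below scores unseen items by tag overlap with a user's liked items and reports the shared tags as the explanation. The catalogue and the scoring rule are invented and far simpler than the approach developed in the dissertation described next.

```python
# Toy catalogue: items described only by unstructured tag sets (invented data).
catalogue = {
    "Alien":        {"sci-fi", "horror", "space"},
    "Blade Runner": {"sci-fi", "noir", "dystopia"},
    "Heat":         {"crime", "noir", "thriller"},
    "Gravity":      {"sci-fi", "space", "survival"},
}

def recommend_with_explanations(liked, k=2):
    """Score unseen items by tag overlap with the user's liked items and
    return the overlapping tags as a simple explanation."""
    profile = set().union(*(catalogue[i] for i in liked))
    scored = []
    for item in (i for i in catalogue if i not in liked):
        overlap = catalogue[item] & profile
        scored.append((len(overlap), item, sorted(overlap)))
    scored.sort(reverse=True)
    return [(item, expl) for _, item, expl in scored[:k]]

for item, expl in recommend_with_explanations(liked=["Alien"]):
    print(f"{item}: because you liked items tagged {', '.join(expl)}")
```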
The goal of this dissertation is to introduce a new recommendation framework that offers a better user experience, while giving quality recommendations. It works on content-based principles and addresses both of the issues identified in the previous paragraph, i.e. explanations and recommendation feedback. We instantiate our framework to produce two recommendation engines, each focusing on one of the themes: (i) the role of explanations in producing recommendations, and (ii) helping users to articulate their ephemeral needs. For the first theme, we show how to unify recommendation and explanation to a greater degree than has been achieved hitherto. This results in an approach that enables the system to find relevant recommendations with explanations that have a high degree of both fidelity and interpretability. For the second theme, we show how to allow users to steer the recommendation process using a conversational recommender system. Our approach allows the user to reveal her short-term preferences and have them taken into account by the system and thus assists her in making a good decision efficiently. Early work on conversational recommender systems considers the case where the candidate items have structured descriptions (e.g. sets of attribute-value pairs). Our new approach works in the case where items have unstructured descriptions (e.g. sets of genres or tags). For each of the two themes, we describe the problem settings, the state of the art, our system design and our experiment design. We evaluate each system using both offline analyses and user trials in a movie recommendation domain. We find that the proposed systems provide relevant recommendations that also have a high degree of serendipity, low popularity-bias and high diversity.
Item Classification of socially generated medical data(University College Cork, 2019-09) Alnashwan, Rana; Sorensen, Humphrey; O'Riordan, Adrian; Princess Nourah Bint Abdulrahman University
The growth of online health communities, particularly those involving socially generated content, can provide considerable value for society. Participants can gain knowledge of medical information or interact with peers on medical forum platforms. However, the sheer volume of information so generated – and the consequent ‘noise’ associated with large data volumes – can create difficulties for information consumers. We propose a solution to this problem by applying high-level analytics to the data – primarily sentiment analysis, but also content and topic analysis – for accurate classification. We believe that such analysis can be of significant value to data users, such as identifying a particular aspect of an information space, determining themes that predominate among a large dataset, and allowing people to summarize topics within a big dataset. In this thesis, we apply machine learning strategies to identify sentiments expressed in online medical forums that discuss Lyme Disease. As part of this process, we distinguish a complete and relevant set of categories that can be used to characterize Lyme Disease discourse. We present a feature-based model that employs supervised learning algorithms and assess the feasibility and accuracy of this sentiment classification model. We further evaluate our model by assessing its ability to adapt to an online medical forum discussing a disease with similar characteristics, Lupus. The experimental results demonstrate the effectiveness of our approach.
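A minimal sketch of a feature-based supervised sentiment classifier of the general kind discussed above, using a standard scikit-learn pipeline. The example posts and labels are invented, and the thesis's actual model uses a richer, purpose-built feature set (including meta-level features) rather than plain bag-of-words features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy labelled forum posts (invented examples).
posts = [
    "The new antibiotic course finally helped with my joint pain",
    "Still exhausted after months of treatment, nothing seems to work",
    "My doctor dismissed my symptoms again, so frustrating",
    "Grateful for this community, your advice made a real difference",
]
labels = ["positive", "negative", "negative", "positive"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(posts, labels)

print(model.predict(["the treatment helped and I feel much better"]))
```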
In many sentiment analysis applications, the labelled training datasets are expensive to obtain, whereas unlabelled datasets are readily available. Therefore, we present an adaptation of a well-known semi-supervised learning technique, in which co-training is implemented by combining labelled and unlabelled data. Our results suggest the ability to learn even with limited labelled data. In addition, we investigate complementary analytic techniques – content and topic analysis – to make the best use of the data for various consumer groups. Within the work described in this thesis, some particular research issues are addressed, specifically when applied to socially generated medical/health datasets:
• When applying binary sentiment analysis to short-form text data (e.g. Twitter), could meta-level features improve the performance of classification?
• When applying more complex multi-class sentiment analysis to the classification of long-form, content-rich text data, would meta-level features be a useful addition to more conventional features?
• Can this multi-class analysis approach be generalised to other medical/health domains?
• How would alternative classification strategies benefit different groups of information consumers?
Item Cloud-based machine learning architecture for big data analysis(University College Cork, 2019) Pakdel, Rezvan; Herbert, John; Irish Research Council for Science, Engineering and Technology
The use of machine learning that leverages large amounts of data (big data) is increasingly important in many areas of business and research. To help cope with the demanding resources required by these applications, solutions including hardware platforms (e.g. graphics cards), more efficient algorithms (e.g. deep learning algorithms), and special software environments (e.g. TensorFlow) have been developed. In addition, for specific applications, special optimisations are often developed based on the requirements of the particular application. This thesis also addresses the challenge of efficiency of machine learning over big data but does so in a way that is complementary to specialised hardware and algorithms, and in a way that is also independent of application and data type. The thesis has developed several types of general optimisations and implemented these on top of an underlying generic machine learning architecture. The generic machine learning architecture includes stages for segmentation, feature extraction, model building and classification. The optimisation components enhance this architecture in a general way that works with any datatype and any dataset, where the optimisation responds to the needs of the particular application and is self-adjusting for the particular dataset being processed. The optimisations developed are: model optimisation; feature optimisation; resources optimisation; and cloud platform cost-benefit optimisation. Model optimisation involves evaluating multiple models in parallel, and using feedback on model performance to choose the best ones based on the dataset being processed. Feature optimisation involves evaluating various features and combinations of features, and then choosing those features that are most effective for classification. Resources optimisation involves dynamically adjusting compute instances to respond to the demands of an application.
Cloud platform cost-benefit optimisation involves evaluating the cost of available public cloud compute instances, and determining appropriate cost-efficient instances depending on the needs of an application. General techniques of sampling, evaluation and feedback are used in several optimisation components. The underlying framework and optimisations have been implemented and deployed in a private cloud environment. Evaluation on various datasets (image and text datasets) has shown these optimisation components to be effective, and to provide useful generic components that can work in conjunction with other optimisations to address the challenging demands of machine learning over big data.
Item Cognitive radio adaptive rendezvous protocols to establish network services for a disaster response(University College Cork, 2018) Ghafoor, Saim; Brown, Kenneth N.; Sreenan, Cormac J.; Science Foundation Ireland
Disasters are catastrophic events that cause great damage or loss of life. In disasters, communication services might be disrupted due to damage to the existing network infrastructure. Temporary systems are required for victims and first responders, but installing them requires information about the radio environment and available spectrum. A cognitive radio (CR) can be used to provide a flexible and rapidly deployable temporary system due to its sensing, learning and decision-making capabilities. This thesis initially examines the potential of CR technology for disaster response networks (DRNs) and shows that it is ideally suited to fulfill the requirements of a DRN. A software-defined-radio-based prototype of a cellular network with multiple base transceiver stations is proposed and developed. It is demonstrated that the system can support a large number of simultaneous calls with sufficient call quality, but only when the background interference is low. It is concluded that, to provide call quality with acceptable latency and packet losses, the spectrum should be used dynamically for backhaul connectivity. The deployment challenges for such a system in a disaster include the discovery of the available spectrum, existing networks, and neighbours. Furthermore, to set up a network and to establish network services, CR nodes are initially required to establish a rendezvous. However, this can be challenging due to unknown spectrum information, primary radio (PR) activity, nodes, and topology. The existing rendezvous strategies do not fulfill the DRN requirements and their time to rendezvous (TTR) is long. Therefore, we propose an extended modular clock algorithm (EMCA), which is a multiuser blind rendezvous protocol that considers the DRN requirements and has a short TTR. For unknown nodes and topologies, a general framework for a self-organizing multihop cooperative fully blind rendezvous protocol is proposed, which works in different phases, can terminate when sufficient nodes are discovered, and is capable of disseminating the information of nodes which enter or leave a network. A synchronization mechanism is presented for periodic update of rendezvous information. An information exchange mechanism is also proposed which expedites the rendezvous process. In both single and multihop networks, EMCA provides up to 80% improvement in terms of TTR over the existing blind rendezvous strategies while considering the PR activity. A simple Random strategy, while being poorer than EMCA, is also shown to outperform existing strategies on average.
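To illustrate what time to rendezvous measures, the toy simulation below has two radios hop independently over the channels not occupied by primary radios until they land on the same channel in the same slot. The uniform random-hopping strategy and the parameters are assumptions made for illustration; EMCA itself is not reproduced here.

```python
import random

def time_to_rendezvous(n_channels=10, pr_busy=(3, 7), max_slots=10_000, seed=1):
    """Slots until two radios, hopping independently and uniformly over the
    channels they believe are free of primary-radio activity, pick the same
    channel in the same time slot."""
    rng_a, rng_b = random.Random(seed), random.Random(seed + 1)
    free = [c for c in range(n_channels) if c not in pr_busy]
    for slot in range(1, max_slots + 1):
        if rng_a.choice(free) == rng_b.choice(free):
            return slot
    return max_slots  # no rendezvous within the horizon

trials = [time_to_rendezvous(seed=s) for s in range(1000)]
print("mean TTR over 1000 trials:", sum(trials) / len(trials))
```

A smarter strategy (or an operating policy that steers both radios towards the same subset of free channels) lowers this average, which is the quantity the comparisons above report.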
To achieve adaptability in the presence of unknown PR activity, different CR operating policies are proposed which avoid the channels detected with PR activity to reduce the harmful interference, provide free channels to reduce the TTR, and can work with any rendezvous strategy. These policies are evaluated over different PR activities and shown to reduce the TTR and harmful interference significantly over the basic Listen before Talk approach. A proactive policy, which prefers to return to channels with recent lower PR activity, is shown to be best, and to improve the performance of all studied rendezvous strategies.
Item Combinatorial optimisation for sustainable cloud computing(University College Cork, 2018) De Cauwer, Milan; O'Sullivan, Barry; Mehta, Deepak; Science Foundation Ireland
Enabled by both software and hardware advances, cloud computing has emerged as an efficient way to leverage economies of scale for building large computational infrastructures over a global network. While the cost of computation has dropped significantly for end users, the infrastructure supporting cloud computing systems has considerable economic and ecological costs. A key challenge for sustainable cloud computing systems in the near future is to maintain control over these costs. Amid the complexity of cloud computing systems, a cost analysis reveals a complex relationship between the infrastructure supporting actual computation on a physical level and how these physical assets are utilised. The central question tackled in this dissertation is how to best utilise these assets through efficient workload management policies. In recent years, workload consolidation has emerged as an effective approach to increase the efficiency of cloud systems. We propose to address aspects of this challenge by leveraging techniques from the realm of mathematical modeling and combinatorial optimisation. We introduce a novel combinatorial optimisation problem suitable for modeling core consolidation problems arising in workload management in data centres. This problem extends the well-known bin packing problem. We develop competing models and optimisation techniques to solve this offline packing problem with state-of-the-art solvers. We then cast this newly defined combinatorial optimisation problem in a semi-online setting, for which we propose an efficient assignment policy that is able to produce solutions for the semi-online problem in a competitive computational time. Stochastic aspects, which are often faced by cloud providers, are introduced in a richer model. We then show how predictive methods can help decision makers dealing with uncertainty in such dynamic and heterogeneous systems. We explore a similar but relaxed problem falling within the scope of proactive consolidation. This is a relaxed consolidation problem in which one decides which, when and where workload should be migrated so as to minimise energy cost. Finally, we discuss ongoing efforts to model and characterise the combinatorial hardness of bin packing instances, which in turn will be useful to study the various packing problems found in cloud computing environments.
Item Combining and choosing case base maintenance algorithms(University College Cork, 2013) Cummins, Lisa; Bridge, Derek G.; Science Foundation Ireland
Case-Based Reasoning (CBR) uses past experiences to solve new problems. The quality of the past experiences, which are stored as cases in a case base, is a big factor in the performance of a CBR system.
The system's competence may be improved by adding problems to the case base after they have been solved and their solutions verified to be correct. However, from time to time, the case base may have to be refined to reduce redundancy and to get rid of any noisy cases that may have been introduced. Many case base maintenance algorithms have been developed to delete noisy and redundant cases. However, different algorithms work well in different situations and it may be difficult for a knowledge engineer to know which one is the best to use for a particular case base. In this thesis, we investigate ways to combine algorithms to produce better deletion decisions than the decisions made by individual algorithms, and ways to choose which algorithm is best for a given case base at a given time. We analyse five of the most commonly-used maintenance algorithms in detail and show how the different algorithms perform better on different datasets. This motivates us to develop a new approach: maintenance by a committee of experts (MACE). MACE allows us to combine maintenance algorithms to produce a composite algorithm which exploits the merits of each of the algorithms that it contains. By combining different algorithms in different ways we can also define algorithms that have different trade-offs between accuracy and deletion. While MACE allows us to define an infinite number of new composite algorithms, we still face the problem of choosing which algorithm to use. To make this choice, we need to be able to identify properties of a case base that are predictive of which maintenance algorithm is best. We examine a number of measures of dataset complexity for this purpose. These provide a numerical way to describe a case base at a given time. We use the numerical description to develop a meta-case-based classification system. This system uses previous experience about which maintenance algorithm was best to use for other case bases to predict which algorithm to use for a new case base. Finally, we give the knowledge engineer more control over the deletion process by creating incremental versions of the maintenance algorithms. These incremental algorithms suggest one case at a time for deletion rather than a group of cases, which allows the knowledge engineer to decide whether or not each case in turn should be deleted or kept. We also develop incremental versions of the complexity measures, allowing us to create an incremental version of our meta-case-based classification system. Since the case base changes after each deletion, the best algorithm to use may also change. The incremental system allows us to choose which algorithm is the best to use at each point in the deletion process.Item Computing explanations for interactive constraint-based systems(University College Cork, 2011-12) Papadopoulos, Alexandre; O'Sullivan, Barry; Science Foundation IrelandConstraint programming has emerged as a successful paradigm for modelling combinatorial problems arising from practical situations. In many of those situations, we are not provided with an immutable set of constraints. Instead, a user will modify his requirements, in an interactive fashion, until he is satisfied with a solution. Examples of such applications include, amongst others, model-based diagnosis, expert systems, product configurators. The system he interacts with must be able to assist him by showing the consequences of his requirements. Explanations are the ideal tool for providing this assistance. 
However, existing notions of explanations fail to provide sufficient information. We define new forms of explanations that aim to be more informative. Even if explanation generation is a very hard task, in the applications we consider we must manage to provide a satisfactory level of interactivity and, therefore, we cannot afford long computational times. We introduce the concept of representative sets of relaxations, a compact set of relaxations that shows the user at least one way to satisfy each of his requirements and at least one way to relax them, and present an algorithm that efficiently computes such sets. We introduce the concept of most soluble relaxations, maximising the number of products they allow. We present algorithms to compute such relaxations in times compatible with interactivity, achieving this by indifferently making use of different types of compiled representations. We propose to generalise the concept of prime implicates to constraint problems with the concept of domain consequences, and suggest generating them as a compilation strategy. This sets a new approach in compilation, and allows explanation-related queries to be addressed in an efficient way. We define ordered automata to compactly represent large sets of domain consequences, in a way that is orthogonal to existing compilation techniques that represent large sets of solutions.
Item Computing policy parameters for stochastic inventory control using stochastic dynamic programming approaches(University College Cork, 2020-08-29) Visentin, Andrea; Prestwich, Steven David; Brown, Kenneth N.; Science Foundation Ireland
The objective of this work is to introduce techniques for the computation of optimal and near-optimal inventory control policy parameters for the stochastic inventory control problem under Scarf’s setting. A common aspect of the solutions presented herein is the use of stochastic dynamic programming, a mathematical programming technique introduced by Bellman. Stochastic dynamic programming is hybridised with branch-and-bound, binary search, constraint programming and other computational techniques to develop innovative and competitive solutions. In this work, the classic single-item, single-location inventory control problem with penalty cost under independent stochastic demand is extended to model a fixed review cost. This cost is charged when the inventory level is assessed at the beginning of a period. This operation is costly in practice, and modelling it can lead to significant savings. This makes it possible to model an order cancellation penalty charge. The first contribution presented here is the first stochastic dynamic programming formulation that captures Bookbinder and Tan’s static-dynamic uncertainty control policy with penalty cost. Numerous techniques are available in the literature to compute such parameters; however, they all make assumptions on the demand probability distribution. This technique has many similarities to Scarf’s stochastic dynamic programming formulation, and it does not require any external solver to be deployed. Memoisation and binary search techniques are deployed to improve computational performance. Extensive computational studies show that this new model has a tighter optimality gap compared to the state of the art. The second contribution is to introduce the first procedure to compute cost-optimal parameters for the well-known (R, s, S) policy.
Practitioners widely use such a policy; however, the determination of its parameters is considered computationally prohibitive. A technique that hybridises stochastic dynamic programming and branch-and-bound is presented, alongside computational enhancements. Computing the optimal policy allows the determination of optimality gaps for future heuristics. This approach can solve instances of considerable size, making it usable by practitioners. The computational study shows the cost reduction that such a system can provide. Thirdly, this work presents the first heuristics for determining near-optimal parameters for the (R, s, S) policy. The first is an algorithm that formally models the (R, s, S) policy computation in the form of a functional equation. The second is a heuristic formed by a hybridisation of (R, S) and (s, S) policy parameter solvers. These heuristics can compute near-optimal parameters in a fraction of the time required by the exact methods. They can be used to speed up the optimal branch-and-bound technique. The last contribution is the introduction of a technique to encode dynamic programming in constraint programming. Constraint programming provides the user with an expressive modelling language and delegates the search for the solution to a specific solver. The possibility of seamlessly encoding dynamic programming provides new modelling options, e.g. the computation of optimal (R, s, S) policy parameters. The performance in this specific application is not competitive with the other techniques proposed herein; however, this encoding opens up new connections between constraint programming and dynamic programming. The encoding allows DP-based constraints to be deployed in modelling languages such as MiniZinc. The computational study shows how this technique can outperform a similar encoding for mixed-integer programming.
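As a toy illustration of the stochastic dynamic programming machinery underlying this work, the sketch below computes the minimum expected cost of a small finite-horizon, single-item inventory problem by backward recursion over inventory levels and candidate order quantities. The horizon, costs, demand distribution and lost-sales assumption are invented, and the recursion enumerates order quantities directly rather than computing (R, s, S) policy parameters.

```python
from functools import lru_cache

# Toy problem data (invented): 4 periods, discrete demand, linear costs.
T = 4
DEMAND = {0: 0.2, 1: 0.5, 2: 0.3}     # demand value -> probability
K, h, p = 30.0, 1.0, 10.0             # fixed ordering, holding, shortage penalty
MAX_INV, MAX_ORDER = 8, 8

@lru_cache(maxsize=None)
def cost_to_go(t, inventory):
    """Minimum expected cost from period t onwards, starting with `inventory`.
    Unmet demand is penalised and lost (inventory never goes negative)."""
    if t == T:
        return 0.0
    best = float("inf")
    for q in range(MAX_ORDER + 1):
        if inventory + q > MAX_INV:
            break
        expected = K if q > 0 else 0.0
        for d, prob in DEMAND.items():
            closing = inventory + q - d
            stage = h * closing if closing >= 0 else -p * closing
            expected += prob * (stage + cost_to_go(t + 1, max(closing, 0)))
        best = min(best, expected)
    return best

print("expected optimal cost from empty stock:", round(cost_to_go(0, 0), 2))
```

The exact and heuristic methods described above wrap recursions of this general shape in branch-and-bound, binary search or constraint models so that policy parameters, rather than period-by-period decisions, are returned.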