Insight Centre for Data Analytics - Doctoral Theses



Recent Submissions

Now showing 1 - 5 of 24
  • Item
    Intelligibility of music playlists
    (University College Cork, 2023) Gabbolini, Giovanni; Bridge, Derek G.; Science Foundation Ireland
    A common strategy for organising music is to arrange songs in a playlist to obtain a continuous and thematic music flow. Playlists are popular in music streaming services, where 58% of listeners construct their own playlists. The flip side of this popularity is content overload: streaming services currently host billions of playlists. The commercial value of playlists has attracted notable research effort over the last two decades, much of it concerned with automatically constructing playlists. This dissertation is also on playlists, but on a topic complementary to their construction. Our concern here is with describing playlists, so that they can be understood by a human audience, i.e. so that they become intelligible. We achieve intelligibility by developing algorithms that generate textual annotations, both at playlist level and at song level. At playlist level, an annotation can be text (e.g. a tag or a caption) that describes the playlist as a whole; at song level, an annotation can be text that describes the transition between two consecutive songs in the playlist. The purpose of intelligibility is to facilitate music organisation and access, and to enhance the listening experience of users, two goals that are particularly relevant in a content-overload scenario. We propose five algorithms for playlist-level intelligibility and three algorithms for song-level intelligibility. Because we are particularly interested in the user experience, in most cases we test the algorithms with both offline experiments and user trials. We find evidence that the algorithms can help accomplish the two goals of intelligibility, i.e. enhancing listening experiences and facilitating organisation and access. We pair the algorithms with a comprehensive survey of MIR research on music playlists, which provides a useful framework for understanding our contributions in the context of a broad selection of related research.
  • Item
    Metaheuristics and machine learning for joint stratification and sample allocation in survey design
    (University College Cork, 2022-01) O'Luing, Mervyn; Prestwich, Steve; Tarim, Armagan; European Regional Development Fund; Science Foundation Ireland
    In this thesis, we propose a number of metaheuristics and machine learning techniques to solve the joint stratification and sample allocation problem. Finding the optimal solution to this problem is hard when the sampling frame is large and the evaluation algorithm is computationally burdensome. To advance research in this area, we explore and evaluate different algorithmic methods of modelling and solving this problem. First, we propose a new genetic algorithm approach that uses "grouping" genetic operators instead of traditional operators. Experiments show a significant improvement in solution quality for similar computational effort. Next, we combine the ability of a simulated annealing algorithm to escape from local minima with delta evaluation, which exploits the similarity between consecutive solutions to reduce evaluation time. Comparisons with two recent algorithms show the simulated annealing algorithm attaining comparable solution quality in less computation time. Then, we consider combining the k-means and clustering algorithms with a hill climbing algorithm in stages, and report the solution costs, evaluation times and training times. The multi-stage combinations generally compare well with recent algorithms and give the survey designer a wider choice of algorithms. Finally, we combine the explorative properties of an estimation of distribution algorithm (EDA), which models the probabilities of an atomic stratum belonging to different strata, with the exploitative search properties of a simulated annealing algorithm to create a hybrid estimation of distribution algorithm (HEDA). Comparisons with the best solutions from our earlier experiments show that the HEDA finds better-quality solutions, but requires a longer total execution time than the alternative approaches we considered.
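The combination of simulated annealing with delta evaluation can be illustrated with a minimal Python sketch. It is not the thesis's evaluation algorithm: it assumes a hypothetical stand-in objective, the Neyman-style sum of N_h * S_h over strata (stratum size times stratum standard deviation), and treats each atomic stratum as a single numeric value. Because moving one unit changes only its source and destination strata, each move recomputes just those two terms from running counts, sums and sums of squares, instead of re-evaluating the whole solution.

```python
import math
import random


def stratum_cost(n, s, ss):
    """Hypothetical stratum cost N_h * S_h from count n, sum s and sum of squares ss."""
    if n < 2:
        return 0.0
    var = (ss - s * s / n) / (n - 1)
    return n * math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


def full_cost(values, assign, k):
    """Evaluate a complete solution from scratch (the expensive path)."""
    stats = [[0, 0.0, 0.0] for _ in range(k)]
    for v, h in zip(values, assign):
        stats[h][0] += 1
        stats[h][1] += v
        stats[h][2] += v * v
    return sum(stratum_cost(*st) for st in stats), stats


def anneal(values, k, iters=20000, t0=1.0, alpha=0.9995, seed=0):
    """Simulated annealing over unit-to-stratum assignments, using delta evaluation."""
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in values]
    total, stats = full_cost(values, assign, k)
    costs = [stratum_cost(*st) for st in stats]
    best_total, best_assign = total, assign[:]
    t = t0
    for _ in range(iters):
        i = rng.randrange(len(values))
        src, dst = assign[i], rng.randrange(k)
        if src != dst:
            v = values[i]
            # Delta evaluation: moving one unit only changes strata src and dst.
            src_stats = (stats[src][0] - 1, stats[src][1] - v, stats[src][2] - v * v)
            dst_stats = (stats[dst][0] + 1, stats[dst][1] + v, stats[dst][2] + v * v)
            new_src, new_dst = stratum_cost(*src_stats), stratum_cost(*dst_stats)
            delta = new_src + new_dst - costs[src] - costs[dst]
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                assign[i] = dst
                stats[src], stats[dst] = list(src_stats), list(dst_stats)
                costs[src], costs[dst] = new_src, new_dst
                total += delta
                if total < best_total:
                    best_total, best_assign = total, assign[:]
        t *= alpha  # geometric cooling schedule
    return best_total, best_assign


if __name__ == "__main__":
    rng = random.Random(1)
    units = [rng.gauss(0.0, 1.0) for _ in range(60)]  # hypothetical atomic-stratum values
    best, assignment = anneal(units, k=4)
    check, _ = full_cost(units, assignment, 4)
    print(f"delta-tracked cost {best:.4f}, recomputed cost {check:.4f}")
```

The two printed costs should agree up to rounding, which is a quick check that the delta bookkeeping is consistent. The shortcut relies on the objective decomposing over strata; an evaluation that couples all strata would need a correspondingly broader update.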
  • Item
    Autonomous system control in unknown operating conditions
    (University College Cork, 2021-08-24) Sohège, Yves; Provan, Gregory; Tabirca, Marius-Sabin; Science Foundation Ireland
    Autonomous systems have become an interconnected part of everyday life with the recent increases in computational power available for both onboard computers and offline data processing. The race among car manufacturers for level 5 (full) autonomy in self-driving cars is well underway, and new flying-taxi startups are emerging every week, attracting billions in investment. Two main research communities, Optimal Control and Reinforcement Learning, stand out in the field of autonomous systems, each with a vastly different perspective on the control problem. Controllers from the optimal control community are based on models and can be rigorously analyzed to ensure that the stability of the system is maintained under certain operating conditions. Learning-based control strategies are often referred to as model-free and typically involve training a neural network to generate the required control actions through direct interaction with the system, which greatly reduces the design effort required to control complex systems. One common problem that both learning- and model-based control solutions face is their dependency on a priori knowledge about the system and its operating conditions, such as possible internal component failures and external environmental disturbances. It is not possible to consider at design time every operating scenario an autonomous system can encounter in the real world. Models and simulators are approximations of reality and can only be created for known operating conditions. Autonomous system control in unknown operating conditions, where no a priori knowledge exists, is still an open problem for both communities, and no control methods currently exist for such situations.
Multiple model adaptive control is a modular control framework that divides the control problem into supervisory and low-level control, which allows existing learning- and model-based control methods to be combined to overcome the disadvantages of using only one of them. The contributions of this thesis consist of five novel supervisory control architectures, which have been empirically shown to improve a system’s robustness to unknown operating conditions, and a novel low-level controller tuning algorithm that can reduce the number of required controllers compared to traditional tuning approaches. The presented methods apply to any autonomous system that can be controlled using model-based controllers, and can be integrated alongside existing fault-tolerant control systems. For autonomous system designers, they provide new control mechanisms for improving a system’s robustness to unknown operating conditions.
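The split between supervisory and low-level control can be illustrated with a generic estimator-based switching scheme, sketched below in Python. This is a textbook-style example, not one of the thesis's five architectures: the scalar plant x[t+1] = a*x[t] + b*u[t], the bank of candidate models, the exponentially forgotten squared prediction error used as each model's monitoring signal, the hysteresis margin and the per-model deadbeat controllers are all illustrative assumptions. A simulated actuator fault reduces the true gain b from 1.0 to 0.3 partway through the run, and the supervisor is expected to hand the loop to the controller whose model best explains the new behaviour.

```python
class Supervisor:
    """Estimator-based supervisor for a bank of candidate plant models (illustrative)."""

    def __init__(self, models, forgetting=0.9, hysteresis=0.1):
        self.models = models  # hypothetical candidate (a, b) pairs
        self.forgetting = forgetting
        self.hysteresis = hysteresis
        self.signals = [0.0] * len(models)  # monitoring signal per model
        self.active = 0  # index of the model/controller currently in the loop

    def update(self, x_prev, u_prev, x_now):
        """Score every model on the latest transition, then possibly switch."""
        for i, (a, b) in enumerate(self.models):
            err = x_now - (a * x_prev + b * u_prev)
            self.signals[i] = self.forgetting * self.signals[i] + err * err
        best = min(range(len(self.models)), key=self.signals.__getitem__)
        # Hysteresis: switch only if the best model beats the active one by a margin.
        if self.signals[best] * (1.0 + self.hysteresis) < self.signals[self.active]:
            self.active = best
        return self.active

    def control(self, x, ref):
        """Low-level deadbeat controller designed for the active model."""
        a, b = self.models[self.active]
        return (ref - a * x) / b


def simulate(steps=80, fault_at=30):
    sup = Supervisor(models=[(0.9, 1.0), (0.9, 0.6), (0.9, 0.3)])
    x, history = 0.0, []
    for t in range(steps):
        b_true = 1.0 if t < fault_at else 0.3  # simulated actuator degradation
        ref = 1.0 if (t // 5) % 2 == 0 else -1.0  # square-wave reference keeps the loop excited
        u = sup.control(x, ref)
        x_next = 0.9 * x + b_true * u
        history.append(sup.update(x, u, x_next))
        x = x_next
    return history


if __name__ == "__main__":
    print(simulate())  # index of the active controller at each step
```

The hysteresis margin keeps the supervisor from chattering between models whose monitoring signals are nearly equal, at the cost of a slightly slower reaction after a fault.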
  • Item
    Improving human movement sensing with micro models and domain knowledge
    (2021-06-18) Scheurer, Sebastian; Brown, Kenneth; O'Sullivan, Barry; Science Foundation Ireland; European Regional Development Fund; Enterprise Ireland
    Human sensing is concerned with techniques for inferring information about humans from various sensing modalities. Examples of human sensing applications include human activity (or action) recognition, emotion recognition, tracking and localisation, identification, presence and motion detection, occupancy estimation, gesture recognition, and breath rate estimation. The first question addressed in this thesis is whether micro or macro models are a better design choice for human sensing systems. Micro models are models trained exclusively with data from a single entity, such as a Wi-Fi link, user, or other identifiable data-generating component. We consider micro and macro models in two human sensing applications, viz. Human Activity Recognition (HAR) from wearable inertial sensor data and device-free human presence detection from Wi-Fi signal data. The HAR literature is dominated by person-independent macro models. The few empirical studies that consider both micro and macro models evaluate them with either only one data-set or only one HAR algorithm, and report contradictory results. The device-free sensing literature is dominated by link-specific micro models, and the few papers that do use macro models do not evaluate their micro counterparts. Given this scarce and contradictory evidence, it remains an open question whether micro or macro models are the better design choice. We evaluate person-specific micro and person-independent macro models across seven HAR benchmark data-sets and four learning algorithms. We show that person-specific models (PSMs) significantly outperform the corresponding person-independent model (PIM) when evaluated with known users. To apply PSMs to data from new users, we propose ensembles of PSMs, which are improved by weighting their constituent PSMs according to their performance on other training users. We propose link-specific micro models to detect human presence from ambient Wi-Fi signal data.
We select a link-specific model from the available training links and show that this approach outperforms multi-link macro models. The second question addressed in this thesis is whether human sensing methods can be improved with domain knowledge. Specifically, we propose expert hierarchies (EHs) as an intuitive way to encode domain knowledge and simplify multi-class HAR without negatively affecting predictive performance. The advantages of EHs are that they have lower time complexity than domain-agnostic methods and that their constituent classifiers are statistically independent. This property enables targeted tuning, as well as modular and iterative development of increasingly fine-grained HAR. Although this has inspired several uses of domain-specific hierarchical classification in HAR applications, these have been ad hoc and have not been compared with standard domain-agnostic methods. It therefore remains unclear whether they incur a penalty in predictive performance. We design five EHs and compare them to the best-known domain-agnostic methods. Our results show that EHs can indeed compete with more popular multi-class classification methods, both on the original multi-class problem and on the EHs' topmost levels.
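The weighted ensemble of person-specific models can be sketched in plain Python. The nearest-centroid classifier, the toy two-feature data and the use of accuracy as each model's weight are assumptions chosen to keep the example self-contained; the thesis's own learning algorithms and weighting details may differ. Each PSM is trained on one user, scored on the pooled data of the other training users, and a sample from a new user is labelled by a vote weighted by those scores.

```python
from collections import defaultdict


class NearestCentroid:
    """Tiny stand-in classifier: predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x):
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )


def accuracy(model, X, y):
    return sum(model.predict_one(x) == t for x, t in zip(X, y)) / len(y)


def build_psm_ensemble(data):
    """data maps user -> (X, y); needs at least two users. Returns (weight, model) pairs."""
    ensemble = []
    for user, (X, y) in data.items():
        model = NearestCentroid().fit(X, y)  # person-specific model
        # Weight = accuracy on all *other* training users' data.
        others = [(x, t) for u, (Xo, yo) in data.items() if u != user for x, t in zip(Xo, yo)]
        X_other, y_other = zip(*others)
        ensemble.append((accuracy(model, X_other, y_other), model))
    return ensemble


def predict(ensemble, x):
    votes = defaultdict(float)
    for weight, model in ensemble:
        votes[model.predict_one(x)] += weight
    return max(votes, key=votes.get)


# Hypothetical toy data: user "c" behaves atypically relative to "a" and "b".
TOY_DATA = {
    "a": ([[0, 0], [0, 1], [5, 5], [5, 6]], ["sit", "sit", "walk", "walk"]),
    "b": ([[1, 0], [0, 0], [6, 5], [5, 5]], ["sit", "sit", "walk", "walk"]),
    "c": ([[0, 0], [1, 1], [5, 5], [6, 6]], ["walk", "walk", "sit", "sit"]),
}

if __name__ == "__main__":
    ens = build_psm_ensemble(TOY_DATA)
    print([w for w, _ in ens], predict(ens, [0.2, 0.3]))
```

In this toy run the atypical user's model receives zero weight because it disagrees with every other training user, so it cannot outvote the two consistent models.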
  • Item
    Real-time algorithm configuration
    (University College Cork, 2021-04) Fitzgerald, Tadhg; O'Sullivan, Barry; Brown, Kenneth; Science Foundation Ireland; European Regional Development Fund
    This dissertation presents a number of contributions to the field of algorithm configuration. In particular, we present an extension to the algorithm configuration problem, real-time algorithm configuration, where configuration occurs online on a stream of instances, without the need for prior training, and problem solutions are returned in the shortest time possible. We propose ReACT, a framework for solving the real-time algorithm configuration problem. With ReACT we demonstrate that, by using the parallel computing architectures commonplace in many systems today together with a robust aggregate ranking system, configuration can take place without any impact on performance from the user's perspective. This is achieved by means of a racing procedure. We present two concrete instantiations of the framework and, through empirical evaluations on a range of combinatorial problems from the literature, show that they match or even exceed the state of the art in offline algorithm configuration. We discuss, assess, and provide justification for each of the components used in our framework instantiations. Specifically, we show that the TrueSkill ranking system, commonly used to rank players’ skill in multiplayer games, can accurately estimate the quality of an algorithm configuration using only censored results from races between algorithm configurations. We confirm that the order in which problem instances arrive influences configuration performance, and that the optimal selection of configurations to participate in races depends on the distribution of the incoming instance stream. We outline how to maintain a pool of quality configurations by removing underperforming configurations, and techniques to generate replacement configurations with minimal computational overhead.
Finally, we show that the configuration space can be reduced using feature selection techniques from the machine learning literature, and that doing so can provide a boost in configuration performance.
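The racing idea behind real-time configuration can be sketched in Python as follows. This is not the ReACT implementation: an Elo rating update stands in for TrueSkill, and the one-dimensional parameter space, the entrant-selection rule (top-rated configurations plus one random challenger), the replacement rule and the simulated runtime function are all hypothetical. What the sketch does show is the loop structure described above: entrants race on each incoming instance, the user receives the fastest result, losers contribute only censored information (they were slower than the winner), and the pool is refreshed by replacing the worst-rated configuration.

```python
import random


def elo_update(ratings, winner, losers, k=16.0):
    """Stand-in for TrueSkill: count the race as the winner beating each loser (Elo rule)."""
    for loser in losers:
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        ratings[winner] += k * (1.0 - expected)
        ratings[loser] -= k * (1.0 - expected)


def react_sketch(instances, runtime, pool_size=8, cores=4, replace_every=10, seed=0):
    """Configure online while solving a stream of instances (illustrative racing loop)."""
    rng = random.Random(seed)
    params = {i: rng.random() for i in range(pool_size)}  # hypothetical 1-D parameter in [0, 1]
    ratings = {i: 1500.0 for i in params}
    next_id = pool_size
    solve_times = []
    for n, instance in enumerate(instances, start=1):
        ranked = sorted(params, key=ratings.get, reverse=True)
        # Top-rated configurations plus one random challenger, one per core.
        entrants = ranked[: cores - 1] + [rng.choice(ranked[cores - 1 :])]
        # Entrants run in parallel; the user only waits for the fastest one.
        times = {c: runtime(params[c], instance) for c in entrants}
        winner = min(times, key=times.get)
        solve_times.append(times[winner])
        # Censored result: we only know the losers were slower than the winner.
        elo_update(ratings, winner, [c for c in entrants if c != winner])
        if n % replace_every == 0:
            worst = min(params, key=ratings.get)
            best = max(params, key=ratings.get)
            # Cheap replacement: a small random perturbation of the current best.
            child = min(1.0, max(0.0, params[best] + rng.gauss(0.0, 0.1)))
            del params[worst], ratings[worst]
            params[next_id] = child
            ratings[next_id] = sum(ratings.values()) / len(ratings)  # start at the pool mean
            next_id += 1
    best = max(params, key=ratings.get)
    return params[best], solve_times


if __name__ == "__main__":
    stream = [1.0 + (i % 5) for i in range(200)]  # hypothetical instance difficulties
    # Hypothetical runtime model: configurations near p = 0.7 solve fastest.
    best_p, times = react_sketch(stream, lambda p, d: d * (1.0 + abs(p - 0.7)))
    print(f"best parameter {best_p:.3f}, mean solve time {sum(times) / len(times):.3f}")
```

Because every instance is solved by the fastest entrant, the user-facing time is never worse than that of the best configuration in the race, which is the sense in which configuration can proceed without slowing the user down.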