Techniques for improving the efficiency and security of Data Confidence Fabrics

Date
2025
Authors
Khalid, Asfa
Publisher
University College Cork
Abstract
This thesis presents novel techniques to improve the efficiency, scalability, and security of Data Confidence Fabrics (DCFs), a framework that ensures data authenticity and integrity in large-scale, heterogeneous distributed systems by generating metadata at each point of data formation, processing, and transmission. Despite their strengths, DCFs face significant challenges: excessive annotation and transactional overhead, which reduces scalability and efficiency, and metadata privacy risks, which expose sensitive network information. To address these challenges, this research proposes methods that improve system scalability, enable efficient annotation retrieval, and protect sensitive network information. The focus is on the Alvarium Data Confidence Fabric, though the solutions are broadly applicable to other DCFs.
A primary contribution of this work addresses the efficiency and scalability challenges by reducing annotation overhead through compact annotation techniques, particularly annotation batching. By aggregating multiple annotations into a single ledger transaction, this approach minimizes redundancy, storage costs, and ledger interactions. However, batching makes it harder to retrieve individual annotations. To overcome this, two retrieval methods are proposed: Batch Keys, which use mapping tables to quickly locate individual annotations based on a batch key, and Bloom Filters, which provide a low-overhead way to verify the presence of annotations.
Another major focus of this work is mitigating metadata privacy risks, where adversaries could analyze annotations to infer network structures. To obscure network patterns, two privacy-preserving schemes, Hostname Mapping and Hostname Encryption, are introduced, with Hostname Encryption offering the more efficient and secure alternative. Additionally, the research shows how timestamp metadata can be exploited to reconstruct network structures through clustering techniques. To mitigate this vulnerability, a timestamp obfuscation solution is proposed that introduces controlled randomness to disrupt predictable timing patterns and protect network confidentiality.
In summary, the thesis introduces and evaluates methods that significantly enhance the efficiency, scalability, and security of DCFs. These contributions strengthen the practical deployment of DCFs in cloud-edge environments and provide a foundation for future research in secure and trustworthy data management across distributed systems.
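As an illustration of the batching and Bloom Filter retrieval ideas summarised in the abstract, the following is a minimal sketch and not the thesis implementation. The annotation fields (`id`, `host`, `kind`), the filter parameters, and the way the batch key is derived are all assumptions made for this example; they are not taken from Alvarium or from the thesis.

```python
import hashlib
import json


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def make_batch(annotations: list[dict]) -> dict:
    """Aggregate annotations into one payload, as a single ledger write would.

    The batch key here is simply a hash of the serialized payload
    (an assumption for illustration only).
    """
    bloom = BloomFilter()
    for annotation in annotations:
        bloom.add(annotation["id"])  # "id" is a hypothetical field name
    payload = json.dumps(annotations, sort_keys=True)
    batch_key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"batch_key": batch_key, "bloom": bloom, "annotations": annotations}


if __name__ == "__main__":
    batch = make_batch([
        {"id": "ann-1", "host": "edge-a", "kind": "tls"},
        {"id": "ann-2", "host": "edge-a", "kind": "pki"},
        {"id": "ann-3", "host": "edge-b", "kind": "tpm"},
    ])
    print(batch["batch_key"])
    print("ann-2" in batch["bloom"])
```

A verifier holding only the filter can rule out batches that definitely do not contain a given annotation before fetching any payload. A positive answer may be a false positive, so it still has to be confirmed against the batch contents.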
Keywords
Distributed systems, Cloud-edge environments, Data trustworthiness, Data Confidence Fabrics, Secure data exchange platforms, Zero Trust, Project Alvarium, Data provenance, Annotation, Authentication and verification, Trust algorithms, Trust Score, Data Confidence, System scalability, Metadata redundancy, Storage cost optimization, Data compression techniques, Compact annotation techniques, Compact JSON annotation format, Batching, Workload optimization, Low transactional overhead, Optimized system performance, Ledger interactions, System efficiency, Efficient annotation retrieval, Ledger querying and searching, Batch Keys, Bloom Filters, Security, Network security, Blockchain forensics, Distributed ledgers, Data analytics, Pattern recognition, Metadata privacy, Metadata obfuscation techniques, Hostname obfuscation, Hostname anonymization, Hostname Encryption, Correlational analysis, Network structure obfuscation, Network topology, Machine learning, Intrusions and malware detection, Vulnerability assessment, K-means clustering, Silhouette score evaluation, Bidirectional analysis, Timestamp analysis, Network structure inference, Timestamp Obfuscation, Networking
Citation
Khalid, A. 2025. Techniques for improving the efficiency and security of Data Confidence Fabrics. MRes Thesis, University College Cork.