Cloud-based machine learning architecture for big data analysis
University College Cork
The use of machine learning that leverages large amounts of data (big data) is increasingly important in many areas of business and research. To help cope with the heavy resource demands of these applications, solutions including hardware platforms (e.g. graphics cards), more efficient algorithms (e.g. deep learning algorithms), and special software environments (e.g. TensorFlow) have been developed. In addition, special optimisations are often developed for specific applications, based on the requirements of the particular application. This thesis also addresses the challenge of efficient machine learning over big data, but does so in a way that is complementary to specialised hardware and algorithms, and that is independent of application and data type. The thesis develops several types of general optimisation and implements them on top of an underlying generic machine learning architecture. The generic machine learning architecture includes stages for segmentation, feature extraction, model building and classification. The optimisation components enhance this architecture in a general way that works with any data type and any dataset; each optimisation responds to the needs of the particular application and adjusts itself to the particular dataset being processed. The optimisations developed are: model optimisation; feature optimisation; resource optimisation; and cloud platform cost-benefit optimisation. Model optimisation involves evaluating multiple models in parallel, and using feedback on model performance to choose the best ones for the dataset being processed. Feature optimisation involves evaluating various features and combinations of features, and then choosing those features that are most effective for classification. Resource optimisation involves dynamically adjusting compute instances to respond to the demands of an application.
Cloud platform cost-benefit optimisation involves evaluating the cost of available public cloud compute instances, and determining appropriate cost-efficient instances depending on the needs of an application. General techniques of sampling, evaluation and feedback are used in several optimisation components. The underlying framework and optimisations have been implemented and deployed in a private cloud environment. Evaluation on various datasets (image and text datasets) has shown these optimisation components to be effective, and to provide useful generic components that can work in conjunction with other optimisations to address the challenging demands of machine learning over big data.
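As a rough illustration of the cost-benefit idea described above, the following minimal Python sketch picks the cheapest compute instance that still meets an application's stated needs. It is not the thesis's implementation: the `Instance` type, the `cheapest_adequate` function, the instance names and the hourly prices are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    """A compute instance offer (all values here are hypothetical)."""
    name: str
    vcpus: int
    memory_gb: float
    hourly_cost: float  # illustrative price, not real cloud pricing


def cheapest_adequate(
    instances: List[Instance], min_vcpus: int, min_memory_gb: float
) -> Optional[Instance]:
    """Return the lowest-cost instance meeting both requirements, or None."""
    adequate = [
        inst
        for inst in instances
        if inst.vcpus >= min_vcpus and inst.memory_gb >= min_memory_gb
    ]
    # min() with default=None handles the case where nothing is adequate.
    return min(adequate, key=lambda inst: inst.hourly_cost, default=None)


# Hypothetical catalogue used only to exercise the function.
CATALOGUE = [
    Instance("small", vcpus=2, memory_gb=4.0, hourly_cost=0.05),
    Instance("medium", vcpus=4, memory_gb=16.0, hourly_cost=0.20),
    Instance("large", vcpus=16, memory_gb=64.0, hourly_cost=0.80),
]

if __name__ == "__main__":
    choice = cheapest_adequate(CATALOGUE, min_vcpus=4, min_memory_gb=8.0)
    print(choice.name if choice else "no adequate instance")
```

A fuller treatment would presumably also weigh expected run time against price, since a more expensive instance that finishes sooner can cost less overall; this sketch only filters on capacity and minimises the hourly rate.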
Big-data, Cloud-computing, Machine learning
Pakdel, R. 2019. Cloud-based machine learning architecture for big data analysis. PhD Thesis, University College Cork.