Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection

Files
2423.pdf(257.5 KB)
Published Version
Date
2016-09
Authors
Browne, David
Manna, Carlo
Prestwich, Steven D.
Publisher
Sun SITE Central Europe / RWTH Aachen University
Abstract
Feature selection is used to select a subset of relevant features in machine learning, and is vital for simplifying models, improving efficiency, and reducing overfitting. In filter-based feature selection, a statistic such as correlation or entropy is computed between each feature and the target variable to evaluate feature relevance. A relevance threshold is typically used to limit the set of selected features, and features can also be removed based on redundancy (similarity to other features). Some methods are designed for use with a specific statistic or certain types of data. We present a new filter-based method called Relevance-Redundancy Dominance that applies to mixed data types, can use a wide variety of statistics, and does not require a threshold. Finally, we provide preliminary results from extensive numerical experiments on public credit datasets.
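For readers unfamiliar with the conventional approach that the abstract contrasts against, the following is a minimal sketch of a threshold-based filter. It is not the Relevance-Redundancy Dominance method, whose details are given in the paper rather than in this record. The function names, the choice of absolute Pearson correlation as the statistic, and the two cut-off values (`relevance_min`, `redundancy_max`) are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def threshold_filter(features, target, relevance_min=0.3, redundancy_max=0.9):
    """Conventional threshold-based filter (illustrative, hypothetical helper).

    features: dict mapping feature name -> list of numeric values
    target:   list of numeric target values
    """
    # Relevance: absolute correlation between each feature and the target.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}

    # Keep only features above the relevance threshold, most relevant first.
    candidates = sorted(
        (name for name in relevance if relevance[name] >= relevance_min),
        key=lambda name: -relevance[name],
    )

    # Redundancy: skip a feature that is too similar to one already selected.
    selected = []
    for name in candidates:
        if all(
            abs(pearson(features[name], features[kept])) < redundancy_max
            for kept in selected
        ):
            selected.append(name)
    return selected


if __name__ == "__main__":
    target = [0, 0, 1, 1, 0, 1]
    features = {
        "a": [1, 2, 8, 9, 2, 7],          # strongly related to the target
        "a_copy": [2, 4, 16, 18, 4, 14],  # exact rescaling of "a" (redundant)
        "noise": [5, 1, 4, 2, 3, 3],      # unrelated to the target
    }
    print(threshold_filter(features, target))
```

Both cut-offs in this sketch must be chosen by hand, which is the tuning burden that a threshold-free method such as the one described in the abstract aims to avoid.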
Keywords
Feature selection, Machine learning, Filter-based, Relevance-Redundancy Dominance
Citation
Browne, D., Manna, C. and Prestwich, S. (2016) 'Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection', in Greene, D., MacNamee, B. and Ross, R. (eds.) Proceedings of the 24th Irish Conference on Artificial Intelligence and Cognitive Science 2016, Dublin, Ireland, 20-21 September. CEUR Workshop Proceedings, 1751, pp. 227-238