Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection
Prestwich, Steven D.
Sun SITE Central Europe / RWTH Aachen University
Feature selection is used to select a subset of relevant features in machine learning, and is vital for simplification, improving efficiency and reducing overfitting. In filter-based feature selection, a statistic such as correlation or entropy is computed between each feature and the target variable to evaluate feature relevance. A relevance threshold is typically used to limit the set of selected features, and features can also be removed based on redundancy (similarity to other features). Some methods are designed for use with a specific statistic or certain types of data. We present a new filter-based method called Relevance-Redundancy Dominance that applies to mixed data types, can use a wide variety of statistics, and does not require a threshold. Finally, we provide preliminary results, through extensive numerical experiments on public credit datasets.
Feature selection , Machine learning , Filter-based , Relevance-Redundancy Dominance
Browne, D., Manna, C. and Prestwich, S. (2016) 'Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection', in Greene, D., MacNamee, B. and Ross, R. (eds.) Proceedings of the 24th Irish Conference on Artificial Intelligence and Cognitive Science 2016, Dublin, Ireland, 20-21 September. CEUR Workshop Proceedings, 1751, pp. 227-238