Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection

Browne, David; Manna, Carlo; Prestwich, Steven D.

Files

2423.pdf (257.5 KB)

Published Version

Date

2016-09

Authors

Browne, David

Manna, Carlo

Prestwich, Steven D.

Publisher

Sun SITE Central Europe / RWTH Aachen University

Abstract

Feature selection chooses a subset of relevant features in machine learning, and is vital for simplifying models, improving efficiency and reducing overfitting. In filter-based feature selection, a statistic such as correlation or entropy is computed between each feature and the target variable to evaluate feature relevance. A relevance threshold is typically used to limit the set of selected features, and features can also be removed based on redundancy (similarity to other features). Some methods are designed for use with a specific statistic or certain types of data. We present a new filter-based method called Relevance-Redundancy Dominance that applies to mixed data types, can use a wide variety of statistics, and does not require a threshold. Finally, we provide preliminary results from extensive numerical experiments on public credit datasets.
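
For readers unfamiliar with the conventional approach that the abstract contrasts against, the following is a minimal sketch of a threshold-based filter with a redundancy check. It is not the Relevance-Redundancy Dominance method, whose details are not given in this record. The function names, the use of absolute Pearson correlation as the statistic, both threshold values and the toy feature names are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / sqrt(vx * vy)


def threshold_filter(features, target, relevance_min=0.3, redundancy_max=0.9):
    """Conventional filter selection (hypothetical helper, illustrative thresholds).

    1. Relevance: keep features whose |correlation| with the target
       reaches relevance_min.
    2. Redundancy: visit survivors from most to least relevant and drop
       any whose |correlation| with an already selected feature exceeds
       redundancy_max.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(
        (name for name in relevance if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        if all(
            abs(pearson(features[name], features[kept])) <= redundancy_max
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy data: "income_copy" nearly duplicates "income"; "noise" is unrelated.
features = {
    "income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "income_copy": [1.0, 2.0, 3.0, 4.0, 5.5],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

selected = threshold_filter(features, target)
print(selected)
```

On this toy data only `income` survives: `noise` fails the relevance threshold and `income_copy` is discarded as redundant. The result depends directly on the two hand-picked thresholds, which is the kind of tuning a threshold-free method aims to avoid.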

Keywords

Feature selection, Machine learning, Filter-based, Relevance-Redundancy Dominance

Citation

Browne, D., Manna, C. and Prestwich, S. (2016) 'Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection', in Greene, D., MacNamee, B. and Ross, R. (eds.) Proceedings of the 24th Irish Conference on Artificial Intelligence and Cognitive Science 2016, Dublin, Ireland, 20-21 September. CEUR Workshop Proceedings, 1751, pp. 227-238