Feature Selection for High-Dimensional Data: A Pearson Redundancy Based Filter

Jacek Biesiada¹ and Wlodzislaw Duch²,³
¹Division of Computer Studies, Department of Electrotechnology, The Silesian University of Technology, Katowice, Poland.
²Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
³School of Computer Engineering, Nanyang Technological University, Singapore.

Abstract.

An information filtering algorithm based on the Pearson $\chi^2$ test has been implemented and tested on feature selection. This test is frequently used in biomedical data analysis and should be applied only to nominal (discretized) features. The algorithm has only one parameter, the statistical confidence level at which two distributions are considered identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
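
As a rough illustration of the kind of test involved, the sketch below computes the Pearson $\chi^2$ statistic for the contingency table of two nominal (discretized) features. It is not taken from the paper: the function name pearson_chi2 and the contingency-table formulation are assumptions made for this example, and the paper's own redundancy criterion may differ in detail.

```python
from collections import Counter


def pearson_chi2(x, y):
    """Return (statistic, degrees_of_freedom) for two nominal sequences.

    Hypothetical helper for illustration only; it builds the contingency
    table of x and y and applies the standard Pearson chi-squared formula
    sum((observed - expected)^2 / expected).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))   # observed counts for each (x, y) pair
    row = Counter(x)             # marginal counts of x values
    col = Counter(y)             # marginal counts of y values

    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n   # count expected under independence
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected

    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


# Two identical binary features: strongly dependent, hence redundant.
stat, dof = pearson_chi2([0, 0, 1, 1], [0, 0, 1, 1])
print(stat, dof)
```

The statistic would then be compared with the $\chi^2$ distribution for the given degrees of freedom at the chosen confidence level, which plays the role of the single parameter mentioned in the abstract. With only four samples, as in this toy call, the expected counts are far too small for the test to be reliable; the call merely shows the mechanics.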

Preprint for comments in PDF, 85 KB.

Reference: Biesiada J, Duch W, Feature Selection for High-Dimensional Data: A Pearson Redundancy Based Filter.
Lecture Notes in Computer Science, Vol. xxx, pp. xxx-yyy, 2007
