Comparison of feature ranking methods based on information entropy.


Wlodzislaw Duch (1, 2), Tadeusz Wieczorek (3), Jacek Biesiada (3), Marcin Blachnik (3)
(1) School of Computer Engineering, Nanyang Technological University, Singapore.
(2) Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
(3) Division of Computer Studies, Department of Electrotechnology, The Silesian University of Technology, Katowice, Poland.

Abstract.

A comparison of five entropy-based feature ranking methods is presented on artificial and real datasets. A feature ranking method based on the chi2 statistic gives results very similar to those of the entropy-based methods. The quality of the rankings produced by these methods is evaluated with a decision tree and a nearest neighbor classifier applied to a growing number of the most important features. Significant differences are found in some cases, but no single index works best for all data and all classifiers. Therefore, to be confident that the subset of features giving the highest accuracy has been selected, many different indices must be used.
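The general procedure described in the abstract (score each feature with an index, sort, then measure classifier accuracy on the top 1, 2, ... features) can be sketched as below. This is a minimal illustration, not the authors' code: the abstract does not name the five entropy-based indices, so mutual information and symmetric uncertainty are used here as representative examples, alongside the Pearson chi2 statistic. Features are assumed to be already discretized, only a leave-one-out 1-nearest-neighbor classifier (Hamming distance) is shown in place of both classifiers, and the toy dataset is hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(feature, labels):
    """I(X;C) = H(X) + H(C) - H(X,C) for discrete (already binned) data."""
    return entropy(feature) + entropy(labels) - entropy(list(zip(feature, labels)))

def symmetric_uncertainty(feature, labels):
    """Mutual information normalized by H(X) + H(C), scaled to [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 2 * mutual_information(feature, labels) / denom if denom > 0 else 0.0

def chi2(feature, labels):
    """Pearson chi-square statistic of the feature-value x class contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx, fc = Counter(feature), Counter(labels)
    total = 0.0
    for x in fx:
        for c in fc:
            expected = fx[x] * fc[c] / n  # count expected under independence
            total += (joint[(x, c)] - expected) ** 2 / expected
    return total

def rank_features(X, y, index):
    """Return feature indices sorted from most to least relevant by `index`."""
    columns = list(zip(*X))
    scores = [index(list(col), y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN with Hamming distance on the selected features."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other sample; ties go to the lowest index.
        best = min((j for j in range(len(X)) if j != i),
                   key=lambda j: sum(xi[f] != X[j][f] for f in features))
        correct += y[best] == y[i]
    return correct / len(X)

# Hypothetical toy data: feature 0 equals the class, feature 1 agrees with it
# in 6 of 8 cases, feature 2 is independent of the class.
X = [(0, 0, 1), (0, 0, 0), (0, 1, 1), (0, 0, 0),
     (1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 1, 1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

for name, index in [("MI", mutual_information),
                    ("SU", symmetric_uncertainty),
                    ("chi2", chi2)]:
    ranking = rank_features(X, y, index)
    # Accuracy with a growing number of the top-ranked features.
    accuracies = [loo_1nn_accuracy(X, y, ranking[:k]) for k in range(1, len(ranking) + 1)]
    print(name, ranking, accuracies)
```

On real data the rankings from different indices can disagree, which is why the accuracy curve over the top k features is computed separately for each index rather than trusting a single ranking.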

Reference: Duch W, Wieczorek T, Biesiada J, Blachnik M (2004), Comparison of feature ranking methods based on information entropy.
Proc. of International Joint Conference on Neural Networks (IJCNN), Budapest 2004, IEEE Press, pp. 1415-1420.

