Margin-based feature selection filters for microarray gene expression data.


Wlodzislaw Duch1,2 and Jacek Biesiada3
1School of Computer Engineering, Nanyang Technological University, Singapore.
2Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
3Division of Computer Studies, Department of Electrotechnology, The Silesian University of Technology, Katowice, Poland.

Abstract.

Information selection filters use various relevance criteria, such as Bayesian consistency, correlation coefficients, or mutual information, to determine the usefulness of features. Several new ranking indices are introduced. Instead of using all vectors to calculate a ranking index, margins that exclude vectors from strongly overlapping regions are used, sacrificing training accuracy for generalization in the ranking of features. This technique is especially useful for microarray gene expression data, where the number of features is very large and the number of samples is very small. Feature selection for three such datasets shows that a relatively small number of genes gives the best performance.
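
The margin idea can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define the new ranking indices or how "strongly overlapping" is measured. The sketch makes three assumptions. First, there are two classes, labelled 0 and 1, and both are present. Second, the overlapping region of a feature is the interval where the value ranges of the two classes intersect. Third, the ranking index is a simple signal-to-noise ratio. All function names (outside_overlap, signal_to_noise, margin_rank) are hypothetical.

```python
from statistics import mean, pstdev


def split_by_class(values, labels):
    """Split feature values into class 0 and class 1 (assumed labels)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return a, b


def outside_overlap(values, labels):
    """Drop samples that fall inside the overlap of the two class ranges.

    Assumption: the "overlapping region" is the interval where the value
    ranges of the two classes intersect. The paper may define it differently.
    """
    a, b = split_by_class(values, labels)
    lo, hi = max(min(a), min(b)), min(max(a), max(b))
    if lo > hi:  # class ranges are disjoint, nothing to exclude
        return list(values), list(labels)
    kept = [(v, y) for v, y in zip(values, labels) if not lo <= v <= hi]
    return [v for v, _ in kept], [y for _, y in kept]


def signal_to_noise(values, labels):
    """Stand-in ranking index: |mean difference| / (sum of class std devs)."""
    a, b = split_by_class(values, labels)
    if len(a) < 2 or len(b) < 2:  # too few samples left to estimate spread
        return 0.0
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else float("inf")


def margin_rank(features, labels):
    """Rank feature names by the index computed outside the overlap region."""
    scores = {}
    for name, values in features.items():
        kept_values, kept_labels = outside_overlap(values, labels)
        scores[name] = signal_to_noise(kept_values, kept_labels)
    return sorted(scores, key=scores.get, reverse=True)
```

With this particular choice, a feature whose class ranges overlap almost completely keeps too few samples and scores zero, while a feature with a narrow overlap is scored only on the vectors it separates cleanly. A smoother notion of overlap, for example one based on class densities, could replace the range intersection without changing the overall structure.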

Preprint for comments in PDF, 303 KB.

Reference: Duch W, Biesiada J, Margin-based feature selection filters for microarray gene expression data. International Journal of Information Technology and Intelligent Computing 1 (2006) 9-33.
