Heterogeneous distance functions for prototype rules, influence of parameters on probability estimation.


Marcin Blachnik (1), Wlodzislaw Duch (2,3) and Tadeusz Wieczorek (1)
(1) Division of Computer Methods, Department of Electrotechnology, The Silesian University of Technology, Katowice, Poland.
(2) School of Computer Engineering, Nanyang Technological University, Singapore.
(3) Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.

Abstract. An interesting and little-explored way to understand data is based on prototype rules (P-rules). The goal of this approach is to find optimal similarity (or distance) functions and the positions of prototypes to which unknown vectors are compared. In real applications similarity functions frequently involve different types of attributes, such as continuous, discrete, binary or nominal. Heterogeneous distance functions that can handle such diverse information are usually based on probabilistic distance measures, such as the Value Difference Metric (VDM). For continuous attributes, calculating the probabilities requires estimating probability density functions. This estimation requires careful selection of several parameters that may have an important impact on the overall classification accuracy.
In this paper various heterogeneous distance functions based on the VDM measure are presented, among them some new heterogeneous distance functions based on different types of probability estimation. Results of many numerical experiments with such distance functions on artificial and real datasets are presented, and quite simple P-rules are extracted for several heterogeneous databases.
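
To make the idea of a VDM-based heterogeneous distance concrete, the following minimal Python sketch may help. It is not the authors' implementation: it assumes the commonly used per-attribute VDM term, the sum over classes of |P(c|x) - P(c|y)|^q, estimates the conditional probabilities by simple counting, and handles the continuous attribute by equal-width binning, which is only one of many possible probability estimation schemes. The toy data, the function names and the parameters `n_bins` and `q` are all hypothetical.

```python
from collections import Counter, defaultdict

def conditional_probs(values, labels):
    """Estimate P(class | value) for one symbolic attribute by counting."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    return {v: [cnt[c] / sum(cnt.values()) for c in classes]
            for v, cnt in counts.items()}

def equal_width_bin(x, lo, hi, n_bins):
    """Map a continuous value to a bin index (assumed discretization scheme)."""
    if hi == lo:
        return 0
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def vdm(p_x, p_y, q=2):
    """Per-attribute VDM term: sum over classes of |P(c|x) - P(c|y)|^q."""
    return sum(abs(a - b) ** q for a, b in zip(p_x, p_y))

# Hypothetical toy data: one nominal and one continuous attribute.
colors = ["red", "red", "blue", "blue", "green"]
widths = [0.1, 0.2, 0.8, 0.9, 0.5]
labels = ["A", "A", "B", "B", "A"]

n_bins = 2  # the number of bins is one of the parameters that affects the estimate
bins = [equal_width_bin(w, 0.0, 1.0, n_bins) for w in widths]
p_color = conditional_probs(colors, labels)
p_width = conditional_probs(bins, labels)

def distance(x, y):
    """Heterogeneous distance between two (color, width) vectors.

    Values never seen in the training data raise KeyError in this sketch.
    """
    d_color = vdm(p_color[x[0]], p_color[y[0]])
    bx = equal_width_bin(x[1], 0.0, 1.0, n_bins)
    by = equal_width_bin(y[1], 0.0, 1.0, n_bins)
    d_width = vdm(p_width[bx], p_width[by])
    return (d_color + d_width) ** 0.5
```

Calling `distance(("red", 0.1), ("blue", 0.9))` compares two vectors through their class-conditional probabilities rather than through raw attribute values. Changing `n_bins` changes the estimated probabilities for the continuous attribute and therefore the distance, which illustrates the kind of parameter sensitivity the abstract refers to.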

Preprint for comments in PDF, 128 KB.

Reference: Blachnik M, Duch W, Wieczorek T, Heterogeneous distance functions for prototype rules, influence of parameters on probability estimation. International Journal of Artificial Intelligence Studies (in print)
