Statlog Datasets: comparison of results

Computational Intelligence Laboratory | Department of Informatics | Nicolaus Copernicus University

Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds), StatLog project - whole book!
More results for medical and other data.

A note of caution: comparing different classifiers is not an easy task. Before ranking methods using the numbers presented in the tables below, please note the following facts.

Many of the results we have collected give only a single number (even results from the StatLog project!), without a standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions, single numbers do not mean much.
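To get a rough feel for how much a single error figure can move, one can use the binomial standard error sqrt(p(1-p)/n) of an error rate p measured on n test cases. The sketch below is a back-of-the-envelope illustration, not the procedure used in the StatLog project. The function name is illustrative, and treating the test cases as independent is an assumption that cross-validation only approximately satisfies.

```python
import math

def binomial_std_error(error_rate, n_cases):
    """Standard error of an error rate estimated on n_cases independent test cases."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return math.sqrt(error_rate * (1.0 - error_rate) / n_cases)

# Australian credit: 690 cases, test errors 0.131 (Cal5) and 0.141 (Discrim).
se = binomial_std_error(0.131, 690)
print(f"standard error ~ {se:.3f}")
print("difference between the two results:", round(0.141 - 0.131, 3))
```

Under these assumptions the standard error is about 0.013, which is larger than the 0.010 gap between the two results, so the ranking of these two methods is not well supported by the single numbers alone.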

Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that cross-validation is safer, cf.:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.

Cross-validation (CV) tests are also not ideal. Theoretically, about 2/3 of the results should be within a single standard deviation of the average and 95% within two standard deviations, so in a 10-fold cross-validation you should very rarely see results that are more than two standard deviations better or worse than the average. Running CV several times may also give you different answers. The search for the best estimator continues. Cf.:
Dietterich, T. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10 (7), 1895-1924;
Nadeau C., Bengio Y. (1999) Inference for the Generalization Error. Tech. rep. 99s-25, CIRANO, J. Machine Learning (Kluwer, in print).
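The two-standard-deviation check described above can be sketched in a few lines. This is a minimal illustration, not one of the tests proposed in the cited papers. The helper names `kfold_indices` and `summarize_folds` are made up for this example, and the per-fold errors are invented numbers.

```python
import random
import statistics

def kfold_indices(n_cases, k, seed=0):
    """Shuffle case indices and split them into k folds of nearly equal size."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def summarize_folds(fold_errors):
    """Return the mean, the sample standard deviation, and the fold errors
    lying more than two standard deviations from the mean."""
    mean = statistics.mean(fold_errors)
    std = statistics.stdev(fold_errors)
    outliers = [e for e in fold_errors if abs(e - mean) > 2 * std]
    return mean, std, outliers

# Fold membership for a 690-case dataset split 10 ways.
folds = kfold_indices(690, 10)
print([len(fold) for fold in folds])

# Hypothetical per-fold test errors from one 10-fold run (made-up numbers).
fold_errors = [0.13, 0.16, 0.12, 0.15, 0.14, 0.17, 0.11, 0.14, 0.13, 0.15]
mean, std, outliers = summarize_folds(fold_errors)
print(f"error = {mean:.3f} +/- {std:.3f}, folds beyond 2 std: {outliers}")
```

Changing `seed` and rerunning the whole procedure gives a different partition, which is one simple way to see how much the reported average depends on the split.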

Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. Combining ROC with variance estimation would be ideal. Unfortunately, this still remains to be done. All we can do now is to collect some numbers in tables.
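For readers unfamiliar with ROC curves, the following sketch shows how the curve and the area under it can be computed for a two-class problem from classifier scores. It is an illustration under stated assumptions: the scores and labels are invented, the function names are not taken from any particular library, and the area is computed with the simple trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for every score threshold.

    labels are 1 for the positive class and 0 for the negative class;
    a higher score means "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last case in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical classifier scores and true labels (made-up numbers).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(f"AUC = {auc(curve):.4f}")
```

Each point on the curve corresponds to one choice of decision threshold, which is why a single error rate (one point) says less than the full curve.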

Credit management

Statlog version, 2 classes, 7 attributes, no. of training cases=15000, no. of test cases=5000;
Unfortunately this data is not public; does anyone know where to find it?

Algorithm	Error (Train)	Error (Test)	who
Discrim	0.031	0.033	statlog
Quadisc	0.051	0.050	statlog
Logdisc	0.031	0.030	statlog
SMART	0.021	0.020	statlog
ALLOC80	0.033	0.031	statlog
k-NN	0.028	0.088	statlog
CASTLE	0.051	0.047	statlog
CART	FD	FD	statlog
IndCART	0.010	0.025	statlog
NewID	0.000	0.033	statlog
AC2	0.000	0.030	statlog
Baytree	0.002	0.028	statlog
NaiveBay	0.041	0.043	statlog
CN2	0.000	0.032	statlog
C4.5	0.014	0.046	statlog
ITrule	0.041	0.046	statlog
Cal5	0.018	0.023	statlog
Kohonen	0.037	0.043	statlog
DIPOL92	0.020	0.020	statlog
Backprop	0.020	0.023	statlog
RBF	0.033	0.031	statlog
LVQ	0.024	0.040	statlog
Default	0.051	0.047	statlog

Australian credit dataset

Statlog dataset, 2 classes, 14 attributes, 690 observations, class distribution 55.5%, 44.5%.
37 cases with missing values; missing values per attribute: A1: 12, A2: 12, A4: 6, A5: 6, A6: 9, A7: 9, A14: 13
10-fold cross-validation.

Algorithm	Error (Train)	Error (Test)	who
Cal5	0.132	0.131	statlog
k-NN,k=18,manh,std	---	0.136	KG
ITrule	0.162	0.137	statlog
w-NN,k=18,manh, simplex,std	---	0.138	KG
SVM Gauss	--	0.138± 0.041	C=0.1, s=0.01, over 460 SV
Discrim	0.139	0.141	statlog
DIPOL92	0.139	0.141	statlog
SSV 3 nodes	---	0.142± 0.040	Ghostminer, WD, uses 5 features.
SSV 3 nodes	---	0.145± 0.035	Ghostminer, WD, uses F8 only!
C4.5	--	0.145± 0.007	statlog
CART	0.145	0.145	statlog
RBF	0.107	0.145	statlog
SVM lin	--	0.148± 0.030	C=1, over 190 SV
CASTLE	0.144	0.148	statlog
NaiveBay	0.136	0.151	statlog
SVM Gauss	--	0.152± 0.032	C=1, over 290 SV
IndCART	0.081	0.152	statlog
k-NN k=11,std,eucl	--	0.152	KG
Backprop	0.087	0.154	statlog
C4.5	0.099	0.155	statlog
k-NN,k=11,fec.sel, eucl,std	--	0.156	KG
SMART	0.090	0.158	statlog
Baytree	0.000	0.171	statlog
k-NN	--	0.181	statlog
NewID	0.000	0.181	statlog
AC2	0.000	0.181	statlog
LVQ	0.065	0.197	statlog
ALLOC80	0.194	0.201	statlog
CN2	0.001	0.204	statlog
Quadisc	0.185	0.207	statlog
Default	0.440	0.440	statlog
Kohonen	Failed	Failed	statlog

4 x 4 digit dataset

Statlog dataset, 10 classes, 16 attributes, (train,test) = (9000,9000) observations

Algorithm	Error (Train)	Error (Test)	who
Discrim	0.111	0.114	statlog
Quadisc	0.052	0.054	statlog
Logdisc	0.079	0.086	statlog
SMART	0.096	0.104	statlog
ALLOC80	0.066	0.068	statlog
k-NN	0.016	0.047	statlog
CASTLE	0.180	0.170	statlog
CART	0.180	0.160	statlog
IndCART	0.011	0.154	statlog
NewID	0.080	0.150	statlog
AC2	*	0.155	statlog
Baytree	0.015	0.140	statlog
NaiveBay	0.220	0.233	statlog
CN2	0.000	0.134	statlog
C4.5	0.041	0.149	statlog
ITrule	*	0.222	statlog
Cal5	0.118	0.220	statlog
Kohonen	0.051	0.075	statlog
DIPOL92	0.065	0.072	statlog
Backprop	0.072	0.080	statlog
RBF	0.080	0.083	statlog
LVQ	0.040	0.061	statlog
Default	0.900	0.900	statlog
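The Default row is the error of always predicting the most frequent class, that is, one minus the largest class fraction. The short sketch below reproduces the 0.900 figure under the assumption that the ten digit classes are equally frequent, which this page does not state explicitly. The function name is illustrative.

```python
def default_error(class_counts):
    """Error rate of always predicting the most frequent class."""
    total = sum(class_counts)
    if total <= 0:
        raise ValueError("class_counts must contain at least one case")
    return 1.0 - max(class_counts) / total

# Ten equally frequent digit classes among 9000 test cases (balance is assumed).
print(round(default_error([900] * 10), 3))
# Four equally frequent classes, as in a balanced 4-class problem (again an assumption).
print(round(default_error([1] * 4), 3))
```

Any classifier whose test error is close to this baseline has learned little beyond the class frequencies.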

Karhunen-Loeve digits

Statlog dataset, 10 classes, 40 attributes, (train,test) = (9000,9000) observations
Unfortunately this data is not public; does anyone know where to find it?

Algorithm	Error (Train)	Error (Test)	who
Discrim	0.070	0.075	statlog
Quadisc	0.016	0.025	statlog
Logdisc	0.032	0.051	statlog
SMART	0.043	0.057	statlog
ALLOC80	0.000	0.024	statlog
k-NN	0.000	0.020	statlog
CASTLE	0.126	0.135	statlog
CART	FD	FD	statlog
IndCART	0.003	0.170	statlog
NewID	0.000	0.162	statlog
AC2	0.000	0.168	statlog
Baytree	0.006	0.163	statlog
NaiveBay	0.205	0.223	statlog
CN2	0.036	0.180	statlog
C4.5	0.050	0.180	statlog
ITrule	*	0.216	statlog
Cal5	0.128	0.270	statlog
Kohonen	FD	FD	statlog
DIPOL92	0.030	0.039	statlog
Backprop	0.041	0.049	statlog
RBF	0.048	0.055	statlog
LVQ	0.011	0.026	statlog
Cascade	0.063	0.075	statlog
Default	0.900	0.900	statlog

Vehicle dataset

Statlog dataset, 4 classes, 18 attributes, 846 observations, 9-fold cross-validation

Algorithm	Error (Train)	Error (Test)	who
Discrim	0.202	0.216	statlog
Quadisc	0.085	0.150	statlog
Logdisc	0.167	0.192	statlog
SMART	0.062	0.217	statlog
ALLOC80	0.000	0.173	statlog
k-NN	--	0.275	statlog
k-NN,k=4,manh,std	-	0.272	KG
k-NN,k=4, manh,fec. sel,std	-	0.283	KG
w-NN,k=4 manh,std simplex	-	0.287	KG
CASTLE	0.545	0.505	statlog
CART	0.284	0.235	statlog
IndCART	0.047	0.298	statlog
NewID	0.030	0.298	statlog
AC2	*	0.296	statlog
Baytree	0.079	0.271	statlog
NaiveBay	0.519	0.558	statlog
CN2	0.018	0.314	statlog
C4.5	0.065	0.266	statlog
ITrule	*	0.324	statlog
Kohonen	0.115	0.340	statlog
DIPOL92	0.079	0.151	statlog
Backprop	0.168	0.207	statlog
RBF	0.098	0.307	statlog
LVQ	0.171	0.287	statlog
Cascade	0.263	0.280	statlog
Default	0.750	0.750	statlog

Letters

Statlog dataset, 26 classes, 16 attributes, (train,test) = (15000,5000) observations

Algorithm	Error (Train)	Error (Test)	who
ALLOC80	0.065	0.064	statlog
k-NN	0.000	0.068	statlog
LVQ	0.057	0.079	statlog
Quadisc	0.101	0.113	statlog
CN2	0.021	0.115	statlog
Baytree	0.015	0.124	statlog
NewID	0.000	0.128	statlog
IndCART	0.010	0.130	statlog
C4.5	0.042	0.132	statlog
DIPOL92	0.167	0.176	statlog
RBF	0.220	0.233	statlog
Logdisc	0.234	0.234	statlog
CASTLE	0.237	0.245	statlog
AC2	0.000	0.245	statlog
Kohonen	0.218	0.252	statlog
Cal5	0.158	0.253	statlog
SMART	0.287	0.295	statlog
Discrim	0.297	0.302	statlog
Backprop	0.323	0.327	statlog
NaiveBay	0.516	0.529	statlog
ITrule	0.585	0.594	statlog
Default	0.955	0.960	statlog
CART	FD	FD	statlog

Chromosome dataset

Statlog dataset, 24 classes, 16 attributes, (train,test) = (20000,20000) observations; unfortunately the dataset is not public! Does anyone know where to find it?

Algorithm	Error (Train)	Error (Test)	who
Discrim	0.073	0.107	statlog
Quadisc	0.046	0.084	statlog
Logdisc	0.079	0.131	statlog
SMART	0.082	0.128	statlog
ALLOC80	0.192	0.253	statlog
k-NN	0.000	0.123	statlog
CASTLE	0.129	0.178	statlog
CART	FD	FD	statlog
IndCART	0.007	0.173	statlog
NewID	0.000	0.176	statlog
AC2	0.000	0.234	statlog
Baytree	0.034	0.164	statlog
NaiveBay	0.260	0.324	statlog
CN2	0.010	0.150	statlog
C4.5	0.038	0.175	statlog
ITrule	0.681	0.697	statlog
Cal5	0.142	0.244	statlog
Kohonen	0.109	0.174	statlog
DIPOL92	0.049	0.091	statlog
Backprop	FD	FD	statlog
RBF	0.087	0.129	statlog
LVQ	0.067	0.121	statlog
Default	0.956	0.956	statlog

Satellite image (SatImage)

Statlog dataset, 6 classes, 36 attributes, (train,test)=(4435,2000) observations

Algorithm	Error (Train)	Error (Test)	who
k-NN	0.089	0.094	statlog
k-NN,k=2,3, eucl	-	0.097	KG
LVQ	0.048	0.105	statlog
DIPOL92	0.051	0.111	statlog
RBF	0.111	0.121	statlog
ALLOC80	0.036	0.132	statlog
CART	0.079	0.138	statlog
IndCART	0.023	0.138	statlog
Backprop	0.112	0.139	statlog
Baytree	0.020	0.147	statlog
CN2	0.010	0.150	statlog
C4.5	0.040	0.150	statlog
NewID	0.067	0.150	statlog
Cal5	0.125	0.151	statlog
Quadisc	0.106	0.155	statlog
AC2	*	0.157	statlog
SMART	0.123	0.159	statlog
Logdisc	0.119	0.163	statlog
Discrim	0.149	0.171	statlog
Kohonen	0.101	0.179	statlog
Cascade	0.112	0.163	statlog
CASTLE	0.186	0.194	statlog
Default	0.758	0.769	statlog
ITrule	FD	FD	statlog

Image segmentation

Statlog dataset, 7 classes, 11 attributes, 2310 observations, 10-fold cross-validation

Algorithm	Error (Train)	Error (Test)	who
Discrim	0.112	0.116	statlog
Quadisc	0.155	0.157	statlog
Logdisc	0.098	0.109	statlog
SMART	0.039	0.052	statlog
ALLOC80	0.033	0.030	statlog
k-NN	--	0.077	statlog
k-NN,k=1, eucl	-	0.035	KG
k-NN,k=1, manh	-	0.028	KG
CASTLE	0.108	0.112	statlog
CART	0.005	0.040	statlog
IndCART	0.012	0.045	statlog
NewID	0.000	0.034	statlog
AC2	0.000	0.031	statlog
Baytree	0.000	0.033	statlog
NaiveBay	0.260	0.265	statlog
CN2	0.003	0.043	statlog
C4.5	0.013	0.040	statlog
ITrule	0.445	0.455	statlog
Cal5	0.042	0.062	statlog
Kohonen	0.046	0.067	statlog
DIPOL92	0.021	0.039	statlog
Backprop	0.028	0.054	statlog
RBF	0.047	0.069	statlog
LVQ	0.019	0.046	statlog
Default	0.760	0.760	statlog

Datasets with costs

Heart disease

Statlog dataset, 2 classes, 13 attributes, 270 observations, 9-fold cross-validation.
Algorithms in italics have not incorporated costs.

The table below shows the misclassification costs for the heart disease dataset.
The columns represent the predicted class and the rows the true class.

Cost matrix	Absence	Presence
Absence	0	1
Presence	5	0
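With a cost matrix, results are naturally scored by average cost per case rather than by plain error rate. The sketch below shows that computation for the matrix above. The confusion matrix is invented for illustration, and the function name is an assumption, not part of any StatLog software.

```python
def average_cost(confusion, cost):
    """Average misclassification cost per case.

    confusion[i][j] counts cases of true class i predicted as class j;
    cost[i][j] is the cost of that outcome (rows true, columns predicted).
    """
    total_cases = sum(sum(row) for row in confusion)
    if total_cases == 0:
        raise ValueError("confusion matrix is empty")
    total_cost = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(cost))
        for j in range(len(cost[i]))
    )
    return total_cost / total_cases

# Cost matrix from the table above: class 0 = absence, class 1 = presence.
HEART_COST = [[0, 1],
              [5, 0]]

# Hypothetical confusion matrix for 270 cases (made-up counts).
confusion = [[130, 20],   # true absence: 130 correct, 20 false alarms
             [15, 105]]   # true presence: 15 missed, 105 correct
print(f"error rate   = {(20 + 15) / 270:.3f}")
print(f"average cost = {average_cost(confusion, HEART_COST):.3f}")
```

In this made-up case the 15 missed cases contribute far more to the cost than the 20 false alarms, which is why a method with a low error rate can still score poorly once costs are counted.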

Algorithm	Error (Train)	Error (Test)	who
k-NN,k=30,eucl,std	-	0.344	KG
NaiveBay	0.351	0.374	statlog
Discrim	0.315	0.393	statlog
Logdisc	0.271	0.396	statlog
ALLOC80	0.394	0.407	statlog
Quadisc	0.274	0.422	statlog
CASTLE	0.374	0.441	statlog
Cal5	0.330	0.444	statlog
CART	0.463	0.452	statlog
Cascade	0.207	0.467	statlog
k-NN	0.000	0.478	statlog
SMART	0.264	0.478	statlog
DIPOL92	0.429	0.507	statlog
ITrule	*	0.515	statlog
Baytree	0.111	0.526	statlog
Default	0.560	0.560	statlog
Backprop	0.381	0.574	statlog
LVQ	0.140	0.600	statlog
IndCART	0.261	0.630	statlog
Kohonen	0.429	0.693	statlog
AC2	0.000	0.744	statlog
CN2	0.206	0.767	statlog
RBF	0.303	0.781	statlog
C4.5	0.439	0.781	statlog
NewID	0.000	0.844	statlog
k-NN,k=1,eucl,std	-	0.725	KG

German credit

Statlog dataset, 2 classes, 24 attributes, 1000 observations, 10-fold cross-validation
Algorithms in italics have not incorporated costs.

The table below illustrates the cost matrix for the German credit dataset. The columns are the predicted class and the rows the true class.

Cost matrix	good	bad
good	0	1
bad	5	0
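One common way for a classifier to incorporate such costs is to predict the class with the lowest expected cost instead of the most probable class. The sketch below applies that rule to the matrix above. It assumes the classifier can output a probability that a case is bad. The names are illustrative, and this is not claimed to be how any of the listed algorithms handled costs.

```python
# Cost matrix from the table above: rows are the true class, columns the predicted class.
CLASSES = ["good", "bad"]
COST = {"good": {"good": 0, "bad": 1},
        "bad":  {"good": 5, "bad": 0}}

def expected_costs(p_bad):
    """Expected cost of each possible prediction, given the probability that the case is bad."""
    p = {"good": 1.0 - p_bad, "bad": p_bad}
    return {pred: sum(p[true] * COST[true][pred] for true in CLASSES) for pred in CLASSES}

def min_cost_decision(p_bad):
    """Pick the prediction with the lowest expected cost (ties go to 'good')."""
    costs = expected_costs(p_bad)
    return min(CLASSES, key=lambda pred: costs[pred])

for p_bad in (0.10, 0.20, 0.50):
    print(p_bad, min_cost_decision(p_bad), expected_costs(p_bad))
```

With these costs the rule predicts bad whenever the probability of bad exceeds 1/6, rather than the usual 1/2. If one assumes that 30% of the cases are bad (the usual description of this dataset, but not stated on this page), always predicting bad costs 0.7 per case, which agrees with the Default row. Note also that values above 1 in the table, such as the Kohonen entry, can only be average costs, not error rates.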

Algorithm	Error (Train)	Error (Test)	who
Discrim	0.509	0.535	statlog
Quadisc	0.431	0.619	statlog
Logdisc	0.499	0.538	statlog
SMART	0.389	0.601	statlog
ALLOC80	0.597	0.584	statlog
k-NN	0.000	0.694	statlog
k-NN,k=17, eucl,std	-	0.411	KG
CASTLE	0.582	0.583	statlog
CART	0.581	0.613	statlog
IndCART	0.069	0.761	statlog
NewID	0.000	0.925	statlog
AC2	0.000	0.878	statlog
Baytree	0.126	0.778	statlog
NaiveBay	0.600	0.703	statlog
CN2	0.000	0.856	statlog
C4.5	0.640	0.985	statlog
ITrule	*	0.879	statlog
Cal5	0.600	0.603	statlog
Kohonen	0.689	1.160	statlog
DIPOL92	0.574	0.599	statlog
Backprop	0.446	0.772	statlog
RBF	0.848	0.971	statlog
LVQ	0.229	0.963	statlog
Default	0.700	0.700	statlog