Feature space mapping neural network
applied to structure-activity relationship problems.

Wlodzislaw Duch
Rafal Adamczak

Computational Intelligence Laboratory,
Department of Informatics,
Nicolaus Copernicus University,
Grudziadzka 5, 87-100 Torun, Poland.

  WWW: https://www.fizyka.umk.pl/~duch

Geerd H.F. Diercksen

Max-Planck Institute of Astrophysics,
85740-Garching b. Munich, Germany
WWW: https://www.mpa-garching.mpg.de/


Plan

  1. Structure activity relationships

  2. Pyrimidines and predictive toxicology evaluation

  3. Feature Space Mapping neurofuzzy system

  4. Results

  5. Conclusions


Structure activity relationships


Molecular compounds:

  • Simple - analyzed using quantum mechanics.
  • Complex - try to relate structure to activity.

SAR - structure-activity relationships.
Quantitative SAR (QSAR), chemical structure - biological and chemical activity.
Drug design, finding promising drugs - very expensive: 12,000 compounds tried, 1 finally used.

SAR problems: construct predictive theory.
Learn from chemical compounds of a known structure and activity.
3D molecular structures, large number of attributes: topological indices, molecular field parameters etc.
Finding most informative attributes - very important.

What do we look for?


Pyrimidines and predictive toxicology evaluation

Two SAR problems:

Pyrimidines

Class of chemical compounds with antibiotic activity. They inhibit the bacterial forms of some enzymes more strongly than the human forms and therefore kill bacteria. All share a common template, to which chemical groups can be added at three possible substitution positions.

3 substitution positions R3-R5.
Each chemical substituent has 9 features, including a few symbolic attributes:

  1. group name,
  2. polarity,
  3. size,
  4. hydrogen-bond donor,
  5. hydrogen-bond acceptor,
  6. pi-donor,
  7. pi-acceptor,
  8. polarizability,
  9. the sigma effect.

Each pyrimidine is described by 27 integer-valued features (3 positions x 9 features).
No substitution - missing value, but very informative.

Pairs of chemicals (54 features) are compared.
Two classes: first compound has higher activity or vice versa.
2788 cases,
5-fold crossvalidation tests.
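
The pairwise encoding described above can be sketched as follows. This is only an illustration: the function name `build_pairs`, the use of ordered pairs, the skipping of ties, and the toy data are all assumptions, not details given in the talk.

```python
import itertools

N_POSITIONS = 3              # substitution positions R3-R5
FEATURES_PER_POSITION = 9    # features per substituent
N_FEATURES = N_POSITIONS * FEATURES_PER_POSITION  # 27 per compound


def build_pairs(compounds):
    """Turn (features, activity) compounds into 54-feature pair cases.

    Label 1 means the first compound of the pair has the higher activity,
    label 0 means the second one does. Ties are skipped (an assumption).
    """
    cases = []
    for (fa, act_a), (fb, act_b) in itertools.permutations(compounds, 2):
        if act_a == act_b:
            continue
        vector = list(fa) + list(fb)      # 27 + 27 = 54 features
        label = 1 if act_a > act_b else 0
        cases.append((vector, label))
    return cases


# Hypothetical toy data: three compounds with made-up features and activities.
toy = [([i] * N_FEATURES, float(i)) for i in range(3)]
cases = build_pairs(toy)
print(len(cases), len(cases[0][0]))
```

Each ordered pair becomes one two-class case, so a classifier only has to learn which member of a pair is more active rather than an absolute activity value.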

The predictive toxicology evaluation (PTE)

Oxford University Computing Laboratory challenge.
Based on US National Toxicology Program (NTP).
330 organic chemicals, 182 (55%) are carcinogenic, 148 non-carcinogenic.

417 features.
8 types of features:

  1. features 1-69: atom type,
  2. feature 70: mutagenicity alert,
  3. features 71-285: so-called WARMR alerts,
  4. features 286-313: counts of generic chemical groups found in the molecule,
  5. features 314-376: NTP bulk properties,
  6. features 377-404: various alerts that were used by Ashby,
  7. features 405-416: partial genotoxicity test results,
  8. feature 417: the AMES test.

Feature 418 holds the class value: 0 non-carcinogenic, 1 carcinogenic, 2 unknown.

Test set includes:

Large number of features, small number of test cases.
Statistical differences are not significant on test samples.
Better comparison using crossvalidation tests on all 350 known cases.


Feature Space Mapping neurofuzzy system

A theory for predicting structure-activity relations is important.
Logical rules: decision trees, inductive logic programming, neural networks.
Feature Space Mapping (FSM) allows crisp and fuzzy rule sets.

FSM is a universal adaptive system.
Multidimensional separable functions modeling density of the input vectors.
Combinations of features define objects in the feature space, described by the joint probability density of the input/output data vectors using a network of properly parameterized transfer functions.
Gaussian type functions - the only radial separable functions:

X - the input vector; D - the center of the function; N(s) - the normalization factor.
Bicentral functions - soft rectangular membership functions:

Logistic functions may be used here.
Adaptation: shifting the centers D, changing spreads b, rescaling slopes s.
Other localized separable functions: triangular, trapezoidal or rectangular functions:

Useful for extraction of crisp and fuzzy logical rules.
Separability facilitates interpretation - neurons provide "context-dependent" membership functions.
FSM is a neurofuzzy system, a density estimation network, a memory based system, a self-organizing system.
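
The separable transfer functions named above can be sketched as below. The formulas on the original slides are not reproduced in this text, so the exact forms here are assumptions: the Gaussian is written as an unnormalized product of one-dimensional Gaussians (the factor N(s) is omitted), and the bicentral function as a product of two logistic functions per dimension, with center D, spread b and slope s. The function names are hypothetical.

```python
import math


def logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gaussian(X, D, sigma):
    """Separable Gaussian: product of one-dimensional factors (unnormalized)."""
    return math.prod(
        math.exp(-((x - d) / s) ** 2) for x, d, s in zip(X, D, sigma)
    )


def bicentral(X, D, b, s):
    """Soft rectangular window in each dimension, built from two logistics.

    Assumed form: sigma(s*(x - D + b)) * (1 - sigma(s*(x - D - b))).
    Larger slopes s give sharper edges, approaching a crisp rectangle.
    """
    return math.prod(
        logistic(si * (x - d + bi)) * (1.0 - logistic(si * (x - d - bi)))
        for x, d, bi, si in zip(X, D, b, s)
    )


# Value at the center and far outside the window, for a 2-D example.
print(bicentral([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [10.0, 10.0]))
print(bicentral([5.0, 0.0], [0.0, 0.0], [1.0, 1.0], [10.0, 10.0]))
```

Because each function is a product over dimensions, every factor can be read as a one-dimensional membership function, which is what makes rule extraction from the nodes straightforward.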

FSM architecture: network consists of three layers: an input, one hidden, and an output layer.
Constructive algorithm: nodes added as needed.
Initial centers - clusterization using dendrograms or decision trees.
Dispersions and rotations of the clusters are optimized.
Output activation: probability, or confidence of the network in its classification.

FSM training algorithm: estimates probability density of input-output pairs in each class.


Results

Mean Spearman's rank correlation coefficient used:

  r_s = 1 - \frac{6 \sum d^2}{n (n^2 - 1)}

n - number of pairs; d - difference in rank within a pair; -1 ≤ r_s ≤ 1.
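
A minimal sketch of this coefficient, assuming the standard tie-free form 1 - 6*sum(d^2) / (n*(n^2 - 1)) with n and d as defined above. The helper names `ranks` and `spearman` are hypothetical, and tied values are not handled.

```python
def ranks(values):
    """Rank positions 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation for two equally long, tie-free sequences."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical activities: true values versus a model's predicted ordering.
true_activity = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.9]
print(spearman(true_activity, predicted))
```

A value of 1 means the predicted ordering matches the true ordering exactly, and -1 means it is fully reversed.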

Mean Spearman's rank correlation coefficient for the pyrimidines dataset.

Method               Rank correlation
CART                 0.499
Linear Regression    0.654
Golem (ILP)          0.684
FSM                  0.780

Results from: R.D. King, A. Srinivasan, M.J.E. Sternberg, New Generation Computing (1995).


PTE data: small number of training vectors, high dimensionality.
FSM with rectangular transfer functions was run first.
60 of the 417 features were retained; the same features were used with Gaussian transfer functions.

Other algorithms:

FSM results: Gaussians, optimization using crossvalidation on the training set.
FSM rules: 11 rules with 53 premises, using 24 features.
16 test vectors classified correctly (80%), 3 vectors unclassified, 1 error.
No correlation between results on the training/test set.

Accuracy on the test set (20 cases, so 1 case = 5%).
FSM-rules: 1 error and 3 unclassified cases.

Method                 Accuracy (%)
Distill-Light          90.0
STEPS                  85.0
GloBo                  85.0
kNN, k=1, weighted     80.0
FSM-rules              80.0
FSM-Gauss              75.0
OFAI                   75.0
Default                70.0

Differences are not statistically significant.
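
One informal way to see why is to compute a confidence interval for each accuracy on a 20-case test set. The talk does not say which significance test was used; the Wilson score interval below is a swapped-in illustration, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = correct / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z * z / (4.0 * total * total)) / denom
    return center - half, center + half


# Best method (90% = 18 of 20) versus the default rate (70% = 14 of 20).
best = wilson_interval(18, 20)
default = wilson_interval(14, 20)
print("best:    %.3f - %.3f" % best)
print("default: %.3f - %.3f" % default)
print("intervals overlap:", best[0] < default[1])
```

With only 20 test cases the intervals are wide enough to overlap, which is consistent with the statement above, although overlapping intervals are only a rough check and not a formal test.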
Knowledge from the extracted rules may be useful.

C4.5rules, 10-fold crossvalidation: results below the base rate were obtained, 53.6 ± 0.6%.
SSV decision tree: 62.0 ± 1.2 %.

FSM with Gaussians, 19 nodes, 64 ± 1.5% in 10xCV.
Initial clusterization is not significantly improved by learning.
Features are weak: with the 40 best features, 66% in 10xCV on the training set and 60-70% on the test set are obtained after initial clusterization.

kNN, best results with k=1, Euclidean distance, feature selection.
16 features turned off, accuracy 63.2 ± 1.2% in 10xCV tests.
Minkowski distance with a large exponent, a=10, 40 features: 10xCV accuracy 75.7 ± 0.7%, on the test set 75%.
Tuning scaling factors gives 77.7% on the training and 80% on the test data.
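
The nearest-neighbor variant described above can be sketched as follows. This assumes that a is the Minkowski exponent and that the scaling factors are per-feature weights applied before the distance is computed (a weight of 0 turns a feature off). The names `minkowski` and `classify_1nn` and the toy data are hypothetical.

```python
def minkowski(x, y, alpha=10.0, scale=None):
    """Minkowski distance with exponent alpha and optional feature scaling."""
    if scale is None:
        scale = [1.0] * len(x)
    total = sum((w * abs(a - b)) ** alpha for a, b, w in zip(x, y, scale))
    return total ** (1.0 / alpha)


def classify_1nn(train, query, alpha=10.0, scale=None):
    """Return the label of the single nearest training vector (k=1)."""
    nearest = min(train,
                  key=lambda item: minkowski(item[0], query, alpha, scale))
    return nearest[1]


# Toy training set of (features, label) pairs, not taken from the PTE data.
train = [([0.0, 5.0], 0), ([9.0, 0.0], 1)]
print(classify_1nn(train, [9.0, 4.0]))                     # all features on
print(classify_1nn(train, [9.0, 4.0], scale=[0.0, 1.0]))   # first feature off
```

With a large exponent the distance is dominated by the single feature with the largest difference, so feature selection and scaling have a strong effect on which neighbor is chosen.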


Conclusions

Pyrimidines: FSM results are significantly better but have little explanatory power.
Similar for other rule-based systems.
Alternative: good prototype cases, similarity based methods.

Toxicology data: hard to say.
Analysis of the rules by domain experts needed.
Use domain expert knowledge to pre-structure the FSM network.
Crossvalidation using several systems: accuracy higher than 80% using the features provided is unlikely.

Difficult problems, require further investigation.
Better features - difficult, requires quantum mechanical calculations for all compounds.
Aggregation of features, combination of features?
SAR - good problems, requiring further development of methods.

