Artificial Intelligence and Intelligent Search Techniques.

Włodzisław Duch
Computational Intelligence Laboratory,
Department of Informatics,
Nicolaus Copernicus University,
Grudziądzka 5, 87-100 Toruń, Poland.
e-mail: id: wduch, on the server fizyka.umk.pl.
WWW: http://www.is.umk.pl/~duch
Computational Intelligence (CI) is a branch of science that tries to
solve problems that are effectively non-algorithmic (such as the semantic retrieval
problem).
Artificial Intelligence (AI) is a branch of CI that stresses the
importance of knowledge, representation of knowledge, and rule-based understanding.
Other fields relevant to CI:
Biological inspirations: neural networks, evolutionary programming,
genetic algorithms.
Logic: fuzzy logic, rough logic, possibility theory
Mathematics: multivariate statistics, classification theory,
clustering, optimization theory
Pattern recognition: computer vision, speech recognition
Engineering: robotics, control theory, biocybernetics
Computer science: theory of grammars, automata theory, machine learning
"Soft computing" = {neural networks, evolutionary
programming, fuzzy logic}
Useful collections of links:
- AI and machine learning:
www.is.umk.pl/~duch/ai-ml.html
- Statistics, neural networks, neurobiology:
www.is.umk.pl/~duch/neural.html
- Cognitive Science:
www.is.umk.pl/~duch/cognitive.html
- Software for statistics, neural networks, machine learning:
www.is.umk.pl/~duch/software.html
Understanding the meaning of sentences, learning from existing texts, dialog with humans,
machine translation.
Problems with meaning:
"In managing the DoD there are many unexpected communications problems. For
instance, when the Marines are ordered to "secure a building,"
they form a landing party and assault it. On the other hand, the same
instructions will lead the Army to occupy the building
with a troop of infantry, and the Navy will characteristically respond by
sending a yeoman to assure that the building lights are turned out. When
the Air Force acts on these instructions, what results is a three
year lease with option to purchase."
-- James Schlesinger (former Secretary of Defense, USA).
Basic concepts in NLP:
Syntax, grammar, parsing and semantics. Meaning refers to background knowledge. What
is knowledge?
Knowledge representation, linguistic (verbal) knowledge structures.
Knowledge as rules:
IF good food THEN salivate
Are we using rules? In the army all the time ...
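A minimal sketch of how IF-THEN rules can be applied mechanically, assuming simple forward chaining over a set of facts. The function name `forward_chain`, the fact strings, and the second rule are illustrative assumptions; only the first rule comes from the text.

```python
def forward_chain(facts, rules):
    """Apply IF-THEN rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conclusion not in derived and all(c in derived for c in conditions):
                derived.add(conclusion)
                changed = True
    return derived


RULES = [
    (("good food",), "salivate"),           # the rule from the text
    (("salivate", "food served"), "eat"),   # hypothetical second rule
]

print(sorted(forward_chain({"good food", "food served"}, RULES)))
```

The second rule fires only because the first one has already added "salivate", which is the chaining behaviour that rule-based systems rely on.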
Knowledge as semantic network
Each network node is a word, connections (arcs) may signify relations
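A semantic network of this kind can be sketched as a labelled directed graph. The class below is an illustration only: the relation names ("is-a", "has") and the nodes "living thing" and "tail" are assumptions, not part of the original notes.

```python
from collections import defaultdict


class SemanticNetwork:
    """Words as nodes, labelled arcs as relations between them."""

    def __init__(self):
        self.arcs = defaultdict(list)  # node -> list of (relation, target)

    def add(self, source, relation, target):
        self.arcs[source].append((relation, target))

    def related(self, source, relation):
        """All nodes reachable from source by one arc with this label."""
        return [t for r, t in self.arcs[source] if r == relation]

    def is_a(self, node, category):
        """Follow 'is-a' arcs transitively from node, looking for category."""
        stack, seen = [node], set()
        while stack:
            current = stack.pop()
            if current == category:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.related(current, "is-a"))
        return False


net = SemanticNetwork()
net.add("dog", "is-a", "animal")
net.add("dog", "is-a", "pet")
net.add("animal", "is-a", "living thing")  # hypothetical node
net.add("dog", "has", "tail")              # hypothetical relation

print(net.is_a("dog", "living thing"))
print(net.related("dog", "has"))
```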
Knowledge as frames
Generic DOG Frame
Self: an ANIMAL; a PET
Breed: ?
Owner: a PERSON (if-Needed: find a PERSON with pet=myself)
Name: a PROPER NAME (DEFAULT=Rover)

DOG_NEXT_DOOR Frame
Self: a DOG
Breed: mutt
Owner: Jimmy
Name: Fido
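The two frames above can be sketched in code to show how defaults, if-needed procedures, and inheritance through the Self slot interact. This is a sketch under assumptions: the lookup order (explicit value, then if-needed procedure, then default), the `PET_OWNERS` registry, and the REX and STRAY frames are hypothetical and only serve the illustration.

```python
class Frame:
    def __init__(self, name, parent=None, values=None, defaults=None, if_needed=None):
        self.name = name
        self.parent = parent              # the frame named in the Self slot
        self.values = values or {}        # explicitly filled slots
        self.defaults = defaults or {}    # DEFAULT fillers
        self.if_needed = if_needed or {}  # procedures run when a slot is empty

    def _chain(self):
        frame = self
        while frame is not None:
            yield frame
            frame = frame.parent

    def get(self, slot):
        # Assumed order: explicit value, then if-needed procedure, then default.
        for frame in self._chain():
            if slot in frame.values:
                return frame.values[slot]
        for frame in self._chain():
            if slot in frame.if_needed:
                found = frame.if_needed[slot](self)
                if found is not None:
                    return found
        for frame in self._chain():
            if slot in frame.defaults:
                return frame.defaults[slot]
        return None


# Hypothetical registry standing in for "find a PERSON with pet=myself".
PET_OWNERS = {"Jimmy": "DOG_NEXT_DOOR", "Anna": "REX"}


def find_owner(frame):
    return next((p for p, pet in PET_OWNERS.items() if pet == frame.name), None)


generic_dog = Frame("DOG", defaults={"Name": "Rover"}, if_needed={"Owner": find_owner})
dog_next_door = Frame("DOG_NEXT_DOOR", parent=generic_dog,
                      values={"Breed": "mutt", "Owner": "Jimmy", "Name": "Fido"})
rex = Frame("REX", parent=generic_dog, values={"Breed": "collie"})  # hypothetical
stray = Frame("STRAY", parent=generic_dog)                          # hypothetical

print(dog_next_door.get("Name"), rex.get("Name"), rex.get("Owner"), stray.get("Owner"))
```

DOG_NEXT_DOOR answers from its own slots, while REX inherits the default name and obtains its owner through the if-needed procedure of the generic frame.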
Knowledge as scripts
Stereotypical stories: restaurants, accidents, business.
Many other knowledge representation schemes exist.
A good part of AI is knowledge engineering.
Mind and concept spaces
How to show similarity relations between words?
Psychologists: semantic distance estimated from associations or reaction times.
How do we do it with our brains? Neural networks.
Vector description instead of neural activations - perhaps about 300 dimensions are
sufficient (Latent Semantic Analysis indication).
High similarity of symbols or concepts <=> close in the concept space.
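One common way to make "close in the concept space" concrete is cosine similarity between concept vectors. The sketch below assumes that measure (the notes do not name one) and uses made-up 4-dimensional vectors in place of the roughly 300 dimensions mentioned above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two concept vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, space):
    """The other concept that lies closest to `word` in the space."""
    return max((w for w in space if w != word),
               key=lambda w: cosine_similarity(space[word], space[w]))


# Hypothetical toy vectors, invented for this illustration only.
CONCEPTS = {
    "dog": [0.9, 0.8, 0.1, 0.0],
    "cat": [0.8, 0.9, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9, 0.8],
}

print(nearest("dog", CONCEPTS))
```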
"Platonic mind" model - a few pictures
Semantic maps
Semantic maps: SOM map of oil from Italy (files: 96som-inf.sam, ../../g-input/italy.eps).

The DISCERN architecture (performance configuration).
The model consists of parsing, generating, question answering, and memory subsystems, two
modules each. A dark square indicates a memory module, a light square indicates a
processing module. The lines indicate pathways carrying distributed word, sentence, and
story representations during the performance phase of the system. The modules are trained
separately with compatible I/O data.

The lexicon. The lexical input symbol JOHN is
translated into the semantic representation of the concept John. The representations are
vectors of gray-scale values between 0.0 and 1.0, stored in the weights of the units. The
size of the unit on the map indicates how strongly it responds. Only a small part of each
map, and only a few strongest associative connections of the lexical unit JOHN are shown
in this Figure.

The FGREP-module. At each I/O presentation, the
representations at the input layer are modified according to the backpropagation error
signal, and replace the old representations in the lexicon. In the case of sequential
input or output, the hidden layer pattern is saved after each step in the sequence, and
used as input to the hidden layer during the next step, together with the actual input.
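The central idea of the caption above, that the error signal is carried one step further than usual and changes the input representation itself, can be sketched as follows. This is a deliberately reduced illustration: it assumes a single linear layer with squared error, omits the hidden layer and the recurrent copy, and the names `fgrep_step`, `lexicon`, and all numeric values are hypothetical.

```python
def fgrep_step(lexicon, word, weights, target, lr=0.1):
    """One presentation: update the weights AND the word's input representation."""
    x = lexicon[word]
    # Forward pass: y_j = sum_i W[j][i] * x[i]
    y = [sum(w * x_i for w, x_i in zip(row, x)) for row in weights]
    err = [y_j - t_j for y_j, t_j in zip(y, target)]
    # Error signal at the input layer: dE/dx_i = sum_j err_j * W[j][i]
    grad_x = [sum(err[j] * weights[j][i] for j in range(len(weights)))
              for i in range(len(x))]
    # Ordinary weight update: dE/dW[j][i] = err_j * x_i
    for j, row in enumerate(weights):
        for i in range(len(row)):
            row[i] -= lr * err[j] * x[i]
    # The modified representation, kept within [0, 1], replaces the old one.
    lexicon[word] = [min(1.0, max(0.0, x_i - lr * g)) for x_i, g in zip(x, grad_x)]
    return 0.5 * sum(e * e for e in err)


lexicon = {"JOHN": [0.5, 0.5, 0.5]}               # hypothetical starting vector
weights = [[0.1, -0.2, 0.3], [0.0, 0.2, -0.1]]    # hypothetical weights
target = [1.0, 0.0]                               # hypothetical target pattern

errors = [fgrep_step(lexicon, "JOHN", weights, target) for _ in range(200)]
print(errors[0], errors[-1], lexicon["JOHN"])
```

Because the lexicon entry is overwritten after every presentation, any other module that reads JOHN from the lexicon would see the updated representation.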

The hierarchical feature map classification of
script-based stories. Labels indicate the maximally responding unit for the different
scripts and tracks. This particular input story representation is classified as an
instance of the restaurant script (top level) and fancy-restaurant track (middle level),
with role bindings customer=John, food=lobster, restaurant=MaMaison, tip=big (i.e., unit
JLMB, bottom level). Before passing on the originally 84-component representation vector
to the next level, the REST-unit removes those 22 components that do not vary across the
different restaurant stories, and the FANCY-unit removes those 44 components of the
remaining vector that do not vary across the different fancy-restaurant stories. At the
bottom level, only an 18 component vector representing the role bindings remains to be
mapped. Compression is determined automatically based on the variance in the components,
and varies slightly depending on the script and the track.
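The component counts in this caption are consistent: 84 - 22 = 62 components reach the middle level, and 62 - 44 = 18 remain at the bottom. The compression step itself can be sketched as dropping every component whose variance across a group of stories is zero. The 6-component vectors below are hypothetical stand-ins for the 84-component story representations, and the helper names are assumptions.

```python
import statistics


def varying_components(vectors, threshold=1e-12):
    """Indices of components whose variance across the vectors exceeds threshold."""
    n = len(vectors[0])
    return [i for i in range(n)
            if statistics.pvariance([v[i] for v in vectors]) > threshold]


def compress(vector, keep):
    """Keep only the listed components."""
    return [vector[i] for i in keep]


# Hypothetical story vectors: two fancy-restaurant stories, one fast-food story.
fancy1 = [1.0, 0.0, 0.9, 0.2, 0.7, 0.1]
fancy2 = [1.0, 0.0, 0.9, 0.2, 0.3, 0.8]
fast1 = [1.0, 0.0, 0.1, 0.6, 0.5, 0.5]

# Top level: drop what is constant across all restaurant stories.
keep_rest = varying_components([fancy1, fancy2, fast1])
fancy_mid = [compress(v, keep_rest) for v in (fancy1, fancy2)]

# Middle level: drop what is constant across the fancy-restaurant stories.
keep_fancy = varying_components(fancy_mid)
bottom = compress(fancy_mid[0], keep_fancy)

print(keep_rest, keep_fancy, bottom)
```

What remains at the bottom level is only the part of the vector that distinguishes one fancy-restaurant story from another, which in the caption corresponds to the role bindings.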

Lexicon propagation. The orthographic input symbol
DOG is translated into the semantic concept dog in this example. The representations are
vectors of gray-scale values between 0 and 1, stored in the weights of the feature map
units. The size of the unit on the map indicates how strongly it responds. Only a few
strongest associative connections of the orthographic input unit DOG (and only that unit)
are shown.

The training data for the lexicon. Orthographic
representations are blurred bitmaps of the orthographic words and phonological
representations consist of concatenations of phoneme representations. Concept
representations were developed by FGREP in the case-role assignment task and stand for
distinct meanings. Gray-scale boxes indicate component values between 0 and 1. The
connections depict the mapping between the symbols and their meanings. Many concepts map
to several synonymous lexical symbols, and the homonymous symbols CHICKEN and BAT map to
two distinct concepts each. The orthographic and phonological symbols correspond
one-to-one to each other in this data.

The orthographic, phonological, and semantic maps.
The input and output maps in each modality have the same order, shown here only once in
(a) and (b). (a) Orthographic map. Each unit in the 9 × 9 network is represented by a box,
and the labels indicate the image unit for each symbol representation. The map is divided
into major subareas according to word length. (b) Phonological map. The labels indicate
the images for each phonological word representation. Again, the word length is the major
ordering factor. (c) Semantic map. The labels on this 7 × 7 map indicate the maximally
responding unit for each concept representation. The map is organized according to the
semantic categories (table 5).
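How such a map finds its "maximally responding unit" and becomes ordered can be sketched with one step of a standard self-organizing map update. The captions do not give training details, so the Euclidean distance, the Gaussian neighbourhood, the learning rate, and the 2 × 2 toy grid are all assumptions made for this illustration.

```python
import math


def best_matching_unit(grid, x):
    """(row, col) of the unit whose weight vector is closest to x."""
    return min(((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
               key=lambda rc: math.dist(grid[rc[0]][rc[1]], x))


def som_update(grid, x, lr=0.5, radius=1.0):
    """Move the best-matching unit and its map neighbours towards x."""
    br, bc = best_matching_unit(grid, x)
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
            row[c] = [w_i + lr * h * (x_i - w_i) for w_i, x_i in zip(w, x)]
    return br, bc


# Hypothetical 2 x 2 map with 2-component weight vectors.
grid = [[[0.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.8, 0.9]]]

winner = som_update(grid, [1.0, 1.0])
print(winner, grid[1][1])
```

Units near the winner on the map are pulled along with it, which is why similar inputs end up in neighbouring regions of a trained map.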
Examples of applications in information management:
WebSOM Project - 1 million documents categorized!
DARPA intelligent search projects.
Taxonomy and automatic classification projects.
Beyond search:
Automatic summarization of news and general texts - difficult but ...
- see Inxight http://www.inxight.com/
- see the Altavista Discovery project
Data Mining
Find a short logical summary of the database, or show the most important relationships.
Examples: the Iris and mushroom datasets.