Association for the Study of Canadian Radio and Television
at the Learned Societies Conference Sunday October 8, 1995

Classification and automatic indexing in a persistent object environment

American Society for Information Science, 58th Annual Meeting, Chicago: SIG/CR Workshop
© James M. Turner
Assistant Professor
École de bibliothéconomie et des sciences de l'information
Université de Montréal
voice +1 514 343 2454
fax +1 514 343 5753
turner@ere.umontreal.ca
http://tornade.ere.umontreal.ca/~turner

Principal investigators:
Robert Godin and Brigitte Kerhervé
Université du Québec à Montréal

Co-investigator:
James M. Turner, Assistant Professor
École de bibliothéconomie et des sciences de l'information
Université de Montréal

Background

In this paper, "digital libraries" refers to distributed multimedia information systems;
Typically such systems must manage very large amounts of data;
File sizes for digital images, audio, and video are large, and the files have no alphabet to make text searching possible;
Until techniques for edge detection, object recognition, and matching strategies are perfected, all retrieval depends on the metadata;
Metadata is an old computer science term for what information professionals have usually called a representation or document surrogate;
For purposes of storage and retrieval, words are always associated with still and moving pictures and with sound materials;
The index is used in direct querying as a retrieval mechanism, but both browsing and classification are important in providing the user with adequate search strategies and tools.
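
The role of the index in direct querying can be illustrated with a minimal sketch. It assumes each image is described only by a short text surrogate; the identifiers, descriptions, and function names below are hypothetical and are not taken from the project.

```python
from collections import defaultdict

# Hypothetical metadata records: image identifier -> descriptive words.
METADATA = {
    "img001": "red barn in a snowy field",
    "img002": "horse beside a red barn",
    "img003": "snowy mountain at sunrise",
}

def build_index(metadata):
    """Map each lowercased word to the set of image ids whose metadata contains it."""
    index = defaultdict(set)
    for image_id, description in metadata.items():
        for word in description.lower().split():
            index[word].add(image_id)
    return index

def query(index, *words):
    """Return the sorted ids of images whose metadata contains every query word."""
    if not words:
        return []
    postings = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*postings))

INDEX = build_index(METADATA)
```

Under these assumptions, `query(INDEX, "red", "barn")` matches only through the words attached to each image; the pixel data is never consulted, which is why the quality of the metadata determines what can be retrieved.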

Some approaches

One approach to this problem is automatic conceptual clustering of documents for retrieval purposes (Godin et al. 1993);
The conceptual hierarchy generated is used as a navigational space where browsing and direct querying can be integrated;
Browsing in data spaces has long been recognized as an attractive strategy, particularly for casual users and exploration of new domains (Marchionini and Shneiderman 1988);
Browsing is also widely recognized as an important strategy in retrieving pictures (e.g. O'Connor 1986, Jussim 1977, Schuller 1993);
Classification of the metadata for images is the focus of the project.
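
A hierarchy of this kind can be sketched as follows. The example assumes that clusters are formed as pairs of (documents, shared index terms) that are closed in both directions, in the style of formal concept analysis; the source does not specify the clustering technique, so this is an illustration rather than the method of Godin et al. The data and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical image metadata: image id -> set of index terms.
DOCS = {
    "d1": {"barn", "snow"},
    "d2": {"barn", "horse"},
    "d3": {"barn", "horse", "snow"},
}

def common_terms(doc_ids, docs):
    """Terms shared by every document in doc_ids (all terms if doc_ids is empty)."""
    result = set().union(*docs.values())
    for d in doc_ids:
        result &= docs[d]
    return result

def docs_with(terms, docs):
    """Documents whose term set contains every term in `terms`."""
    return {d for d, t in docs.items() if terms <= t}

def concepts(docs):
    """Enumerate (documents, shared terms) pairs closed in both directions.

    Brute force over all subsets of documents, so only suitable for toy data.
    """
    found = set()
    ids = sorted(docs)
    for r in range(len(ids) + 1):
        for subset in combinations(ids, r):
            intent = common_terms(subset, docs)
            extent = docs_with(intent, docs)
            found.add((frozenset(extent), frozenset(intent)))
    return found

def children(concept, all_concepts):
    """Narrower concepts directly below `concept` in the hierarchy."""
    extent = concept[0]
    below = [c for c in all_concepts if c[0] < extent]
    return [c for c in below if not any(c[0] < other[0] for other in below)]
```

In this sketch a direct query such as `docs_with({"barn"}, DOCS)` identifies a node in the hierarchy, and `children` lists the narrower clusters reachable from it, which is one way browsing and querying could share the same navigational space.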

Classification issues

What is needed is a general classification for images, but none exists (although Iconclass is fairly widely used for art images);
Attempts to adapt book classifications (e.g. UDC at the BBC) have ended in failure;
Because of important technical obstacles, empirical study of storage and retrieval issues for moving images is still difficult, so the research focusses on stills;
Essentially, what is needed are effective ways of relating words to nontextual information objects;
One approach is to use automatic indexing techniques on the text, to generate a browsable semantic net;
Perhaps more effective is to link textual representations of indexing concepts to pictorial representations in a classification showing whole-part relationships;
This permits searchers to use words and pictures together to retrieve visual information objects;
A visual dictionary may work for this purpose.
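
The whole-part idea can be made concrete with a minimal sketch. It assumes each dictionary entry pairs a term with an optional illustration and a list of parts; the `Entry` type, the bicycle fragment, and the file names are hypothetical and do not come from any particular visual dictionary.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One dictionary entry: a term, an optional picture, and its parts."""
    term: str
    picture: str = ""  # hypothetical file name of an illustration
    parts: list = field(default_factory=list)

# Hypothetical fragment of a whole-part hierarchy.
BICYCLE = Entry("bicycle", "bicycle.png", [
    Entry("wheel", "wheel.png", [Entry("spoke"), Entry("rim"), Entry("tire")]),
    Entry("frame", "frame.png"),
    Entry("saddle"),
])

def find(entry, term):
    """Depth-first search for the entry named `term`, or None if absent."""
    if entry.term == term:
        return entry
    for part in entry.parts:
        hit = find(part, term)
        if hit is not None:
            return hit
    return None

def expand(entry):
    """The term itself plus the terms of all its parts, at any depth."""
    terms = [entry.term]
    for part in entry.parts:
        terms.extend(expand(part))
    return terms

def path_to(entry, term, trail=()):
    """Chain of wholes leading from the root down to `term`, or [] if absent."""
    trail = trail + (entry.term,)
    if entry.term == term:
        return list(trail)
    for part in entry.parts:
        found = path_to(part, term, trail)
        if found:
            return found
    return []
```

A searcher who selects the picture for "wheel" could have the query broadened with `expand` to cover its parts, while `path_to` gives the chain of wholes that places a term such as "rim" in context.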

Objectives

To test the effectiveness of the dictionary as a front end to an information system;
To test the effectiveness of the whole-part text component as a controlled vocabulary for the system;
To determine whether the classificatory structure used in the dictionary can be applied to picture databases with either general or specialised content.
