Association for the Study of Canadian Radio and Television
at the Learned Societies Conference
Sunday October 8, 1995
Classification and automatic indexing
in a persistent object environment
58th Annual Meeting, Chicago
American Society for Information Science
SIG/CR Workshop
©James M. Turner
Professeur adjoint
École de bibliothéconomie et des sciences de
l'information
Université de Montréal
voice +1 514 343 2454
fax +1 514 343 5753
turner@ere.umontreal.ca
http://tornade.ere.umontreal.ca/~turner
Principal investigators:
Robert Godin and Brigitte Kerhervé
Université du Québec à Montréal
Co-investigator:
James M. Turner, professeur adjoint
École de bibliothéconomie et des sciences de l'information
Université de Montréal
- In this paper, "digital libraries" refers to distributed multimedia information systems;
- Typically such systems must manage very large amounts of data;
- Digital image, audio, and video files are large, and they contain no alphabet that would make text searching possible;
- Until techniques for edge detection, object recognition, and matching strategies are perfected, all retrieval depends on the metadata;
- Metadata = an old computer science term for what information professionals have usually called a representation or document surrogate;
- For purposes of storage and retrieval, words are always associated with still and moving pictures and with sound materials;
- The index is used in direct querying as a retrieval mechanism, but both browsing and classification are important in providing the user with adequate search strategies and tools.
- Automatic conceptual clustering of documents for retrieval purposes is one approach to this problem (Godin et al. 1993);
- The conceptual hierarchy generated is used as a navigational space where browsing and direct querying can be integrated;
- Browsing in data spaces has long been recognized as an attractive strategy, particularly for casual users and exploration of new domains (Marchionini and Shneiderman 1988);
- Browsing is also widely recognized as an important strategy in retrieving pictures (e.g. O'Connor 1986, Jussim 1977, Schuller 1993);
- Classification of the metadata for images is the focus of the project.
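
As an illustration of how a conceptual hierarchy can be generated from index terms, here is a minimal sketch. It assumes the hierarchy is a concept lattice built from image-term pairs; the slides do not specify the algorithm, and this brute-force enumeration is not the method of Godin et al. The image ids, terms, and function names are all hypothetical, and the approach is only practical for toy vocabularies.

```python
from itertools import combinations

# Hypothetical picture surrogates: each image id maps to its indexing terms.
DOCS = {
    "img1": {"bridge", "river"},
    "img2": {"bridge", "city"},
    "img3": {"river", "boat"},
    "img4": {"bridge", "river", "boat"},
}


def extent(terms, docs):
    """Images whose surrogate contains every term in `terms`."""
    return frozenset(d for d, t in docs.items() if terms <= t)


def intent(images, docs):
    """Terms shared by every image in `images` (all terms if `images` is empty)."""
    shared = set().union(*docs.values())
    for d in images:
        shared &= docs[d]
    return frozenset(shared)


def concepts(docs):
    """Enumerate (extent, intent) pairs by closing every subset of terms.

    Exponential in the number of terms, so suitable for small examples only.
    """
    found = set()
    terms = sorted(set().union(*docs.values()))
    for r in range(len(terms) + 1):
        for combo in combinations(terms, r):
            ext = extent(set(combo), docs)
            found.add((ext, intent(ext, docs)))
    return found


def children(node, nodes):
    """Direct specialisations of `node`: smaller extents with nothing in between."""
    below = [n for n in nodes if n[0] < node[0]]
    return [n for n in below if not any(n[0] < m[0] for m in below)]


if __name__ == "__main__":
    lattice = concepts(DOCS)
    top = (frozenset(DOCS), frozenset())
    # Browsing starts at the most general node and moves to narrower ones.
    for ext, terms in children(top, lattice):
        print(sorted(terms), "->", sorted(ext))
```

Each node pairs a set of images with the terms they all share, so moving down to a child narrows the result set the way adding a term to a query would. That is one way browsing and direct querying can share the same navigational space.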
- What is needed is a general classification for images, but none exists (although Iconclass is fairly widely used for art pictures);
- Attempts to adapt book classifications (e.g. UDC at the BBC) have ended in failure;
- Because of important technical obstacles, empirical study of storage and retrieval issues for moving images is still difficult, so the research focusses on stills;
- Essentially, what is needed are effective ways of relating words to nontextual information objects;
- One approach is to use automatic indexing techniques on the text, to generate a browsable semantic net;
- Perhaps more effective is to link textual representations of indexing concepts to pictorial representations in a classification showing whole-part relationships;
- This permits searchers to use words and pictures together to retrieve visual information objects;
- A visual dictionary may work for this purpose.
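
The linking of textual and pictorial representations in a whole-part classification can be sketched as a small tree structure. This is a hypothetical illustration, not the project's design: the `Entry` class, the file names, the image ids, and the rule that a search on a whole also retrieves images indexed under its parts are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One visual-dictionary entry: a term, its picture, and its parts."""
    term: str
    picture: str  # hypothetical file name of the labelled drawing
    parts: list[Entry] = field(default_factory=list)


def find(entry, term):
    """Depth-first lookup of `term` in the whole-part tree; None if absent."""
    if entry.term == term:
        return entry
    for part in entry.parts:
        hit = find(part, term)
        if hit is not None:
            return hit
    return None


def expand(entry):
    """The term itself plus every term below it in the whole-part tree."""
    terms = [entry.term]
    for part in entry.parts:
        terms.extend(expand(part))
    return terms


def search(root, term, index):
    """Images indexed under `term` or under any of its parts (assumed rule)."""
    entry = find(root, term)
    if entry is None:
        return set()
    hits = set()
    for t in expand(entry):
        hits |= index.get(t, set())
    return hits


# Hypothetical dictionary page and picture index.
BICYCLE = Entry("bicycle", "bicycle.gif", [
    Entry("wheel", "wheel.gif", [
        Entry("spoke", "spoke.gif"),
        Entry("tire", "tire.gif"),
    ]),
    Entry("frame", "frame.gif"),
])
INDEX = {
    "bicycle": {"img10"},
    "wheel": {"img11", "img12"},
    "spoke": {"img12"},
    "frame": {"img13"},
}

if __name__ == "__main__":
    print(sorted(search(BICYCLE, "wheel", INDEX)))
```

Because every entry carries both a word and a picture, a searcher could reach the same node by typing the term or by pointing at the part in the drawing. The tree's terms then double as a controlled vocabulary for the picture index.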
- To test the effectiveness of the dictionary as a front end to an information system;
- To test the effectiveness of the whole-part text component as a controlled vocabulary for the system;
- To determine whether the classificatory structure used in the dictionary can be applied to picture databases with either general or specialised content.