Proposal for Phase 2
WP 3: Networking and Digitisation
Deliverable 3.5


Version : 1
Date : Sept. 18th, 1997
Author : OEAW-WAD
Confidentiality : Public
Status : final




1. Analog Media containing music information

Harmonica Phase 1 has been successful in identifying music information content related document carriers and formats; the examination of relevant music information volumes based on a questionnaire has provided the following main results:  Harmonica Phase 2 should cover the following tasks:

2. Digitisation

Harmonica Phase 1 has identified a wide variety of sampling rates, file formats and resolution levels in use, depending on the characteristics of the hardware and software in use. Standardisation of sound file formats is under way in the form of "The Broadcast Wave Format" for audio, drafted by the EBU (1996). This type of file is specified within the Microsoft "Resource Interchange File Format" (RIFF) and is appropriate for uncompressed sound. As the name of the draft indicates, it is intended as a broadcast format and not as a standard for the many sound studio production formats, nor as a music sound archive format, which may differ in many characteristics. Nor is it foreseen to replace the generally accepted digital sound transmission format AES/EBU.

Standardisation of image file formats (moving and still images) is even less advanced. To some extent, several graphic file formats are interchangeable with little or no loss of picture quality, provided the data compression algorithms applied are code regenerating (lossless).

Standardised file names for digital images, texts, sound and video have to be assigned as part of the initial digitisation process. The arrangement of directories and subdirectories has to follow the specifications of the libraries, facilitating future access to images, sound and text. This task should be completed well before digital conversion starts.

Scanning and digitisation logs should include all relevant data of the transfer process (transfer protocol). Problems, irregularities, filter characteristics, reproduction devices, scanning machines etc. should be listed file by file or section by section. Anomalies of the digitisation procedure or exceptions due to document characteristics (extremely reduced signal-to-noise ratios, missing parts etc.) should also be available in machine-readable form.
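To illustrate what a machine-readable transfer protocol might look like, the following is a minimal sketch in Python. The record layout, the field names and the choice of JSON as the exchange format are assumptions made for illustration only; the project does not prescribe any of them.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class TransferLogEntry:
    """One record of the transfer protocol for a single file or section.

    All field names are hypothetical; they mirror the items the text
    asks to be listed, not an agreed project schema.
    """
    file_name: str            # standardised name assigned before conversion
    source_carrier: str       # e.g. analogue tape, disc, printed score
    reproduction_device: str  # playback machine or scanner used
    filter_settings: str      # filter characteristics applied in the transfer
    anomalies: List[str] = field(default_factory=list)  # problems, irregularities


def to_machine_readable(entries: List[TransferLogEntry]) -> str:
    """Serialise the log entries to JSON text."""
    return json.dumps([asdict(entry) for entry in entries], indent=2, sort_keys=True)


def files_with_anomalies(entries: List[TransferLogEntry]) -> List[str]:
    """Return the names of all files whose transfer was not regular."""
    return [entry.file_name for entry in entries if entry.anomalies]


if __name__ == "__main__":
    log = [
        TransferLogEntry("snd_000001.wav", "analogue tape", "tape machine A", "none"),
        TransferLogEntry("snd_000002.wav", "analogue tape", "tape machine A", "none",
                         anomalies=["extremely reduced signal-to-noise ratio"]),
    ]
    print(to_machine_readable(log))
    print(files_with_anomalies(log))
```

Keeping anomalies as a separate list per file makes it possible to select all irregular transfers later without reading free-text notes.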

Quality control has to guarantee that the requirements for delivery and accuracy of the digitisation procedure have been met. Quality control covers sound and image quality, document integrity, the completeness of the transfer etc. When text conversion is applied, a level of accuracy (99%?) has to be stated.
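One possible way to state a level of accuracy for text conversion is to compare the converted text against a manually verified reference and count character errors. The sketch below assumes that accuracy is defined as one minus the character edit distance divided by the reference length; this definition, the function names and the threshold in the usage example are assumptions, not a measure agreed by the project.

```python
def edit_distance(reference: str, converted: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions between the two texts."""
    previous = list(range(len(converted) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, conv_char in enumerate(converted, start=1):
            cost = 0 if ref_char == conv_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_accuracy(reference: str, converted: str) -> float:
    """Accuracy in the range 0.0 to 1.0 relative to the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, converted)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    reference = "Harmonica Phase 2"
    converted = "Harmon1ca Phase 2"   # one substituted character
    accuracy = character_accuracy(reference, converted)
    # Hypothetical acceptance threshold taken from the "99%?" in the text.
    print(accuracy, accuracy >= 0.99)
```

Measured on a sample of pages, such a figure could be reported alongside the other quality control results.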

3. Metadata

Metadata may include full encoding of converted texts, usually with Standard Generalized Markup Language (SGML), structural elements, highlighted text, video and sound sequences etc., as well as technical data about the document type itself, its coding, size, usability and priority of access. Metadata can reflect highly structured information about the technical characteristics of a document or, ideally, a perfect representation of its content (semantic retrieval). Automated indexing and metadata generation for music sound and video documents is still a matter of basic research.
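The technical part of such metadata can often be read from the file itself. As a minimal sketch, the following Python code uses the standard library `wave` module to extract coding and size information from an uncompressed RIFF/WAVE sound file. The dictionary keys and the helper that produces a silent test file are hypothetical and serve only to make the example self-contained.

```python
import io
import wave


def technical_metadata(wav_file) -> dict:
    """Read technical data from a WAVE file (path or binary file object)."""
    with wave.open(wav_file, "rb") as reader:
        frames = reader.getnframes()
        rate = reader.getframerate()
        channels = reader.getnchannels()
        sample_width = reader.getsampwidth()   # bytes per sample
        return {
            "channels": channels,
            "bits_per_sample": sample_width * 8,
            "sampling_rate_hz": rate,
            "duration_seconds": frames / rate,
            "audio_bytes": frames * channels * sample_width,
        }


def make_silent_wav(seconds=1, rate=48000, channels=2, sample_width=2) -> io.BytesIO:
    """Build a short silent WAVE file in memory (illustration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(rate)
        writer.writeframes(b"\x00" * (seconds * rate * channels * sample_width))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(technical_metadata(make_silent_wav()))
```

Content-related metadata (semantic retrieval) cannot be derived this way and would still have to be supplied by cataloguers.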

4. Music sound, image and text archive systems

Automated backup systems and software are already available from several sources. Platforms usually supported are Windows NT, NetWare, SunOS/Solaris, AIX, HP-UX etc. Data backup utilities are frequently tailored for standalone servers only, whereas multiple server environments may become more and more important in library applications. High performance data storage products range up to several petabytes, using thousands of robotically controlled magnetic tape cassettes. Harmonica Phase 2 should address the usability of scalable solutions for libraries.
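To give a feeling for the scale involved, the following sketch estimates the storage needed for uncompressed sound and the number of tape cassettes this would occupy. All figures used here (48 kHz sampling rate, 16 bit resolution, two channels, a collection of 10,000 hours and a cassette capacity of 50 GB) are assumptions chosen for illustration, not results of Harmonica Phase 1.

```python
import math


def audio_bytes(hours, sampling_rate_hz=48_000, bits_per_sample=16, channels=2):
    """Size in bytes of uncompressed linear PCM sound of the given duration."""
    bytes_per_second = sampling_rate_hz * (bits_per_sample // 8) * channels
    return int(hours * 3600 * bytes_per_second)


def cassettes_needed(total_bytes, cassette_capacity_bytes):
    """Number of cassettes required, rounding up to whole cassettes."""
    return math.ceil(total_bytes / cassette_capacity_bytes)


if __name__ == "__main__":
    collection_hours = 10_000              # hypothetical collection size
    cassette_capacity = 50 * 10**9         # hypothetical 50 GB per cassette
    total = audio_bytes(collection_hours)
    print(total / 10**12, "TB")
    print(cassettes_needed(total, cassette_capacity), "cassettes")
```

Under these assumptions one hour of stereo sound occupies roughly 0.69 GB, so even large sound collections stay well below the petabyte range; image and video holdings would change the picture considerably.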

5. Networking

Networking in libraries can be seen as a threefold issue. (1) Libraries have to install their own infrastructure, typically a LAN with data acquisition workstations, servers and local storage devices. (2) Since libraries can use backup and storage capacities at remote computer centres, they need high-speed data communication in a WAN-like environment. (3) Distribution of binary documents via global networks, such as online services and the Internet, incorporates tasks like catalogue search, listening (or viewing) in advance, acquiring rights clearance, paying and downloading.

"Bringing the Search to the Net" involves the transition from document search to concept search via networks. Harmonica Phase 2 should address technical aspects of network installation as well as the development of new communication interfaces using Java applets downloadable via the WWW. As one example, Java/CORBA (Common Object Request Broker Architecture) offers greater flexibility in user interface implementation, automatic maintainability, easy server configuration and platform-independent client handling.

