- Source: UVC-based preservation
UVC-based preservation is an archival strategy for preserving digital objects. It uses a Universal Virtual Computer (UVC), a virtual machine (VM) designed specifically for archival purposes, which supports both emulation and migration to a language-neutral format such as XML.
= Background to the development of a UVC approach =
Preservation of digital resources is of paramount importance for deposit libraries, research libraries, archives, government agencies, and indeed most organizations. The dominant approach to digital preservation is migration. Migration entails periodically transforming archived information into new logical formats as the native formats, or the software or hardware on which they depend, become obsolete. The notable danger of migration is data loss, along with possible loss of the original functionality or the ‘look and feel’ of the original format. Furthermore, digital migrations are time-consuming and costly, because the process requires converting the format of every document as well as copying the converted bit streams to new media as necessary.
= Emulation theory =
Jeff Rothenberg caused a stir among organizations concerned with and responsible for digital preservation with his 1999 report, "Avoiding technological quicksand: Finding a viable technical foundation for digital preservation". In it he states that there are no viable solutions to ensure that digital information will be readable in the future. He labels the proposed solutions of relying on standards and migration as time-consuming and ultimately incapable of preserving digital documents in their original form. He suggests:
"an ideal approach should provide a single, extensible, long-term solution that can be designed once and for all and applied uniformly, automatically, and in synchrony (for example, at every future refresh cycle) to all types of documents and all media, with minimal human intervention."
He proposes that the best way to satisfy the above criteria is emulation, which involves: developing an emulator that will run on unknown future computers; developing techniques to capture the metadata needed to find, access and recreate the document; and developing techniques for encapsulating documents together with their attendant metadata, software, and emulator specifications.
In 2000, he suggested implementing an emulation-based preservation approach in which emulator specifications are expressed as programs and interpreted by an emulator specification interpreter program written for an emulation virtual machine.
Rothenberg's approach was met with skepticism: it was considered too technically challenging, too expensive and too time-consuming, and therefore an economic risk (without the support of empirical evidence). (See further reading section)
= UVC concept development =
Role of IBM
Raymond A. Lorie, while employed at the IBM Almaden Research Center, initiated the development of a UVC-based solution to long-term digital preservation. He describes the approach as ‘Universal’ because its definition is so basic that it will endure forever, ‘Virtual’ because it will never have to be physically built, and a ‘Computer’ because of its functionality.
IBM (NL), the asset owner of the UVC, continues to develop the UVC concept within the PLANETS project. Raymond van Diessen is responsible for extending the application of the UVC concept to preserve more complex objects.
Role of the National Library of the Netherlands
The National Library of the Netherlands (Koninklijke Bibliotheek, KB) played a major role in demonstrating that emulation based on the UVC concept is a viable option for long-term digital preservation.
In 2000, the emulation advocate Jeff Rothenberg participated in a study with the KB to test and evaluate the feasibility of using emulation as a long-term preservation strategy. His method was to use software emulation to reproduce the behaviour of obsolete computing platforms on newer platforms, offering a way of running a digital document’s original software in the far future and thereby recreating the content, behaviour, and ‘look and feel’ of the original document. Rothenberg was criticized for trying to preserve the wrong thing: he proposed emulating the behaviour of old hardware platforms and operating systems in order to access the original data through the original software associated with it.
Raymond A. Lorie recognized the difficulties in trying to create a program to emulate a 'real' machine on a future platform and realised that this approach was overkill for the purpose of preserving digital objects. Instead he introduced a novel approach to data and program archiving using a ‘Universal Virtual Computer’. The concept of the UVC-based preservation strategy was implemented by the KB and tested on PDF files as part of a KB/IBM ‘Long Term Preservation’ (LTP) study. Because creating a UVC for PDF documents is more complex, the KB decided instead to develop a UVC for images, since this approach would also cover PDF documents (a PDF file can easily be converted to a series of images). As a result, the UVC became one of the permanent access tools for JPEG/GIF87 images within the Preservation Subsystem of the KB’s e-Depot.
Following the successful implementation of the UVC, the KB has continued to develop its emulation strategy for long-term digital preservation by focusing on 'full' or hardware emulation. This approach delivered a durable x86 component-based computer emulator: Dioscuri, the first modular emulator for digital preservation.
The Universal Virtual Computer is part of a broader concept, called the UVC-based preservation method. This method allows digital objects (such as text documents, spreadsheets, images, sound waves, etc.) to be reconstructed in their original appearance at any time in the future. The methods are programs written in the machine language of a Universal Virtual Computer (UVC). The UVC is completely independent of the architecture of the computer on which it runs.
The UVC itself is a program that interprets a set of instructions, rather than a physical computer.
It will run as a software application on a future platform. Because we do not know at this time which hardware will be available in the future, the UVC must be created at the time we want to access a particular document from the repository. This UVC then forms the platform that runs programs written specifically for the UVC in the past. Creating an emulation program for the UVC in the future is much simpler than trying to emulate a 'real' machine.
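The idea that an archived program only needs a small interpreter on the future host can be illustrated with a toy virtual machine. The sketch below is not the UVC specification: the instruction set (`READ`, `ADDI`, `EMIT`, `JNEG`, `JMP`, `HALT`), the register count, and the offset-by-3 "format" are all hypothetical, chosen only to show an archived decoding program running on an interpreter written later.

```python
# Toy illustration only: this instruction set is hypothetical and is NOT the
# real UVC specification.

def run(program, data):
    """Interpret `program` (a list of instruction tuples) over the bytes in `data`."""
    regs = [0] * 4   # four general-purpose registers
    out = []         # values emitted by the program
    pc = 0           # program counter
    pos = 0          # read position in the input data
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "READ":      # READ r: next input byte into register r, or -1 at end of input
            r, = args
            if pos < len(data):
                regs[r] = data[pos]
                pos += 1
            else:
                regs[r] = -1
        elif op == "ADDI":    # ADDI r, k: add the constant k to register r
            r, k = args
            regs[r] += k
        elif op == "EMIT":    # EMIT r: append the value of register r to the output
            r, = args
            out.append(regs[r])
        elif op == "JNEG":    # JNEG r, target: jump to target if register r is negative
            r, target = args
            if regs[r] < 0:
                pc = target
        elif op == "JMP":     # JMP target: unconditional jump
            target, = args
            pc = target
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return out

# An archived "method": decodes bytes that were stored with an offset of 3.
DECODER = [
    ("READ", 0),      # 0: read next byte
    ("JNEG", 0, 5),   # 1: end of input, go to HALT
    ("ADDI", 0, -3),  # 2: undo the storage offset
    ("EMIT", 0),      # 3: output the decoded byte
    ("JMP", 0),       # 4: loop
    ("HALT",),        # 5
]

archived = bytes(b + 3 for b in b"UVC")   # hypothetical stored form
decoded = bytes(run(DECODER, archived))
```

Only `run` depends on the host platform; `DECODER` and `archived` could be stored unchanged and reused with any later reimplementation of the same interpreter.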
= Application description =
The UVC-based preservation method differentiates between data archiving, which does not require full emulation, and program archiving, which does. For archiving data, the UVC is used to archive methods that interpret the stored data stream. The methods are programs written in the machine language of a Universal Virtual Computer (UVC). The UVC program is completely independent of the architecture of the computer on which it runs.
Data archiving
Data archiving reconstructs the 'look and feel' of the original file but not the functionality of the original format. If the electronic form of the document is only used for compact storage or if the way the document looks to the human eye is all there is, then it suffices to archive the document as an image. If additional functionality is needed, such as text searching, storing only the image is not enough. In this case the text also needs to be archived along with the image of the document.
By restoring the original appearance of a file as an image, a future user can see what the original file looked like in terms of page layout, style, font, etc. The text itself needs to be exported, e.g. in ASCII format, and can be saved as a sequence of homogeneous elements (all presentation attributes such as font and size are the same for all characters), because the page image already shows the exact look of the page. In this case the UVC program for the data has two parts: one to decode the text and one to decode the image.
What it entails
The data contained in the bit stream is stored in an internal representation of logical data elements that obey a certain schema within a certain data model. A decoding algorithm (method) extracts the various data elements from the internal representation and returns them tagged according to the schema. Information about the schema is itself described by an additional schema (a 'schema to read schemas'), which is likewise stored with the data, together with a method to decode it.
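The relationship between stored data, a stored schema, and their decoding methods can be sketched as follows. Everything concrete here is an assumption made for illustration: the record layout (a one-byte field id, a one-byte length, then the payload), the tag names `width` and `height`, and the function names are hypothetical and do not come from the UVC method itself. In the actual method the decoders would be UVC programs rather than Python functions.

```python
def decode_schema(raw: bytes) -> dict[int, str]:
    """Decode a stored schema: repeated records of (field id, name length, ASCII name)."""
    schema = {}
    i = 0
    while i < len(raw):
        field_id, n = raw[i], raw[i + 1]
        schema[field_id] = raw[i + 2 : i + 2 + n].decode("ascii")
        i += 2 + n
    return schema


def decode_data(raw: bytes, schema: dict[int, str]) -> list[tuple[str, int]]:
    """Extract data elements and return them tagged according to the schema."""
    elements = []
    i = 0
    while i < len(raw):
        field_id, n = raw[i], raw[i + 1]
        value = int.from_bytes(raw[i + 2 : i + 2 + n], "big")
        elements.append((schema[field_id], value))
        i += 2 + n
    return elements


# Hypothetical archived bit streams: the schema is stored alongside the data.
stored_schema = bytes([1, 5]) + b"width" + bytes([2, 6]) + b"height"
stored_data = bytes([1, 1, 64, 2, 1, 48])

schema = decode_schema(stored_schema)
elements = decode_data(stored_data, schema)
```

The point of the split is that a future reader needs no prior knowledge of the field ids: the schema decoder supplies the tag names, and the data decoder applies them.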
= Logical Data View =
The logical data model is kept simple in order to minimize the amount of description accompanying the data and to make the structure of the data easier to understand. The data model chosen for the UVC-based preservation method linearizes the data elements into a hierarchy of tagged elements organized using an XML-like approach.
The tagged data elements are extracted from the data stream of the digital file. A tag specifies the role that the data element plays in the data structure. The element tags hold the specific information about the content of the data in a technology-independent manner. Furthermore, the data elements tagged according to the schema are returned to the client in a Logical Data View (LDV).
Example of Logical Data View
sugarloaf
mountains
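A hierarchy of tagged elements of this kind can be rendered in an XML-like form with a few lines of code. The sketch below is only an illustration: the tag names (`image`, `width`, `height`, `keywords`, `keyword`) are hypothetical, and treating "sugarloaf" and "mountains" as keyword values is an assumption, since the source does not show which tags those values carry.

```python
import xml.etree.ElementTree as ET


def to_ldv(tag, value):
    """Build an element; `value` is either a leaf value or a list of (tag, value) children."""
    elem = ET.Element(tag)
    if isinstance(value, list):
        for child_tag, child_value in value:
            elem.append(to_ldv(child_tag, child_value))
    else:
        elem.text = str(value)
    return elem


# Hypothetical tagged elements as a decoder might return them.
tagged = ("image", [
    ("width", 2),
    ("height", 1),
    ("keywords", [("keyword", "sugarloaf"), ("keyword", "mountains")]),
])

ldv = ET.tostring(to_ldv(*tagged), encoding="unicode")
```

Because each value is wrapped in a tag naming its role, a client can interpret the result without knowing anything about the original file format.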