2007/04/19: DigCCurr 2007: What do digital curators need to know?
Each concurrent session time has a different theme. The theme of session one is "What do digital curators do and what do they need to know?" I went to the "Research Perspectives" session. I was a bit disappointed because I was expecting to hear about hot research questions, but the discussion was interesting nonetheless. Speakers: Hans Hoffmann, Phil Eppard, David Giaretta.
The speakers described their associated projects. Hoffmann is involved with the Planets project; Eppard described the work of InterPARES 1 and 2. Giaretta, manager of the CASPAR project, spoke more of what digital curators need to know.
Most of the detail that Hoffmann and Eppard discussed I'm familiar with from reading about the projects over the years.
Giaretta was incredibly amusing. What follows are my "raw dump" notes. I'll have to summarize and comment at some point, but fwiw, here are my notes.
Concurrent Session - Funder's perspective
Hans Hoffmann described the PLANETS project, a European initiative.
Components: planning services, characterization services.
Interdependencies between all of the components.
Preservation planning. Come up with a process to identify what should be done with the digital objects for which you are responsible. Criteria for preservation based upon organizational policies, collection profile, provenance of digital objects (authenticity).
What is the best available preservation action given the criteria? Develop a plan. Ideal is to make it an automated process. Requirements should be proactive rather than reactive.
Preservation policy, content profile, usage profile, and actions inform the plan.
Plan will be executed on the content of your repository.
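As a rough illustration of the planning step described above, choosing among preservation actions can be framed as weighted scoring against stated criteria. This is only a sketch and not the PLANETS method: the criteria names, weights, and scores are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical criteria and weights; in practice these would come from
# organizational policy, the collection profile, and usage studies.
WEIGHTS = {
    "preserves_authenticity": 0.5,
    "automatable": 0.3,
    "low_cost": 0.2,
}


@dataclass
class Action:
    name: str
    scores: dict  # criterion -> score in [0, 1], assigned by an evaluator


def utility(action, weights=WEIGHTS):
    """Weighted sum of criterion scores; a missing criterion counts as 0."""
    return sum(w * action.scores.get(c, 0.0) for c, w in weights.items())


def rank_actions(actions, weights=WEIGHTS):
    """Return candidate actions ordered from highest to lowest utility."""
    return sorted(actions, key=lambda a: utility(a, weights), reverse=True)


# Illustrative scores only; they are not measurements of real tools.
candidates = [
    Action("migrate", {"preserves_authenticity": 0.6, "automatable": 0.9, "low_cost": 0.8}),
    Action("emulate", {"preserves_authenticity": 0.9, "automatable": 0.4, "low_cost": 0.3}),
]
best = rank_actions(candidates)[0]
print(best.name)
```

Changing the weights changes the ranking, which is the point of tying the criteria to policy rather than to the tools.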
Characterization of objects can take two approaches: an intellectual approach, building objective trees based upon utility analysis, and extraction of intrinsic file (format) information.
They are trying to develop a description language to match the two approaches.
TNA PRONOM file-format identification used to define a characteristic language, define an extraction language, define a pluggable interpreter.
Preservation actions: two approaches: transform content/objects and transform environments (migration, emulation). Content objects: wrap third party transformation tools, ...preserve relational databases.
Testbed environment will help them determine what works. Developing a corpus of objects. Performing experiments on it.
The testbed consists of: data storage, hardware, PLANETS software, testbed software...
Interoperability framework.
What do digital curators need to know? Preservation planning, how to identify what criteria should inform decisions, how to apply those criteria to digital objects, how to test and evaluate available preservation strategies with respect to a given type of object. How to do it in an effective and efficient way.
Training programs based on Planets results. Coming up with a modular approach to bring together course materials already in existence and building on the work of ERPAnet?
http://www.planets-project.eu
Questions: More about the criteria for judging which approach is best? You need to know what your collection is about, what are the characteristics of the digital objects? When you use migration, for instance, will it be the best solution? Emulation? The authenticity requirements for the document or record are needed and they are based on the business requirements of the collection and the context of creation.
Could you envision a situation where the requirements would come from the context of use? Hoffmann: Yes, that's why we're doing user studies. How are they using the digital objects? What will it tell us about the need to preserve digital objects?
If you have multiple users going after the objects in different ways would there be different criteria for different needs? Hans Hoffmann -- you have to deal with the object how you receive it. The object and how you use it are two different things. Add services to the repository based on the use. You try and evaluate and revise.
Phil Eppard on InterPARES 2 Project.
Many InterPARES researchers here in the room. Investigating the complex issues in the preservation of digital materials. InterPARES has a very long history. PE provided an overview of history and scope and some of the InterPARES products.
Started at UBC with authenticity of records project 1994-97, concerned with creation and maintenance of records in their active phase. Product of that research was DoD electronic records standard.
1999-2001 InterPARES 1. 13 countries, 4 continents, 60 researchers. Included practitioners and experts in c.s., law, and policy studies. Focus was on records as defined by archival science. Theoretical principles based on archival theory and diplomatics (the study of creating and identifying authentic records).
Used case studies, analyzed with a template developed via diplomatics. Key products were two sets of activity models, for the selection and preservation functions, and a framework for assessing and maintaining authenticity. Benchmark requirements supporting the presumption of authenticity and baseline requirements supporting the production of authentic copies of electronic records.
Not preserving the records themselves so much but the ability to reproduce the records in an authentic form.
Benchmark requirements: maintain expression of record attributes relating to identity and integrity, control access privileges, protective procedures to prevent loss or corruption of records, procedures to prevent media degradation, procedures for maintaining documentation.
A preserver looking to take over a set of electronic records would test them against the benchmark and this may influence an appraisal decision.
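A minimal sketch of how such a test against the benchmark might be recorded, assuming a simple yes/no checklist. The requirement labels paraphrase the list in these notes; the identifiers and the sample evidence are invented for the example and are not an InterPARES tool.

```python
# Requirement labels paraphrase the benchmark list in these notes; the
# identifiers themselves are made up for this example.
BENCHMARK_REQUIREMENTS = [
    "identity_and_integrity_attributes",
    "access_privileges_controlled",
    "loss_and_corruption_procedures",
    "media_degradation_procedures",
    "documentation_maintained",
]


def assess(evidence):
    """Split the benchmark requirements into met and unmet lists.

    `evidence` maps a requirement label to True/False; a requirement
    with no entry is treated as unmet.
    """
    met = [r for r in BENCHMARK_REQUIREMENTS if evidence.get(r, False)]
    unmet = [r for r in BENCHMARK_REQUIREMENTS if r not in met]
    return met, unmet


# Hypothetical findings for one set of records.
evidence = {
    "identity_and_integrity_attributes": True,
    "access_privileges_controlled": True,
    "documentation_maintained": False,
}
met, unmet = assess(evidence)
print(f"{len(met)} of {len(BENCHMARK_REQUIREMENTS)} requirements met")
for r in unmet:
    print("unmet:", r)
```

The unmet list is what would feed into the appraisal decision; the sketch deliberately stops short of turning it into an accept/reject rule.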
Baseline requirements: maintain controls over records transfer, maintenance and reproduction, retain documentation of reproduction process and its effects, capture ...?
2002-2006 InterPARES 2
Expanded interdisciplinary team adding researchers from various sectors of the arts and sciences to the team of archivists, preservationists, etc.
Focused on newer types of electronic records: dynamic, interactive, experiential.
Develop understanding of their creation, maintenance, and preservation.
Research domains: records creation & maintenance, authenticity, accuracy and reliability & methods of appraisal and preservation.
Focus areas: arts activities, scientific research activities, and e-government.
Cross-domain research groups: description (metadata), modeling, policy, and terminology.
Created a dictionary of terminology available to the public as a database.
Key products: manage chain of preservation model (preserver centered), business driven record keeping model (records creators, business centered), principles for records creators and preservers (for policy development rather than principles of preservation), guidelines for digital records preservation (operationalizing process for practitioners), guidelines for individuals, Metadata and Archival Description Registry and Analysis System (MADRAS), terminology database.
MADRAS is a key product. A web-based tool for developing, registering, and evaluating metadata schemas and archival description standards. It allows people to compare schemas as to how well they meet international standards and guidelines (such as the benchmark requirements).
InterPARES and Digital Curation: training new researchers and educators, case study methodology and examples, integrating preservation with other processes, metadata schema and analysis, policy recommendations.
Question: will you offer counseling to universities who want to use your methodologies?
PE: InterPARES 3 is an effort to work directly with selected repositories to test and implement some of the products of previous InterPARES work.
David Giaretta, CASPAR Project manager
CASPAR = Cultural, Artistic, and Scientific knowledge for Preservation, Access, and Retrieval.
What digital curators do: Struggle with: funders (reluctant to provide long-term commitment; cost control, cost estimates), Information providers (unwilling to provide what is needed, ways to capture required info), Users (increasingly demanding).
CASPAR a large consortium
http://www.casparpreserves.eu
What do digital curators need to know? They do preservation and publication/access but do not confuse them.
Needs of access: responsive, sophisticated search techniques, users often familiar with the material.
Needs of preservation: ensure the information trapped in the bits is authentic and understandable -- to the designated community (this also implies making it fit for the purpose, adding the info).
Disincentives for preservation: Cost, Time.
Can sell preservation as benefiting access. Cyber-infrastructure allows users to find and try to use data from many sources. Some of these will be familiar but most will be unfamiliar. How can one be sure that the unfamiliar data is used correctly?
Need understanding: garbage in, garbage out.
Digital preservation is terribly easy to do.... as long as you can provide money forever. Easy to test claims about tools...as long as you live a long time.
Know what is being preserved: the great data/document divide. Need to preserve information & knowledge -- not just "the bits." Documents, videos are rendered -- simple? Data must be processed in new ways ... this is harder.
Information is the important thing. What information? documents, data. Original bits? Look and feel? Behavior? Performance? Explicit/Implicit/Tacit.
Things change/disappear -- how can we ensure that the information trapped in the "bits" remains understandable despite all these changes? Example of Google changing a style sheet and messing up the RSS. The network links to related information may be important.
Time is short. Neither you nor your institution will last forever. The chain of preservation is only as strong as its weakest link. Need to be prepared to hand over responsibility for the preservation.
No repository is an island. Your organization cannot do everything. Must tap into other resources -- how can we find and evaluate those resources?
We cannot foretell the future. Need to manage knowledge to keep archives alive through time. Preservation is a process, not a one-time event. Preservation is expensive.
OAIS. Know more than the functional model diagram. The information model is key. With data especially you need to know the semantics (context).
Authenticity - evidence, evidence, evidence.
Support infrastructure: registries of representation information, representation information gap manager, orchestration manager, toolkits (representation information; preservation description information).
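To make the idea of a representation information registry and a gap manager a little more concrete, here is a small sketch. It assumes a registry is just a mapping from an item to the things needed to understand it, and that a "gap" is any dependency the designated community does not already know and the registry does not explain. The data and function names are hypothetical and do not describe CASPAR's actual components.

```python
# Hypothetical registry: each item lists the representation information
# needed to understand it. Names are illustrative only.
REGISTRY = {
    "observations.fits": ["FITS standard", "instrument calibration notes"],
    "FITS standard": ["ASCII", "IEEE 754"],
    "instrument calibration notes": ["PDF", "English"],
}


def find_gaps(item, community_knowledge, registry=REGISTRY):
    """Return dependencies the designated community cannot yet resolve.

    Walks the dependency network from `item`. The walk stops at anything
    the community already knows; an unknown item with no registry entry
    is reported as a gap.
    """
    gaps, seen, stack = set(), set(), [item]
    while stack:
        current = stack.pop()
        if current in seen or current in community_knowledge:
            continue
        seen.add(current)
        deps = registry.get(current)
        if deps is None:
            gaps.add(current)  # nothing registered that explains it
        else:
            stack.extend(deps)
    return gaps


# Assumed knowledge base of the designated community.
known = {"ASCII", "English", "PDF"}
print(sorted(find_gaps("observations.fits", known)))
```

If the community's knowledge changes over time, rerunning the same walk with a smaller `known` set shows what new representation information has to be added, which fits the point that preservation is a process rather than a single event.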
CASPAR aims to produce tools and techniques to support digital preservation and make it easier to share the cost. Must be relatively easy to use, must have a low "buy-in" in terms of effort required for adoption, must avoid requiring wholesale change of everyone else's systems, must be decentralized and reproducible so that it can live.
How can you tell who is selling preservation snake oil?
How to decide? Validation: demonstrate theoretical basis. Accelerated lifetime tests (changes in hardware, environment, and changes in designated community). Demonstrate increased trustworthiness, measured using Certification process as/when available.
http://wiki.digitalrepositoryauditandcertification.org (NARA work to produce ISO standard development)
Question:
One problem with OAIS is defining the designated community. What do you do when your archive, under law, has to serve everybody? Answer: State assumptions of what your community should already know in order to use it.
Anne Gilliland asked what type of skills they expect people taking these positions to have? Eppard: Management skills and people skills. Gilliland: it goes back to developing the curriculum.
Marchionni (from his notes) - people need to know about the different models of preservation, need to know about their communities and how to monitor changes within them, know about appraisal as a continuous process rather than a discrete event, know about the decision-making process, and they need to know about fund raising.
Hoffmann - It's context related. I work in archives. Libraries may require a different understanding. Tools are applied differently in different contexts.
Labels: DigCCurr 2007
