2007/04/19: DigCCurr 2007: Mechanisms for Influencing Data Curation Practice
***ed. note -- it's 4ish in the afternoon, my brain is dead tired, and the speakers in this session are incredibly quiet and mumble-y. these notes may be more raw than the others.***
I attended "Designing & Implementing Repositories Across Institutional Boundaries"
Speakers: Mike Smorul, Bill Underwood, Richard Marciano
PAWN project
Michael Smorul
http://umiacs.umd.edu/research/adapt or Google ADAPT UMIACS
Problems facing ingestion
*reliable data transfer
*each producer/archive interaction is unique
*how the archive deals with each collection is unique as well
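ed. sketch: "reliable data transfer" usually comes down to proving that what arrived is what was sent. This is not PAWN's code, just a minimal illustration of the idea using a SHA-256 fixity check around a local file copy. The function names are my own, and a real producer-to-archive transfer would go over the network rather than through `shutil`.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_verification(source: Path, destination: Path) -> str:
    """Copy a file, then confirm the copy matches the original digest.

    Hypothetical helper: stands in for whatever transfer mechanism
    the producer and archive actually agree on.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"checksum mismatch for {destination}")
    return actual
```

The digest returned here is the kind of value a producer could record at submission time so the archive can re-check the file later.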
Distributed ingestion with PAWN
*multiple producing sites with different requirements
*separation of administrative responsibility
Components - showed network architecture diagram
Package work flow overview
1. create producer-archive agreement
2. client package template
3. create package based on template
4. once approved, packages can be archived
5. rejected packages can be held until rectified or deleted for resubmission
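ed. sketch: the five steps above read like a small state machine, so here is one way it could be modeled. The state names and the transition table are my assumptions from the slide, not PAWN's actual internals.

```python
from enum import Enum


class PackageState(Enum):
    # State names are assumptions; the talk only described the steps.
    DRAFT = "draft"          # package created from a template
    SUBMITTED = "submitted"  # waiting for review
    APPROVED = "approved"
    REJECTED = "rejected"    # held until rectified or deleted
    ARCHIVED = "archived"
    DELETED = "deleted"


# Which states each state may move to.
ALLOWED = {
    PackageState.DRAFT: {PackageState.SUBMITTED},
    PackageState.SUBMITTED: {PackageState.APPROVED, PackageState.REJECTED},
    PackageState.APPROVED: {PackageState.ARCHIVED},
    PackageState.REJECTED: {PackageState.SUBMITTED, PackageState.DELETED},
    PackageState.ARCHIVED: set(),
    PackageState.DELETED: set(),
}


def advance(current: PackageState, target: PackageState) -> PackageState:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

With this table a package cannot be archived until it has been approved, and a rejected package can only be resubmitted or deleted.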
Custom roles
*actions in PAWN can be grouped together to create roles (modify items in a package, create users, etc.)
*default roles
**producer
**records manager
**archive manager
**global administrator
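ed. sketch: roles as named groups of actions could look roughly like this. The role names come from the slide, but the action names and which actions belong to which default role are hypothetical; I did not catch the real mapping.

```python
# Hypothetical action names. The talk only gave examples such as
# "modify items in a package" and "create users".
ROLES = {
    "producer": {"create_package", "modify_package_items"},
    "records manager": {"modify_package_items", "approve_package", "reject_package"},
    "archive manager": {"archive_package"},
    "global administrator": {"create_users", "define_roles"},
}


def define_role(name, actions):
    """Register a custom role as a named group of actions."""
    ROLES[name] = set(actions)


def is_allowed(user_roles, action):
    """True if any of the user's roles includes the requested action."""
    return any(action in ROLES.get(role, set()) for role in user_roles)
```

A site that needs its own roles would call `define_role` with whatever actions it wants grouped together, then assign that role to users.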
PAWN utilizes SRB from SDSC
Case study: 15,000 CD-ROMs of LANDSAT data
Case study from SLAC @ Stanford: created specialized roles (records creator, records liaison officer, records manager)
William Underwood. PERPOS (Presidential Electronic Records Pilot System)
*initial objective, R&D project, develop tools to support archivists in gaining intellectual and physical control of PC records from the administration of George H.W. Bush
*contents of 500+ hard drives
*included operating system and software applications as well as user-created files
*DOS and Windows 3.1
PERPOS
*developed a prototype system to support accession, arrangement, preservation, review and description of e-record series
*evolutionary prototyping
*system has been pilot tested by archivists at the Bush Presidential Library
*several record series have been systematically processed
*FOIA processing is currently being pilot tested
***found viruses in legacy data *** important to use virus checkers
Summary of research results and benefits
*supports both systematic and FOIA processing of presidential e-records
*provides an environment for experimental application of advanced information technologies to archival process
*document type identifier speeds up processing
*automatic description of items, file units, and record series enables earlier intellectual control of e-records
*prototype access restriction checker
*knowledge acquisition reduces work required to apply access restriction checker to records of subsequent administrations
Richard Marciano, SDSC/UCSD
The perspectives of digital curators on building distributed repositories
Collaboration between digital curators and IT folks looking at how to make cost effective distributed repositories.
PAT = persistent archives testbed
2 yr NHPRC project, extended for 1 year
Project summary:
*participants were digital curators from libraries, archives, historical societies, scientific data environments, and museums, plus IT researchers and staff
*main goal: design a distributed repository for electronic records management, demonstrate the management of various types of records with a common software infrastructure
*approach: each site chose an archival collection and set up access control and update permissions for its preservation environment independently of the other participants
Presentation goals:
*comment: David Giaretta says "no repository is an island" ... PAT fits the archipelago model
*examine: lessons learned and skills needed by digital curators to automate archival functions (appraisal, accessioning, arrangement, description, preservation, and access of records), benefits achieved by using common infrastructure
PAT Community Grid
Local storage resources
||||
SDSC Archive
||||
MCAT Metadata catalog (Oracle), Shared preservation environment, Storage resource broker (SRB)
Unique contributions of digital curators to the infrastructure:
*Windows based SRB clients/servers
*Development of a Perl for Windows client library
*Bulk operations were developed, tested, and refined (registration, accessioning, metadata extraction from the records, metadata loading, validation of data movement into/out of the system/within the system)
*End-to-end work flows were developed (accessioning, replication)
*SRB bugs revealed: better reliability
*MCAT ported to MySQL (Oracle, DB2, Sybase, Informix)
*Development of a wiki for documentation
*Problems registering filenames with unusual characters were discovered and fixed
*Suggestions on ways to simplify governance issues tied to particular types of data management:
**need to express such policies as rules to be applied to the data management system.
**development of the next generation of data grid technology: iRODS (integrated Rule-Oriented Data System)
**Each preservation process is expressed as a set of micro-services (operations that can be performed using a remote storage system)
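ed. sketch: to make "policies as rules" and "processes as sets of micro-services" a bit more concrete, here is a toy version in plain Python. This is not iRODS rule syntax or its API. Every name below is hypothetical, and the micro-services only touch an in-memory dictionary instead of a remote storage system.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
MicroService = Callable[[Context], None]


def compute_checksum(ctx: Context) -> None:
    """Micro-service: record a fixity value for the incoming record."""
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


def replicate(ctx: Context) -> None:
    """Micro-service: pretend to store two copies (in memory only)."""
    ctx["replicas"] = [bytes(ctx["data"]), bytes(ctx["data"])]


def record_audit_entry(ctx: Context) -> None:
    """Micro-service: note what was done to the record."""
    ctx.setdefault("audit", []).append(f"ingested {ctx['name']}")


@dataclass
class Rule:
    event: str                            # when the rule fires
    condition: Callable[[Context], bool]  # which records it covers
    micro_services: List[MicroService]    # what to do, in order


def apply_rules(rules: List[Rule], event: str, ctx: Context) -> Context:
    """Run every matching rule's micro-services against the context."""
    for rule in rules:
        if rule.event == event and rule.condition(ctx):
            for service in rule.micro_services:
                service(ctx)
    return ctx


# Example policy: everything ingested into one collection is
# checksummed, replicated, and logged.
RULES = [
    Rule(
        event="ingest",
        condition=lambda ctx: ctx.get("collection") == "demo",
        micro_services=[compute_checksum, replicate, record_audit_entry],
    )
]
```

The point of the design is that the policy lives in the rule list, so a site changes what happens on ingest by editing rules rather than rewriting the repository software.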
What Digital Curators Liked
*leverage common software and hardware
*use commodity storage hardware
*lower the cost of participation
*reduce the level of expertise required at each site
*focus on management of the archival collections and outsource the details of the archival repository
*automate the manipulation of collections to minimize the level of effort
Conclusions
*PAT suggests that sustainability is probably beyond the capability of most archival repositories (costs of tracking new types of technology, expertise to manage, costs of storage systems and databases)
*outsourcing of the management of records is feasible through use of data grid technology
*preservation environments can be assembled by creating regional community archival partnerships with university data centers (yes, there are still many political barriers)
*independence can be maintained
*service agreements for storage and preservation of archival e-records are needed
Labels: DigCCurr 2007
