2007/04/20: DigCCurr 2007: Concurrent session: Digital Curation in Practice
I attended "Science and Biomedical Data". Speakers: Milton Corn, Don Sawyer, Tyler Waters
Milton Corn "Archiving the Phenome"
Phenome - total mass of physical and mental facts known about you. It's coordinating genome info with patient info (date of birth, hair color, cholesterol, etc.). (e.g. OSHA), state laws vary, need to maintain a paper record for preservation purposes is under debate. Text can substitute for non-textual information (e.g. x-ray report).
2. Well-being of the patient. Diagnosis and prescription of new illness can be influenced by past history. Implies records needed for life-time of patient. NOT a legal requirement. Hard to assemble from distributed sources, argument for personal health record or "super" repositories.
3. Well-being of family/nation. Patient's health record in genomic era of value to family, and to the entire population. Secondary use of health records of value to health services research, public health. Implies preservation "forever."
How to archive?
Same problems as for all digital archives plus:
*multiple content owners per patient
*variation in software, hardware, data formats, ontologies etc.
*privacy issues -- HUGE issues
*ownership of data not always clear
*multiple media, text, graphics, images all included
*Not seen as a problem in the U.S. by AMIA, AMA, NARA, MLA, AHIMA, DHHS, AHA, but there is modest discussion in the U.K., Belgium, India, Australia
Corn surveying current practice, results so far:
*DHHS: no response
*Large HMO: no response
*Hospitals and offices -- no archiving policy ("we plan to keep forever"), privacy safeguards for daily use, definitive record is a mix of paper and electronic and may not include images or graphics, how to manage old data when the EHR system is changed remains a problem. N.B. one practitioner said he erases colonoscopy videos after reading to prevent second guessing later by lawyers.
Summary: curation of clinical data
*not a problem now, at least it's not recognized yet
*will become a problem as soon as size, migration costs escalate esp. with imaging
*preservation by CIO may, in fact, work for solvent enterprises (hospitals, pharmacies, etc.) i.e. the public pays
*situation for office practices uncertain
*Can health care system conglomerate all health data for an individual? Unlikely unless patient is the custodian.
Don Sawyer "Digital Curation at the National Space Science Data Center"
Overview: NSSDC requirements and digital curation, NSSDC holdings and archival services,
NSSDC requirements:
*functions as the space science permanent data/metadata repository
*provides the space science community with data stewardship guidance and support. Data made available to the research community by various repositories should be well documented in order to support independent usability via, for example, virtual observatory access
*NSSDC as a repository making unique data/metadata available must participate in Virtual Observatory development efforts to assist in the practical evolution of these concepts
NSSDC uses OAIS concepts
Data providers:
*NASA's Space Science Active Archives typically under written agreements (MOUs)
*Space Science Space Flight Projects
Users:
NASA Space Science Archives
Space Science Projects
Individual researchers
General public
NASA headquarters
Digital holdings: acquiring data for 40+ years, currently 47 TB, reaching 270 TB by 2010, 1300+ experiments from 375 US and international spacecraft, over 4400 data collections (typically each with a large number of files)
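As a rough check on these figures, the implied annual growth rate can be worked out. This is only a sketch: it assumes "currently" means 2007 (the date of the talk) and that growth compounds evenly over the three years to 2010. Neither assumption comes from the talk.

```python
def implied_annual_growth(start_tb: float, end_tb: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.5 means 50% per year)."""
    if start_tb <= 0 or years <= 0:
        raise ValueError("start_tb and years must be positive")
    return (end_tb / start_tb) ** (1 / years) - 1


# 47 TB and 270 TB are from the talk; the 3-year span (2007 to 2010) is an assumption.
rate = implied_annual_growth(47, 270, 3)
print(f"Implied growth: about {rate:.0%} per year")
```

Under those assumptions the holdings would need to grow by roughly 79% per year, which fits the later point that size and migration costs escalate quickly.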
NSSDC Archival information services
*permanent archive: long-term curation, uses AIP implementation, data may be repackaged and/or transformed to maintain accessibility and usability
*Second archive: data also held in another archive, NSSDC holdings may be in AIP form, data may be repackaged and/or reversibly transformed
*Third archive...
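One way to think about "reversibly transformed" is that undoing the transformation must give back exactly the original bytes. The sketch below illustrates that idea with a checksum comparison. The use of zlib compression and SHA-256 here is my own assumption for illustration, not something the speaker described as NSSDC practice.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_reversible(original: bytes, forward, backward) -> bool:
    """True if backward(forward(original)) restores the original bytes exactly."""
    restored = backward(forward(original))
    return sha256_hex(restored) == sha256_hex(original)


# Made-up sample content, for illustration only.
sample = b"timestamp,flux\n2007-04-20T00:00:00Z,1.25\n" * 100

# Lossless compression can be undone, so the round trip matches.
lossless_ok = is_reversible(sample, zlib.compress, zlib.decompress)

# A lossy step (here, truncation) cannot be undone, so the check fails.
lossy_ok = is_reversible(sample, lambda b: b[:10], lambda b: b)
```

A check like this only shows reversibility for the files actually tested; it says nothing about whether the transformed form stays usable over time.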
Administration activities
External: MOUs with various active archives, respond to NASA HQ requests, monitor progress of SAMPEX resident archive (home after project ends)
Internal: Oversee maintenance and modernization of infrastructure including systems administration (e.g. low cost Linux), manage personnel and physical space, oversee refreshing of tapes in archive every 6 yrs or less, oversee migration of legacy data from 9trk/3480 tape archive into current media
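The "refresh every 6 yrs or less" rule is simple enough to sketch as a due-date check. The tape labels, write dates, and function names below are hypothetical; only the 6-year limit comes from the talk.

```python
from datetime import date

MAX_TAPE_AGE_YEARS = 6  # from the talk: refresh every 6 years or less


def refresh_deadline(written: date, max_years: int = MAX_TAPE_AGE_YEARS) -> date:
    """Latest date by which a tape written on `written` should be refreshed."""
    try:
        return written.replace(year=written.year + max_years)
    except ValueError:
        # Written on 29 February and the target year is not a leap year.
        return written.replace(year=written.year + max_years, day=28)


def tapes_due(tapes: dict[str, date], today: date) -> list[str]:
    """Return sorted labels of tapes whose refresh deadline is on or before `today`."""
    return sorted(
        label for label, written in tapes.items()
        if refresh_deadline(written) <= today
    )


# Hypothetical inventory, for illustration only.
inventory = {
    "TAPE-0001": date(2000, 3, 15),
    "TAPE-0002": date(2003, 8, 1),
    "TAPE-0003": date(2001, 4, 20),
}
due = tapes_due(inventory, today=date(2007, 4, 20))
print(due)
```

In practice a schedule like this would presumably trigger the refresh well before the deadline, since "6 yrs or less" is an upper bound.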
Ingest activities
*Development: develop, maintain and enhance new AIP ingest software, enhance remote submission information package and AIP creation software (MPGA) to support non-Linux platforms, large SIPs and reliable electronic delivery of SIPs
*Operations: identify current/expected missions, collections, research and organize information, populate data management database
Archival storage
*development: develop upgrades to AIP storage manager, develop provenance management system, develop integrated document management preservation system
*operations: manage media and AIPs for 3 service levels
Data management
*maintain descriptive information database to include photo searching & support automated ingest, revise database to normalize and streamline infrastructure, design and implement XML mark-up of metadata producing systems to enhance finding aids
*participate in appropriate registries in Space sciences (e.g. heliophysics virtual observatories)
*provide general request and access support
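To make the XML mark-up idea concrete, here is a minimal sketch of a descriptive metadata record built and serialized with Python's standard library. The element names (collection, title, spacecraft, keywords) and the sample values are hypothetical; the talk did not describe the actual NSSDC schema.

```python
import xml.etree.ElementTree as ET


def build_record(collection_id: str, title: str, spacecraft: str,
                 keywords: list[str]) -> ET.Element:
    """Build a small descriptive record; the element names are illustrative only."""
    record = ET.Element("collection", attrib={"id": collection_id})
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "spacecraft").text = spacecraft
    keyword_parent = ET.SubElement(record, "keywords")
    for keyword in keywords:
        ET.SubElement(keyword_parent, "keyword").text = keyword
    return record


# Hypothetical values, for illustration only.
record = build_record(
    "EXAMPLE-0001",
    "Example particle flux data",
    "Example spacecraft",
    ["heliophysics", "particles"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Structured records like this are what would let a finding aid or a registry search by field rather than by free text.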
Preservation Planning Activities
*External: continue participation/leadership in standards activities, monitor technology trends, sponsor NASA-wide workshop on archiving and metadata standards, provide curation guidance regarding documentation, database reports etc.
Key staff roles and skills
*Curation scientists: PhD in space science discipline, extensive handling and analysis experience
*Information architect
*Systems engineers
*Database administrator
*Operations manager
*Archive Head: PhD in space science discipline
Conclusions:
*Need science discipline experts with curation training (curation scientists) for interacting with data providers, data users
*Need computer professionals with curation training, working with curation scientists, for development and operation of internal systems and to interact with similar personnel at data provider sites
*Desire data providers with 'preservation understanding' to assist with ingest.
Tyler Waters "To Stand the Test of Time": report on workshop of the same name
(**ed. note: presenter went incredibly fast and it was quite difficult to keep up; pardon the brevity of these raw notes in advance)
Workshop findings
*The ecology of digital data reflects a distributed array of stakeholders, institutional arrangements, and repositories with a variety of policies and practices
*The scale of the challenge regarding the stewardship of digital data requires that responsibilities be distributed across multiple entities and partnerships that engage institutions, disciplines and interdisciplinary domains
*Historically universities have played a leadership role in advancement of knowledge and shouldered substantial responsibility for the long term preservation of knowledge ... an expanded role for some research and academic libraries and universities along with other partners, in digital data stewardship
*data is distributed, heterogeneous
*stewardship involves both preservation and curation and should be throughout the research life cycle.
Workshop recommendations
*NSF should facilitate the establishment of a sustainable institutional framework for long-term stewardship of data. This framework should involve multiple stakeholders by:
*supporting the research and development required to understand, model,
*supporting training and educational programs to develop a new workforce in data science both within NSF and in cooperation with other agencies, and...
*developing, supporting, and promoting education efforts to effect ...??
Also
1. Fund projects that address issues concerning ingest, archiving, and reuse of data by multiple communities
2. Foster the training and development of a new workforce in data science
3. Support the development of usable and useful tools
4. ??
5. Include data management plans in the proposal submission process
6. NSF should encourage the development of data sharing policies for programs involving community data
URL for full report "To Stand the Test of Time - Long-term stewardship of digital data sets in science and engineering"
http://www.arl.org/bm~doc/digdatarpt.pdf
Question re: NSF funding models for data curation centers
*want proposals in domain science areas, usually funding for 5 years and can be renewed for another 5 years
Labels: DigCCurr 2007
