2008/05/14: Adventures in batch loading - or why I hate our ILS

I've spent a lot of hours over the past two weeks doing a batch import of 5,001 ebook records from Literature Online.

It's been quite the educational adventure for me. Although I've been a working librarian for 13 years, I've only been a "cataloger" for the past five. At my former job I only ever had to use the cataloging and serials modules of Innovative Interfaces Millennium, and, rarely, the acquisitions module. Within each module I only used some of the functions with any regularity.

When I signed on at MPOW, they accepted my caveat that I wasn't a Millennium maven. They were comfortable that I could RTFM, especially since our Innovative coordinator would be handling most of the Millennium sys-admin. I know enough to bootstrap myself. Which I've done. Painfully.

Given the small size of our team in the Metadata Services Group, I've had to take on some more complicated batch imports. I knew how to do a data exchange, no problem. What I didn't know was that globally editing a large bunch of bibliographic records would fill up the transaction file on the server and cause the.entire.system to crash. And I mean crash. No circulation check-outs, no back-end processing, nothing, nada, zip.

I learned this after running a global update just before heading into a two-hour meeting. Guess who got called out of the meeting? It was a bit hairy until I could locate our Millennium coordinator, who saved the day by doing a manual back-up of the system.

Huh? A back-up? WTF? Apparently the only way to reach the system menu option that clears the transaction file is through the back-up dialog. That is stupid. I hope there is some technical reason for this, because I think it should be possible to send the server a command to clear a file without having to back up (somebody please correct me if I'm way misinformed here).

Of course, we don't have command-line access to our server. Innovative keeps a tight grip on that type of thing. I can understand why: they probably don't want people to have enough rope to hang themselves. Whatever. When we migrate to a different ILS, which is inevitable (there are only two kinds of librarians: those who've done a migration and those who will), I will insist that our requirements list include full shell access to the system. I know that Millennium lets one use regular expressions, but I'm under the impression that access to them is still controlled from the GUI.

⟨rant⟩ We shouldn't let vendors have so much control over our systems. I recognize that there are situations where it's good for vendors to hold the reins (like small operations with no staff skilled enough to do the sys-admin). But a library that relies on its vendor pays an opportunity cost in nimbleness.⟨/rant⟩

One could do the global updating of records more quickly and easily with shell access and the right skill set. But, how many cataloging librarians are well versed in regex apart from the code4lib folk? Um, yah. Right.
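For the curious, here is roughly what a regex-driven global update looks like once the records are plain text. This is a minimal Python sketch, assuming the records have already been broken into MarcEdit's mnemonic (.mrk) format, where each field sits on its own line (e.g. `=856  40$u...`) and `$` marks a subfield. The edit itself (prefixing every 856 $u link with a proxy URL) and the `PROXY` address are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical proxy prefix, for illustration only.
PROXY = "https://proxy.example.edu/login?url="

# Matches a "$u" subfield code whose value does not already start with PROXY.
SUBFIELD_U = re.compile(r"\$u(?!" + re.escape(PROXY) + r")")


def add_proxy(mrk_text: str) -> str:
    """Prefix every 856 $u URL in MarcEdit mnemonic (.mrk) text with PROXY.

    Assumes the .mrk layout: one field per line, "=TAG  " followed by
    indicators and "$"-delimited subfields. Other fields are left untouched,
    and original line endings are preserved.
    """
    out = []
    for line in mrk_text.splitlines(keepends=True):
        if line.startswith("=856  "):
            # A function replacement avoids any escaping surprises in PROXY.
            line = SUBFIELD_U.sub(lambda m: "$u" + PROXY, line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "=LDR  00000nam  2200000Ia 4500\n"
        "=245  10$aSample title\n"
        "=856  40$uhttp://example.com/book/1\n"
    )
    print(add_proxy(sample), end="")
```

One pattern touches every record in a single pass, and because links that already carry the prefix are skipped, re-running it is harmless. The edited file would still need to be compiled back into MARC (for example with MarcEdit's MarcMaker) before loading.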

MarcEdit came to my rescue once again (I heart Terry Reese). Sorry, Robert, I wanted to use MARC Magician, but they were too slow sending me a password for a free trial.

Global updating via MarcEdit is rather painless once you get the hang of it. Getting the hang of it, however, took me a few hours of messing around. The real bitch was doing the data transfer. Word up to my fellow Millennium users: 'tis sometimes better to use Data Exchange natively in Millennium than to use the records transfer from within MarcEdit. It's the only way to easily make a review file of the transferred records.

I had to do several batch imports/deletes before I got it right.

The #*$@!!% frustrating thing is that some global updates could only be done natively within Millennium. Each run filled my transaction file by about 30%. Three big globals a day and you crash. That sucks. What will we do when bigger bulk imports are needed? Five thousand records is nothing compared to the bulk ingests I foresee in our future (think Google Books, etc.).

Naturally, I didn't want to keep interrupting our Innovative coordinator with requests for a manual back-up. We have an automatic back-up each day at midnight. So each time the transaction file got 75-80% full, I had to stop for the day and wait for the magical file emptying before I could continue my learn-as-I-go batch work. Factor in that I had to run each global a few times because I kept making newbie mistakes, and you can understand why this took a few hours of my day for the past few weeks.

The final insult is that if you fill the transaction file in the middle of a global, the records that haven't been updated yet freeze. Twice I had this happen. Innovative only allows one to "free records in use" individually. Freeing records in batch has to be requested through their support ticketing system, and it may take them a day or two to do it. FRUSTRATING!!

There has GOT to be a better way. Really.
