Wednesday, August 15, 2012

Paper

"Preservation isn't a goal in itself; the goal of preservation is use." Elizabeth Yakel
 
The Adaptation of Digital Preservation to Information Centers

When speaking with those outside of the information field about digital preservation, the question I am most often asked is “Why can’t you just save everything?” The issue may initially appear to be that simple, but with further investigation one quickly realizes that it is a much more detailed topic with several issues. By understanding past conservation methods, current concerns and future threats, great strides can be made to ensure that long-term archiving is achieved.
To better understand why digital preservation is needed, it is important to understand the evolution of the process.  Approximately 500,000 clay tablets dating back to the Bronze Age are essentially our earliest archives. While the tablets were inadvertently saved, anyone who has handled a tablet can tell you that they are heavy, unwieldy, and take up a great deal of space. Millennia later, a better option came along in the form of microfilm.

Before the digital revolution was conceived, microfilm was the practical choice, since it is nearly static. No special retrieval system, new machinery or technology is required to access the data, just light and a magnifying glass. With standardized practice and proper storage, it can be expected to last for at least five hundred years. While waiting to be digitized, or even in place of digitization, microfilm can be used to protect information that is vulnerable to loss or damage. The low cost of the process is also a major selling point. While microfilm has many pluses, it also has several minuses. The film can be read without a machine, but a reader simplifies the work and needs to be maintained.  Also, the user needs to physically visit the facility to access the information.
While microfilm is still heavily used, a new option arrived with the commercial availability of the CD-ROM in the 1980s.  With its size and cost benefits, it was thought that the CD-ROM would replace microfilm.  An article written in December 1999 by Mr. Johnson proclaimed: “Current scanning technology can place an entire county census record on a single CD-ROM. No more fussing with faulty microfilm readers, faded microfilms, and money hungry printers. Will this spell the end of microfilm?” At its peak in the year 2000, based on its size and portability, the CD-ROM was thought to be perfect. As we now know, the answer to Mr. Johnson's question is a resounding no. The CD-ROM format is no longer useful, especially since there are now large information institutions stuck with discs that are basically unusable. Some laptops, such as the MacBook Air, no longer come equipped with a CD drive. While microfilm, long considered the standard in preservation, is still durable and used by many, with the boom of the technology age, digital preservation is on the rise and here to stay.
With digital preservation, the question of what to save, known as the selection process, is never ending. We are living in a digital age where instant gratification is the norm. With the internet and high speed connections, a wealth of information is available at our fingertips in the blink of an eye. The problem arises when we realize that even with unlimited funding, it is not possible to save everything.
How much of what we have saved or wish to save will be used? Much thought and testing are required to ensure a successful process. Through said process, one will learn quickly that mass preservation without testing on a smaller sample usually leads to large problems later.
According to a study conducted by the Centre for Information Behaviour and the Evaluation of Research (CIBER), an organization run by University College London (UCL), libraries should also accept that much content will seldom or never be used, other than perhaps a place from which to bounce. When selecting information to be digitized, one has to consider the users of the materials as well as the custodians who maintain and distribute said information.  Having explicit requirements from both perspectives will balance the demands and provide better planning. While frequently used items are often given the most consideration, gray literature (low-value materials that were never published commercially or offered in bookstores, or that are unregistered and lack ISBNs) is usually endangered. Most of what appears on the web falls under this category. More attention needs to be placed on how end users actually use the content they are browsing for. That could lead to more insight into the selection process and to an attempt to procure and protect files that will ultimately be used.
Anyone who has the task of preserving items for a library or other information facility quickly realizes that mass saving is not only impractical but nearly impossible. As Hedstrom mentioned in her article, the purpose of preservation is to protect information of enduring value for access by present and future generations.  Mechanisms that will enable users to establish the origin, provenance, and authenticity of digital documents require archives and libraries to preserve contextual and descriptive information in addition to the content of digital documents. Provenance, which was mentioned several times during our classes and conference in London, helps to solidify the integrity of the digital object. Knowing the creator of the object helps to ensure that the file has not been modified. With that being the case, would orphan works need to be identified before they can be preserved? This is an ongoing concern intertwined with copyright issues.
It is not possible to define all of the requirements related to digital preservation. There are, however, a few key points that must be addressed. The purpose of digital preservation is to preserve the object over the lifetime of the system. This ensures that the data is stored indefinitely without loss and is known as reliability.
When we speak of preservation, it is important to remember the end user. How will they use the file? One way to keep the end user in mind is to deal with threats of obsolescence. The information needs to be preserved as the creators intended. Specific software, and often the hardware as well, needs to be used in order to access the files. One can save in a simplified format; however, that usually degrades the overall quality of the work.

The Institute of Electrical and Electronics Engineers (IEEE) defines interoperability as the ability of two or more systems or components to exchange and use information. With native digital information as well as digital video, software dependencies exist.  Software preservation becomes a problem when one tries to emulate an older version or migrate software to run on a new platform.  This brings us to digital formats. Given the cost of digitizing information, one strong format should be chosen. Saving one object under many formats is not feasible due to space limitations and the expenses related to the process.

Storage cannot be ignored when speaking about preservation. The scalability of the job can help to ensure that if the technology evolves, there will be adequate space to allow for updates. If the collection is static, as is often the case with historical archives, the size will be fixed, since no new items will be added. Even if no new items are added, the components may need to be updated to keep pace with newer technology. This practice is known as supporting heterogeneity. While data updates can occur, they are uncommon because the objects that are preserved are meant to remain unchanged. The main changes are usually to add new objects and to update the formats.  The type of storage is also important. Cheap storage tends to use a large amount of energy, is apt to fail and can also use large amounts of water for maintaining a cool temperature.
Another benefit of digitizing works is that, with a proper Digital Object Identifier (DOI) system and metadata, it is quite easy to locate the information later. Two popular standardized methods have been adopted in terms of metadata: Open Archival Information System Reference Model (OAIS) and Preservation Metadata: Implementation Strategies (PREMIS).  Defined metadata standards help to support the integrity, authenticity, reliability and archiving standards.
With the rapid acceptance of digital technologies and the growth of digital libraries, that growth outpaces the standardization process. Without a standard format and methodologies across the board, will the information need to be updated, reformatted or restructured later? Future issues are likely to include storage facilities as well as the growing popularity of cloud storage. It works for personal accounts, but will large facilities also be able to utilize this service?
For risk management to be effective, it must follow a trickle-down approach.  While constantly monitoring and reviewing the information, the context must be established and the risks identified, analyzed, evaluated and then treated, before cycling back to monitoring and reviewing. We must remember that all current methods of preservation have tradeoffs and must balance functionality, dependability, and cost based on current technologies and methods. With the rapid changes in technology and lack of funds for digitization, information centers will be behind the curve for the foreseeable future.
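As a rough illustration of how metadata can support integrity, the sketch below stores a checksum alongside a few descriptive fields and later checks whether a file's contents still match. It is only a minimal sketch: the field names (`identifier`, `creator`, `sha256`, and so on) and the helper functions are hypothetical, chosen for this example, and are not the actual OAIS or PREMIS element names.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_record(identifier: str, creator: str, data: bytes) -> dict:
    """Build a small, hypothetical preservation record for a digital object."""
    return {
        "identifier": identifier,   # assumed local identifier, not a real DOI
        "creator": creator,         # provenance: who created the object
        "size_bytes": len(data),
        "sha256": sha256_of(data),  # fixity value captured at ingest
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unchanged(record: dict, data: bytes) -> bool:
    """Check that the object's current contents match the stored checksum."""
    return record["sha256"] == sha256_of(data)


if __name__ == "__main__":
    original = b"Scanned page 1 of a parish register"
    record = make_record("demo-001", "Example Archive", original)
    print(is_unchanged(record, original))
    print(is_unchanged(record, original + b" (edited)"))
```

A record like this would be created once, when the object is first saved, and the check repeated on a schedule so that silent changes are noticed while a good copy may still exist.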

Literary Reviews

London Literary Reviews

Selection for digital preservation by Michael Seadle

Mr. Seadle stressed the value of long-term archiving and access by breaking down three important criteria.
Even with criteria in place (value, endangerment, standards and access), the question of which format to use varies based on the media and has not been standardized. Even if a file begins digitally, the software dependencies used to edit it may render the file obsolete and unusable. Both the file and the software need to be saved together. While the file may be saved, access is not necessarily granted. Libraries with primarily paper documents have an easier time creating standards and procedures that offer a chance at long-term survival. Libraries with mostly multimedia collections will have more complex copyright issues and ever changing technology to deal with.


Addressing Digital Preservation: Proposals for New Perspectives by Barateiro

The authors covered the importance of maintaining an object’s access over an extended period of time. While there are threats and vulnerabilities that exist such as changing hardware and software, it is important to protect the authenticity and integrity of the digital object. The standardization processes being created by the Open Archival Information System Reference Model (OAIS) and the Preservation Metadata: Implementation Strategies (PREMIS) are working to close the gap that existed for decades. Identifying and analyzing the risks associated helps to improve the digital preservation techniques that exist while learning the most appropriate time to apply them.


An introduction to digital convergence: libraries, archives, and museums in the information age by Paul F. Marty

Mr. Marty put together a tri-level approach to digitization as it relates to information organizations. He discusses the needs of the three institutions, their roles and responsibilities as they exist in our information age, and the types of educational programming needed to prepare future professionals to handle the needs of libraries, archives and museums. Rather than viewing the organizations individually, it is suggested that they appear as transparent as possible while transcending their traditional, functional boundaries. There will always be challenges, but working as a unified effort helps to ease the struggle.


Digital Preservation: A Time Bomb for Digital Libraries by Margaret Hedstrom

As Ms. Hedstrom wrote in her article, libraries and archives must include preservation as one of their core functions. Once again, it is stressed that stable materials should be introduced at the beginning of the creation process. The author mentioned the absolutism and idealism of producing works and viewing them as permanent. The lack of established standards, the rapid speed at which digital works are created and the changes in technology all strain the preservation process. The preservation requirements vary based on the users of the digital materials as well as the custodians who maintain and distribute the information. Most importantly, if the materials are only preserved and not optimised through metadata, the end users will not receive the full benefits.

About me

I am a student at The Pratt Institute working towards my Master’s in Library and Information Science. This blog recaps my study abroad course in E-Publishing in London in partnership with UCL (University College London). This site features a day-to-day journal with a breakdown of events, including the Bloomsbury Conference. The Literary Review will summarize five articles in relation to my Paper, which is based on Selection, Preservation and Curation in a Digital World.
Enjoy!

Friday, July 6, 2012

Day 9 - Last day of Class

It truly felt as though these two weeks zipped by in the blink of an eye.  I have to admit to feeling very sad and emotional, but also proud and pleased. I was definitely sad that the course was coming to an end and that I wouldn’t gather with this group again, but also proud of all that I had learned and accomplished within a two-week time frame. After discussing our trip to Cambridge, we discussed the various publishers that we visited over the past two weeks.  We agreed that while they all had their strong points, some seemed to be more well rounded than others. For example, Sage and Berg do not have partnerships.  Also, ProQuest was the only publisher to consider libraries and librarians.

Joyce Ray, a former student of the course and lecturer at Johns Hopkins University, spoke about Digital Curation and Publishing.  She gave a nuts-and-bolts portrayal of what is entailed in the curation process.  She stressed the importance of organization from the very beginning of the process, including an item's provenance, which can be difficult since orphan works are prevalent and there is often no idea where they actually originated.  Many steps have been taken in recent years to ensure that work is being digitized properly and the same way across the board. While selection was mentioned, it was not the main focus. The importance of data management was also covered.

Other than Jonathan Bowen on day 2 of the Conference, she was the only one to directly mention visualization.  It is useful in helping people understand information by painting a picture.  In terms of usability, interface and the basics, it’s wonderful for linked data and helping to create relationships. Pratt has several classes on visualization.

Big data is hard to manage and surprisingly, to me anyway, small data is even more difficult.  There are usually four levels of data and it’s important to track back and keep up with the provenance, since special metadata is needed.  The other issue is storage.  At Purdue University, where Anthony's son Charles Watkinson is the director of the Purdue University Press, each researcher is given storage space and if they receive a grant, they are allotted even more space.  The University of Bath maintains researchers' storage space for 10 years. After that, they will need to make arrangements to keep the information secure.  A lot can be lost if that does not happen.

Joyce also talked about the Open Archival Information System (OAIS).  It’s a reference model dating back to 2003 where content, context and access are viewed as 3 individual bubbles that all overlap. In conjunction with the Trustworthy Repositories Audit and Certification (TRAC) checklist, which has been adopted as the standard, all changes are documented and there is a great deal of clarity.  At the beginning of a data life cycle, the appraisal and selection process begins. With storage being expensive, one must realize they cannot save everything.

The other point that was raised that had been on my mind was about storage. When Google started the HathiTrust with the University of Michigan (UofM), they often said that UofM shouldn’t worry about storage because it was all digital.  That was not the case. Not only were the TIFF files so large that they considered saving them in the inferior JPEG format, but the way in which they were stored was also unacceptable.  Google has a history of using underground storage that has been labeled “cheap” and that requires lots of water to keep it cool.  So it’s unstable, wasteful and dangerous to the environment. A consortium was created allowing several universities to split the storage, thereby saving the quality. As much as information specialists speak of the simplicity of the digitization process, little about it is simple. Following false or poorly thought out information can lead to a project's demise.

After Joyce's presentation, we were given the afternoon off to explore cultural sites of London. I spent the afternoon walking around Canary Wharf, Poplar and Hackney, reminiscing over my time in London 10 years ago. Quite a lot has changed in those areas. Old hangouts are missing due to the Olympics but not everything can remain the same. 

Canary Wharf

We then met up at Spaghetti House for a lovely dinner to celebrate the end of our course. It was the perfect ending: sharing our stories of librarianship over great food and wine.






Thursday, July 5, 2012

Day 8 - Cambridge

To Cambridge we will go! After a slow start, we headed north in our own minibus to Cambridge. This was my second time in Cambridge. The first visit was ten years ago and all I could remember was going punting on the river.

Patriotic ProQuest

Our first stop was ProQuest, which was a particularly important visit for me since just before leaving New York, I learned that I would be the ProQuest Student Trainer at Pratt! Therefore, I’ll be teaching their database, including RefWorks, and Pratt will gain free access to the ProQuest Database until August 2013. It’s a major plus for both me and the school.  That aside, I think it would be unbiased to admit that everyone appeared to be impressed by their presentation. It was clear that they put quite a bit of detail into their process and it’s thorough and well crafted. It is a part of their Library Advocacy program, which includes the Discover More Corps.

This was the only publishing facility that allowed us to tour their office, which is the busiest outside of the US. It follows an open floor plan where the workers sit in the open and can easily share ideas. It was also nice seeing people carrying trays of tea, especially after hearing at OUP about how having group tea time makes for a more productive office.

ProQuest is in the middle of their Cultural Heritage in Partnership program, which digitizes early European books from before 1700. They are also working on the film holdings from the British Museum that were turned over to the British Library, damaged by war. The biggest part of the project is the expansion of the Early English Books Online (EEBO) project. It is a partnership with the Danish Royal Library in Copenhagen, the Biblioteca Nazionale Centrale di Firenze in Italy, the National Library of the Netherlands and the Wellcome Library in London. More than 12,000 books have been digitized to date, with 4 million pages being scanned per year.  The project includes inserts and a specialist approach with full metadata. They even have works by Galileo with his notes written in the margin. The works will be free in the home countries but paid beyond those boundaries.

Group at ProQuest

For the Queen’s Diamond Jubilee, ProQuest launched Queen Victoria’s diaries. They date back to her childhood and cover her whole life, including the years before she was Queen, and feature handmade drawings.  141 volumes are included, in partnership with the Bodleian and the Royal Archives.
While ProQuest has a reputation for being strong with technology and historical databases, they are also building the Arts sector by creating full archives for the entertainment industry, featuring Billboard, Variety and Spin magazines.  They will be fully searchable and include the cover and ad space.  They’ll also be fully indexed. Overall, I think everyone was impressed by ProQuest’s presentation. They seemed to have given a great deal of thought to libraries, archives and building their brand.

Punts in Cambridge

After ProQuest, we headed to the Granta riverside pub for lunch with a view. Seeing the punts brought back a lot of memories. Anthony led us to King's College Chapel, which was breathtaking with its stained glass. Since Anthony attended Cambridge, he was well versed in the history of the colleges and able to point out details that we would have missed. He even showed us where he lived while a student.

Patricia Aske at Pembroke College

We then arrived at Pembroke College Library where we spoke with librarian Patricia Aske. Patricia was very generous in her tour, even letting us touch rare books and showing us gifts to the college. As at Oxford, the colleges are all operated independently. The books can only be circulated to students of the college. Students receive long loan periods while most loans to faculty last for 1/2 day and carry hefty fines.  She spoke of the challenges of what is often a limited budget. Some of the treasures the Victorian building holds are Lancelot Andrewes' bible and Ted Hughes' poems featuring animals.

Wednesday, July 4, 2012

Day 7 - Independence Day

Independence Day was spent learning more about Open Access journals and security in e-publishing.

After recapping our day in Oxford, Anthony mentioned the following four points for Value Added Journals:
Investment
Organization 
Sustainability - Having everything in one big repository has not worked very well.  Options include LOCKSS (Lots of Copies Keep Stuff Safe), CLOCKSS (Controlled LOCKSS), and Portico. e-Depot was created by the National Library of the Netherlands, also known as the Koninklijke Bibliotheek, or the KB. It seems as though everyone is in a hurry to save things that are already available in print. As multimedia items such as video come into play, that becomes tricky since the machine needed to play them may be obsolete.
Selection - Librarians select but trade journals do as well.
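The idea behind "lots of copies" can be sketched in a few lines: keep the same object at several sites, compare checksums, and treat any copy that disagrees with the majority as damaged. This is only an illustration of the general principle under my own assumptions; it is not the actual LOCKSS protocol, and the site names and the `find_damaged_copies` helper are made up for the example.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 checksum of a copy as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the sites whose copy disagrees with the majority of copies.

    `copies` maps a (hypothetical) site name to the bytes stored there.
    Raises ValueError when no single version is held by more than half
    of the sites, since there is then no trustworthy reference copy.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority version among the copies")
    return sorted(site for site, d in digests.items() if d != majority_digest)


if __name__ == "__main__":
    article = b"Volume 12, Issue 3, full text"
    copies = {
        "library_a": article,
        "library_b": article,
        "library_c": article[:-1] + b"?",  # simulated bit rot at one site
    }
    print(find_damaged_copies(copies))
```

A damaged copy found this way could then be repaired from one of the sites that agrees with the majority, which is the appeal of spreading copies across institutions instead of relying on one big repository.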

The discussion of value-added journals led to the question of what is a book? These items are not books but have the same function as one.  They add the value of production, branding and selection. We also spoke about the lack of open source textbooks.  From a development standpoint, they are terribly expensive due to the amount of editorial content.  Perhaps it’s something that can work if they follow a non-profit model and take advantage of the functionality of the web, but it has not been attempted as of yet. We also briefly mentioned the rise of gray literature with Open Access Journals.

Graham Bell, the Chief Data Architect at Editeur, a standards organization, spoke about the behind-the-scenes approach to trade books and e-books.  He explained the low cost of ISBNs and how each country and region has its own process for applying for and obtaining them.  He also covered, from his company’s standpoint, why it is necessary to have a different number for each product as well as each format. Ruth Jones spoke a few days before about why she thinks the ISBN system is antiquated, so it was interesting to hear a different point of view. It’s not that there is not a standard in the industry but that there are many of them!

Graham also went on to poll the class on our use of e-books and determined that we are ahead of the curve, since a large number of us do most of our reading on e-readers such as the Nook and Kindle. More than 3 times as many Americans are heavy users as compared to UK readers.
One thing he mentioned was the financial competition within organizations. E-books are considered big users of money, since they cost money to develop, while print titles are the money makers. Those in print divisions are unhappy with money going towards e-literature since it has yet to show a return.

Throughout Graham’s entire presentation, the one thing that surprised me, and most of my classmates, was the fact that DRM has not been applied to music, especially that offered by Apple’s iTunes Store, for the past 3-4 years.  These days, instead of being about Digital Rights Management, it's more about the enforcement. For example, the publishing industry is not worried about one user sharing their ebook with a friend. However, if that friend sells copies en masse, with your watermark embedded in the book, then you have a problem.  Large scale exploitation will not be tolerated.  While making an ebook is not necessarily difficult, if the approach is lacking in content, structure or appearance, part of the triangle will fail. The last piece is metadata, because if no one can find your book, it doesn’t matter if the other pieces are stellar.

Rhodri Jackson, the Senior Publisher of Law Journals and Oxford Open at Oxford University Press, spoke about some of the issues mentioned at the Conference last week, namely The Cost of Knowledge Protest against Elsevier and the Finch Report.  He also mentioned some of the competitors of OUP’s Open Access project such as PLoS ONE and SOAP.  
Most of the information he offered had been covered during our visit to Oxford and the conference, but it was still nice to hear the perspective of someone in the industry. The one new question he raised was: what is the role of librarians in the OA model? As a whole, librarians have been quiet. The question of how libraries pay for OA content was also asked. Is it through membership, or institutional funds?



After lunch, we headed to SAGE Publishing to meet Martha Sedgwick, Senior Manager of Online Products, and Alicia Warren, Project Manager, to discuss their innovations in Publishing.  I was surprised when Martha stated that most people find their products via Google Search. While it’s possible to subscribe and search for full text directly via one of their platforms, 60-70% of their users find the product via free web search. With that high of a number, perhaps it makes sense not to spend the money creating a dedicated program.  They spend money instead on metadata and searchability to appear at the top of search results. I couldn't help but get the feeling that they want to be known as the biggest. They compared themselves to Bloomsbury and OUP in a “we’re bigger and better than they are” way. It was a turnoff.

Their main focus seems to be Innovation by spending R&D budget and investing in new data sets. They have expanded their online products team from 4 to 13 in the past year and created new product streams such as print digitization and OA journals (social science based), while realizing that data has a value. They use Persona Cards for market research. Different characters have stats such as age and user group. Sage also runs a program to track users' usage on the computer for one hour. How people use their program, for how long, and what they do if they’re frustrated are all tracked. They also spend time working with the National Institute of the Blind on accessibility.  Improving response times, workflow and archives for those with disabilities is important. They also use DOI and linked data to relate articles and keywords, which are sharpened the more you search on their site.

Tuesday, July 3, 2012

Day 6 - Exploring Oxford

The trip to Oxford, our first as a group outside of London, was highly anticipated by all. Anthony set up a full day of events with a bit of free time scheduled at the end. Jokes are often made of America being a “Young Country”; however, it is not until you have set foot in Oxford and on the Oxford University campus that you understand how true that phrasing is.  With a history of teaching dating back to 1096, one quickly understands why.

Class approaching OUP

After our two hour ride from London, we met Anthony at the bus center and walked over to the Oxford University Press (“OUP”). Any thoughts of their offices being settled into old stone buildings were soon dispelled as we stepped inside their light, airy offices.  Artifacts of printing presses and cabinets filled with OUP’s products are placed around the office. There was a cafe on the first floor and several seating areas with people having tea and meetings in the open.  It felt like a warm place to gather and share ideas.

Printing Press

At the OUP, we met with Claire Dowbekin, the Head of Library Relations and Communications, Global Academic Business at OUP.  She gave an overview of the OUP platform, including historical annotations. She stressed the Press’ determination to follow the mission statement.  I was most surprised, and impressed, by the Delegates of the Press, whose job it is to approve all publications before they are sold.  The members of the group, which meets bimonthly, are invited into their positions and hold them, with tenure, until they give them up.  They are expected to be of a high academic level and uphold the mission statement. From time to time, the Delegates reject certain works for publication.

With a staff of more than 5,500, most of whom are outside the UK, their goal is always towards the mission statement. Next, Claire Bebber, the Institutional Marketing Manager, spoke about the online reference program started in 2008 with the Museum, Libraries and Archives Association, now under the Arts Council.  They proposed that if 90% of the public libraries signed up for the three programs, they would receive a deep discount.  98% signed up, thereby saving more than £3 million.  The common thread that appeared was to think of the staff and the end users.  From the meetings held with librarians, the printed materials were created.  From online and offline quizzes to, as they put it, “arts, music, people, works everything!”, the content is available for free from any computer, at the library or at home, or from the user’s mobile device.  They give the users what they need, keep the message simple and experiment.  Claire stressed that the most important thing to do was to persevere.

At the Bodleian, we received a brief (fifteen minutes long) but insightful tour of the building. We did not see one building, which is undergoing renovations, since its contents are offsite 30 miles away. However, we did see where the original library was housed as well as where students would have their oral examinations. The exams were given orally, since paper was very expensive, making it difficult to compare the students to one another.

We were also told by our guide that only 3 of the original books survived, one by Aristotle and 2 others that were also philosophically based, as they were free from religious persecution.  Due to its Legal Deposit system, the library receives more than 5,000 items weekly, which presents a major storage issue. Signs are posted about the library declaring that nothing be removed. We were also not allowed to bring in anything big enough to carry out books. Our bags, purses, etc. were placed into a locker on the first floor before we could proceed within the building.
 
Before being granted access to the library, new readers are required to agree to a formal declaration. This declaration was traditionally oral, but is now usually made by signing a letter to the same effect. Ceremonies in which readers recite the declaration are still performed for those who wish to take part; these occur primarily at the start of the University's Michaelmas term. The English text of the declaration is as follows:


I hereby undertake not to remove from the Library, nor to mark, deface, or injure in any way, any volume, document or other object belonging to it or in its custody; not to bring into the Library, or kindle therein, any fire or flame, and not to smoke in the Library; and I promise to obey all rules of the Library.


After the requisite tour of the gift shop, we proceeded to a meeting with Clive Hurst, the Head of Rare Books at the Bodleian. He explained in great detail the curation and presentation of the Charles Dickens exhibit, which he was not allowed to show us directly since the exhibit was free to the public.  Since the library does not own all of the items, several pieces are on loan, which was not an easy feat: 2012 is the bicentenary of his birth, so those items are in demand. Special features were put into place, such as blowing up objects like a note handwritten by Dickens as a child to a friend, or using small magnets to hold posters to the wall. Even his wife Catherine’s cookbook received a spotlight.


I finished the day with a tour of New Gate College with a small group of students, led by Anthony. Since Anthony has a long history with Oxford, he was able to show us areas such as the garden, complete with a well maintained section of the City Walls, and the dining hall, which was quite stately, and provide us with a breakdown of the purpose of each area, such as vicar and student housing. Before heading to the bus, Deimosa and I toured the Ashmolean Museum, the oldest museum in Britain.