September 15, 2010
“Don’t stop thinking about tomorrow” was Bill Clinton’s two-time campaign theme, quoting the famous Fleetwood Mac song, and for me it aptly describes digital preservation as well. The need to constantly think and act to ensure access over time to digital content makes it completely different from everything we know about preservation in the paper world.
One thing I’ve noticed high on the wish list of every preservation-minded person is speed and scalability. In fact, we are discovering more and more often that institutions’ digital collections are no longer measured in thousands but in millions, and even billions, of assets. Digital collections, and the digital world in general, have triggered a huge change in the way we work and think, and they will have an even greater impact on the way we need to plan for the future. In the past an archivist or librarian could know an entire collection by heart and help describe and manage it; in the digital world, the sheer volume demands a more robust and automated workflow in every part of our work.
To make sure Rosetta, the Ex Libris digital preservation solution, can meet these requirements, we carried out a series of extensive scalability tests covering the rate at which digital items can be ingested, the volume of records the system can hold, the rate at which items can be delivered and viewed, and more. The tests were done in a two-step approach: firstly, by running the tests in our lab here in
I won’t tell you the whole story here (you can find it in the recently published white paper), just a short summary of the conclusions:
- When calculating ingest throughput rates, many variables should be considered (file size, format variety, etc.). Rosetta can ingest many files in a short amount of time.
- Computing power makes a real difference: results can vary considerably between a dedicated server and a virtual server.
- We have not yet encountered a maximum limit on the number of records. The system was tested with more than 50 million records and showed no sign of reaching a limit.
- It’s all about bottlenecks. Find the next one, fix it and move forward.
In short, it was an interesting process of both learning and understanding the changing needs in the digital age. I invite all of you to read the white paper.
I started with one