Building a Digital Collection Repository In-House

In 2014, the Patchogue-Medford Library decided to enhance, expand, and unify our digitization efforts. The result was Digital PML, a digital collection repository that we built from scratch and completely within our library walls.

For many years, we have been committed to digitizing our rare and eclectic local history materials to increase community access and to support genealogical and historical research. Our earlier efforts include our Flickr-based historic photograph collection, Records of Men from Patchogue and Vicinity Who Took Part in the World War, and the many items presented on the website of our Celia M. Hastings Local History Room.

DPML homepage

For digital collection repositories and online exhibits, libraries have many great free and open-source software options, including Greenstone, Fedora Commons, ResCarta, Omeka, and DSpace, to name just a few. We took these as a starting point for our design.

The Design

At its most basic and abstract, repository software for digitized materials has four essential functions:

  1. To store the resulting digital files on physical data storage media
  2. To associate and store metadata with the digital files
  3. To search the metadata
  4. To present the digital files over the web and on a patron's electronic device of choice

Map collection at DPML

With this in mind, we designed Digital PML to consist of four modular components: a file storage layer, a metadata layer, a search layer, and a presentation layer.
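To make the separation of concerns concrete, here is a minimal sketch of how four such layers might fit together. It is an illustration only, not the Digital PML code: the class names, the in-memory dictionaries, and the sample item IDs are all hypothetical stand-ins for real storage, a real database, and a real search engine.

```python
from dataclasses import dataclass, field


@dataclass
class FileStorage:
    """File storage layer: maps an item ID to the path of its public file."""
    paths: dict[str, str] = field(default_factory=dict)

    def public_file(self, item_id: str) -> str:
        return self.paths[item_id]


@dataclass
class MetadataStore:
    """Metadata layer: maps an item ID to its descriptive record."""
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def get(self, item_id: str) -> dict[str, str]:
        return self.records[item_id]


@dataclass
class SearchIndex:
    """Search layer: looks only at the metadata, never at the files."""
    metadata: MetadataStore

    def search(self, term: str) -> list[str]:
        term = term.lower()
        # Return the IDs of items with any metadata value containing the term.
        return sorted(
            item_id
            for item_id, record in self.metadata.records.items()
            if any(term in value.lower() for value in record.values())
        )


@dataclass
class Presentation:
    """Presentation layer: combines the other three layers into result lines."""
    files: FileStorage
    metadata: MetadataStore
    index: SearchIndex

    def results(self, term: str) -> list[str]:
        return [
            f"{self.metadata.get(i)['title']} -> {self.files.public_file(i)}"
            for i in self.index.search(term)
        ]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    files = FileStorage({"pc-001": "access/pc-001.tif"})
    metadata = MetadataStore({"pc-001": {"title": "Sample postcard"}})
    site = Presentation(files, metadata, SearchIndex(metadata))
    print(site.results("postcard"))
```

Because each layer talks to the others only through a small interface, any one of them could in principle be swapped out (a different search engine, say) without rewriting the rest.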

The Components

The file storage layer stores the digital files and, when requested, makes those files available to the presentation layer. For each digitized item, it houses an uncompressed TIFF file and, for public viewing, either a compressed, tiled, pyramidal, multiresolution TIFF file or a PDF file.
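For readers unfamiliar with tiled, pyramidal images, the idea can be sketched in a few lines. The file holds the same picture at several resolutions, each cut into small tiles, so a viewer only has to fetch the tiles it needs for the current zoom level. The 256-pixel tile size and the "halve until the image fits in one tile" rule below are assumptions chosen for illustration; real conversion tools may use other tile sizes and round differently.

```python
import math


def pyramid_levels(width, height, tile=256):
    """List (width, height, tiles_across, tiles_down) for each pyramid level.

    Starts at full resolution and halves both dimensions until the whole
    image fits inside a single tile. The halving rule is an assumption.
    """
    levels = []
    w, h = width, height
    while True:
        levels.append((w, h, math.ceil(w / tile), math.ceil(h / tile)))
        if w <= tile and h <= tile:
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return levels


if __name__ == "__main__":
    # Hypothetical scan dimensions, for illustration only.
    for level in pyramid_levels(1024, 768):
        print(level)
```

Under these assumptions, a 1024 x 768 image needs three levels and 17 tiles in total, yet a patron looking at a thumbnail downloads only one of them.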

The metadata layer stores the metadata for each digitized item and, when requested, makes that metadata available to the search and presentation layers. We chose to design and implement a relational database for the metadata using MySQL. Once an item is scanned and the resulting digital files are stored in the file storage layer, metadata for the item is entered into the database using MySQL Workbench.
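As a rough idea of what a relational design for item metadata can look like, the sketch below builds a tiny three-table schema and queries it. It is not our actual schema: the table names, column names, and sample rows are hypothetical, and it uses Python's built-in SQLite module as a stand-in for MySQL so that it runs without a database server.

```python
import sqlite3

# Hypothetical schema: one row per digitized item, plus a many-to-many
# link to subject headings. Not the actual Digital PML schema.
SCHEMA = """
CREATE TABLE item (
    item_id     INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    master_file TEXT NOT NULL,   -- uncompressed TIFF
    access_file TEXT NOT NULL    -- pyramidal TIFF or PDF for public viewing
);
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE item_subject (
    item_id    INTEGER NOT NULL REFERENCES item(item_id),
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    PRIMARY KEY (item_id, subject_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up sample item."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO item VALUES (?, ?, ?, ?)",
        (1, "Sample postcard", "masters/0001.tif", "access/0001.tif"),
    )
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?)",
        [(1, "Postcards"), (2, "Main Street")],
    )
    conn.executemany("INSERT INTO item_subject VALUES (?, ?)", [(1, 1), (1, 2)])
    return conn


def subjects_for(conn, item_id):
    """Return the subject names attached to one item, alphabetically."""
    rows = conn.execute(
        "SELECT s.name FROM subject AS s "
        "JOIN item_subject AS x ON x.subject_id = s.subject_id "
        "WHERE x.item_id = ? ORDER BY s.name",
        (item_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(subjects_for(build_demo_db(), 1))
```

Keeping subjects in their own table means each heading is stored once and can be shared by many items, which is the main reason to prefer a relational design over a flat spreadsheet.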

The search layer searches the metadata that is stored in the metadata layer. After an extensive examination of the available possibilities, we chose Apache's Java-based Solr for this task. Solr provides both classic Boolean search capabilities and the "Google search experience." It also gives us plenty of options for more search functionality in the future, including faceting, "more like this" suggestions, query completion, and spelling correction.
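For a sense of what talking to Solr involves, the sketch below builds the URL for a query against Solr's standard select handler, with faceting switched on. The host, the core name "dpml", and the field name "subject" are hypothetical, and the choice of the Extended DisMax query parser (defType=edismax) is an assumption made for illustration rather than a description of our configuration. The code only constructs the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, user_query, facet_fields=(), rows=10):
    """Build a Solr /select URL for a user query, optionally with facets."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),  # assumed parser choice, see note above
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if facet_fields:
        params.append(("facet", "true"))
        # Solr accepts one facet.field parameter per field to facet on.
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, core, and field names.
    print(
        solr_select_url(
            "http://localhost:8983/solr",
            "dpml",
            "patchogue AND postcard",
            facet_fields=["subject"],
        )
    )
```

A presentation layer written in any language can issue a request like this over HTTP and render the JSON that comes back, which is part of what makes a separate search layer practical.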

DPML Postcard

To present all of this over the web and on a patron's electronic device of choice, we implemented the presentation layer in XHTML, CSS, PHP, and RSS. We chose the IIPImage server and its HTML5-based IIPMooViewer client as the website's main image presentation software. We also deployed the Internet Archive's BookReader software in some cases for the traditional "turn the page" experience. Lastly, we gather and analyze user traffic data using Google Analytics.
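Image servers of this kind are driven by simple URL parameters. As a hedged illustration, the sketch below builds a request in the Internet Imaging Protocol style that IIPImage understands, asking for a JPEG rendering of an image at a given width (FIF names the file, WID the width, CVT the output format). The server address and file path are hypothetical, and a real deployment should be checked against the IIPImage documentation; the code only constructs the URL.

```python
from urllib.parse import quote, urlencode


def iip_jpeg_url(server_url, image_path, width):
    """Build an IIP-style URL for a JPEG rendering at the given pixel width."""
    # FIF comes first and CVT last, following the usual IIP request order.
    params = [("FIF", image_path), ("WID", str(width)), ("CVT", "jpeg")]
    # Keep slashes in the file path readable instead of percent-encoding them.
    return f"{server_url}?{urlencode(params, safe='/', quote_via=quote)}"


if __name__ == "__main__":
    # Hypothetical server address and image path.
    print(
        iip_jpeg_url(
            "http://example.org/fcgi-bin/iipsrv.fcgi",
            "/data/access/0001.tif",
            800,
        )
    )
```

A zooming viewer in the browser issues many small requests of this general kind behind the scenes, which is what lets a patron on a phone explore a very large scan without downloading the whole file.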

TechSoup for Libraries Helps the Final Result

The final system took seven months from conception to launch. Our only monetary expenditure on software for the project was a reduced-cost installation of Adobe Acrobat, obtained through the TechSoup technology donation program. We use Acrobat as our main optical character recognition (OCR) software to extract words from our scanned items. We also use it to consolidate one-page PDF files into multipage PDF files where necessary and to watermark PDF files.

Digital PML currently contains more than 400 digitized items, and the number keeps growing. Many exciting digitization projects await us, each offering another opportunity to increase our community's access to our local history materials while preserving those materials for future generations.

There is much more to tell about the choices we made and the experiences we had while designing and implementing Digital PML, and about our ongoing work populating it. If you would like to know more, please feel free to contact me!


Gary Lutz is a Reference and Adult Services librarian at the Patchogue-Medford Library in Patchogue, New York. He can be reached at glutz@pmlib.org.


This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.