Second Interim Narrative Report
December 1, 1999-May 31, 2000
The narrative below reports on progress made and lessons learned during the last six months of our project to digitize and preserve plant images. After meeting several challenges concerning model program development and staffing, we are now rapidly moving forward.
Model Program Development
One of our original goals was to develop a stand-alone image database system that could be downloaded from our site and installed on an institution's server. Due to the overwhelming variety of server hardware and software configurations, though, the development of this system as originally designed is not feasible. What we have chosen to do instead is describe how an institution might use a commercial database program, such as Microsoft Access, to publish both static and dynamic content on the Web. By the completion of the project we will have the code, database, and accompanying documentation available for download on our site. The difference between this setup and our original goal is that each institution will have to customize the application for its own server configuration.
Our experience working with small to medium-sized databases indicated that Microsoft Access would be the most economical and progressive software for this purpose. It also provides tools for developing a new database or importing existing data from an older system, as well as the software necessary for porting these databases to a Web environment.
We began by transferring existing information from the Garden's image database into an Access database developed with the Access Wizard. The transfer was quick and efficient. The on-screen documentation provided all of the information necessary to develop a new database or import data into it. After the database was tested, it was used to publish static HTML pages, as well as an online, dynamic site developed with Active Server Pages. Both options worked as intended when they were tested in a browser.
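As a rough illustration of the static-publishing step described above, the sketch below generates one HTML page per record from a small database table. It is not the Garden's code: it uses Python with the built-in sqlite3 module as a stand-in for Microsoft Access, and the table name (specimens), its columns, the sample rows, and the output file names are all hypothetical.

```python
import sqlite3
from html import escape


def demo_connection():
    """Create an in-memory stand-in database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specimens ("
        "id INTEGER PRIMARY KEY, family TEXT, name TEXT, image_file TEXT)"
    )
    conn.executemany(
        "INSERT INTO specimens VALUES (?, ?, ?, ?)",
        [
            (1, "Bignoniaceae", "Example taxon A", "img/0001.jpg"),
            (2, "Apocynaceae", "Example taxon B", "img/0002.jpg"),
        ],
    )
    return conn


def build_pages(conn):
    """Return {filename: html}, one static page per specimen record."""
    pages = {}
    rows = conn.execute(
        "SELECT id, family, name, image_file FROM specimens ORDER BY id"
    )
    for spec_id, family, name, image_file in rows:
        # Escape every value so database text cannot break the markup.
        html_text = (
            "<html><head><title>{name}</title></head><body>"
            "<h1>{name}</h1><p>Family: {family}</p>"
            '<img src="{img}" alt="{name}">'
            "</body></html>"
        ).format(
            name=escape(name),
            family=escape(family),
            img=escape(image_file),
        )
        pages["specimen_{}.html".format(spec_id)] = html_text
    return pages


if __name__ == "__main__":
    for filename, html_text in build_pages(demo_connection()).items():
        with open(filename, "w", encoding="utf-8") as handle:
            handle.write(html_text)
```

A dynamic site differs only in when the page is built: the same query and template run on the server at request time rather than once in advance.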
The next step in this development will be to provide a test database online along with instructions on how to run the test. The documentation for this test system will also explain how to substitute the users' own database for the test database. We believe that Microsoft Access has provided all the pieces necessary to develop a small to medium-sized database and quickly provide access to the information on the Web. The software works well, is cost effective, and is widely available. With the instructions and test data set developed through the Garden, we feel smaller projects without access to data processing help will be able to fully participate and make their information and images available quickly and easily.
Project staff are learning from doing the project, attending training seminars and conferences, communicating with colleagues, and consulting the rapidly advancing information sources in the digitization field. Staff changes that have occurred during this six-month period include: 1) Chris Freeland replaced Alan Tucker as Technical Coordinator, February 2, 2000. 2) Fred Keusenkothen was hired as full-time Imaging Technician, March 27, 2000.
Type specimens are those specimens upon which a certain unique name is based. This makes types the most valuable and important specimens in a collection. The Missouri Botanical Garden's herbarium contains over 80,000 types. Type specimens are being imaged using the Kaiser Scando camera. The specimens are being imaged family by family, and processing includes working from the original publication of the name to properly identify the kind of type and its validity. As of May 31, 2000, approximately 2,700 type specimens have been digitized, including the Bignoniaceae types, worked on predominantly by A. Gentry, and the Myrsinaceae and Apocynaceae types, also tropical families. Furthermore, a variety of types from Mesoamerica have been scanned, and the protocol also calls for scanning any types that are included in incoming or outgoing loan material, especially to protect against loss of the material.
Specimen Imaging Process
The Web Group faced a dilemma in June of 1998 when the Leaf Microlumina digital camera we evaluated for photographing specimens was discontinued. We turned to the Kaiser Scando camera, which has worked exceptionally well. Each specimen is placed on a light stand and digitized by the Kaiser Scando at a resolution of 3400 x 4500 pixels, which results in a file size of 35 MB. The image is then imported into Adobe Photoshop and edited using a pre-defined set of actions to ensure a consistent look for all the specimens.
Before May 1, we were saving the images as a JPEG with an average file size of 400 KB, which is quite large for a web image. The advantage to having a large image was that we could give users better detail and more information with each image, but the disadvantage was that each image took a long time to download. We felt that this was a fair trade-off, but continued investigating compression and server software.
As of May 1, 2000, we have begun using MrSID compression to provide users with a high-quality, multi-resolution image that they can zoom in and out on. MrSID, which stands for Multi-resolution Seamless Image Database, is encoding software made by LizardTech, Inc. that can compress images up to a ratio of 50:1. This means that we can compress the specimen images from an original size of 35 MB down to 1.5 to 2 MB. The beauty of MrSID images is that as a user zooms in and out on each image to see greater detail, the browser only downloads the portion of the image needed to fill the browser window. So while these images are 3-5 times larger than the edited JPEGs, they load considerably faster because users are not downloading the entire image all at once.
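The file-size figures quoted above can be checked with a little arithmetic. The short Python sketch below works only from the numbers in this report (35 MB originals, 1.5 to 2 MB MrSID files, 400 KB JPEGs); the variable names are ours, and the conversion of 1 MB = 1000 KB is an assumption.

```python
ORIGINAL_MB = 35.0        # size of each specimen image as captured
MRSID_MB = (1.5, 2.0)     # range of compressed MrSID file sizes
JPEG_KB = 400.0           # average size of the earlier edited JPEGs
KB_PER_MB = 1000.0        # assumption: decimal units


def compression_ratio(original_mb, compressed_mb):
    """Return how many times smaller the compressed file is."""
    return original_mb / compressed_mb


# Ratio actually achieved for the smallest and largest MrSID files.
ratios = [compression_ratio(ORIGINAL_MB, size) for size in MRSID_MB]

# How many times larger each MrSID file is than the 400 KB JPEG.
size_vs_jpeg = [size * KB_PER_MB / JPEG_KB for size in MRSID_MB]

if __name__ == "__main__":
    print("Compression ratios:", [round(r, 1) for r in ratios])
    print("MrSID size relative to JPEG:", size_vs_jpeg)
```

Under these assumptions the achieved ratios are roughly 23:1 and 17.5:1, well inside the 50:1 maximum, and the MrSID files come out at 3.75 to 5 times the size of the JPEGs, in line with the "3-5 times larger" figure.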
In addition, we can now serve images that provide the user with much greater detail and clarity. Another benefit of MrSID is that users have the option of viewing the high quality image with a browser plug-in that they download and install, with a Java applet, or with just the browser. This means that someone using a very old browser without the MrSID plug-in can view the same high quality image as someone using a new browser with the plug-in.
As of May 31, 2000, a total of 4,330 slides taken by Al Gentry had been duplicated and attached to the voucher record of their corresponding specimen on the Tropicos database. This automatically created image screens for each slide to serve as a link from the digital image to the specimen record and provide captions for the slide images. Using Gentry's field books, the data entry department has filled in any record that was incomplete or a "skeleton". Both the original and duplicate slides received computer-generated labels that contained family, genus, and species name, and collector name and number. The original slides have been given to the Archivist and are waiting to be put in cold storage. The duplicate slides are being scanned and activated by the Web Group, which is working with the duplicates instead of the originals for two reasons. First, the originals have protective slide covers on them, which would make batching difficult. Second, in many cases the company that duplicated the slides, AdStat, also improved the lighting and image quality, saving the Web Group time.
The Gentry slides that do not have voucher numbers (fewer than 2,000) are currently being processed. The originals also received protective sleeves, and the original and duplicate are again being given matching barcodes to keep the information together. Any image not identified to at least genus has been removed, as have any images that are too poor to have much scientific value or are virtual duplicates of other slides. Because these slides are not attached to specific specimens, an image screen that is attached to the barcode is being created for each individual slide. This will save time when trying to activate and retrieve the image. Processing of all the Gentry slides should be finished by the end of June 2000.
Slide Imaging Process
With the purchase of a Nikon Coolscan 2000 and an autofeeder, we are able to scan forty 35mm slides in approximately 40 minutes. The Nikon Coolscan digitizes each slide at a resolution of 1200 x 800 pixels, giving a file size of 35 MB. These images are then imported into Photoshop and edited using a pre-defined list of actions. The resulting image is a JPEG with a file size ranging from 30 to 50 KB. We have decided not to use MrSID for the slide collections because there is generally not as much detail in a slide image as there is in a type specimen.
Representative Type Specimens
Please click on the Specimens link to view representative type specimens from the Garden's herbarium in MrSID format.
All work on the Gentry project was to have been completed by November 1999, but due to several staffing changes that goal has not been met. We have digitized approximately 2,700 type specimens and approximately 3,400 of the Gentry slides and other live plant images. Nearly all of those slides have been digitized within the past two months, after the addition of a full-time Imaging Technician. Within the next two months we will be adding another full-time Imaging Technician, so once he/she is fully trained we expect to be digitizing nearly 300 slides per week. By November 30, 2000, we anticipate having all of the Gentry slides completed and will have begun digitizing the Croat, Goldblatt, and Hodge collections.
Since the type specimens are the first group for which we are using MrSID compression, we continue to refine our specimen imaging protocols and procedures. Each newly digitized specimen will be saved in MrSID format, and with our current computer setup we can digitize 30 specimens in a day.
Future of the Missouri Botanical Garden Plant Sciences Digital Library Program
Upon completion of the 1998 IMLS Project, we hope to move directly into the 2000 IMLS Project, during which we will build upon and expand the current project. In addition to the IMLS projects, we have the potential for several collaborative projects. We expect to begin a Mellon Project in January 2001, to digitize and preserve rare books in collaboration with the New York Botanical Garden Library. We are considering a collaborative project to place the orchid literature and images online. And we are communicating with Harvard University and others concerning a collaborative project to digitize more plant specimens, as well as full-text and images of the most useful plant sciences literature. We appreciate all that IMLS is doing to support our goal to create a comprehensive plant sciences digital library, and look forward to continuing a mutually beneficial relationship.