ATLA Web Site

 

Standards for ATLA CDRI projects

Listed below are expected standards for ATLA CDRI projects.

1. Submission of Master and Access Images
       [note re. "born digital" materials]

2. Minimum Image Standards: Photographs, Maps, Graphic Materials

3. Phase Two Supplementary Standards

4. Quality Control and Naming Conventions

5. Metadata Requirements

Additional background information and technical guidelines


1. ATLA CDRI requires submission of a master image and an access image.

Master Image

  • Represents as closely as possible the information contained in the original

  • Uncompressed, unedited, high quality

  • Serves as long term source for derivative files

  • Can serve as surrogate for the original

  • Very large file size

  • Used for creating high quality print reproductions

  • Usually stored in the TIFF file format

Access Image

  • Used in place of master image for general web access

  • Generally formatted for the resolution of average monitor

  • Reasonable file size for fast download time; does not require a fast network connection

  • Acceptable quality for general research

  • Compressed for speed of access

  • Usually stored in JPEG file format



2. Phase Two of ATLA CDRI will accept visual images that conform to the following minimum standards:

PHOTOGRAPHS

Master Image

Tonal depth: 8-bit grayscale for black and white / 24-bit or higher for color
Format: TIFF
Compression: Uncompressed
Spatial Resolution: 600 dpi optical resolution for prints and large format negatives; 2700 dpi optical resolution for slides and small format (e.g. 35mm) negatives

Access Image

Tonal depth: 8-bit grayscale/24-bit color
Format: JPEG
Compression: Depends; 7:1 - 10:1 for grayscale/10:1 - 20:1 for color
Spatial Resolution: 150 dpi
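
To give a feel for the storage these photograph standards imply, the arithmetic can be sketched in Python. The 8 x 10 inch color print and the function names are illustrative assumptions, not part of the standard; the resolutions, bit depth, and compression ratios are the ones listed above. File header overhead is ignored, and real JPEG sizes vary with image content.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Size of an uncompressed raster, ignoring file header overhead."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def compressed_bytes(raw_bytes, ratio):
    """Approximate size after compression at ratio:1."""
    return raw_bytes / ratio


# Hypothetical original: an 8 x 10 inch color print.
master = uncompressed_bytes(8, 10, dpi=600, bits_per_pixel=24)
access_raw = uncompressed_bytes(8, 10, dpi=150, bits_per_pixel=24)
access_low = compressed_bytes(access_raw, 20)   # 20:1, smallest file
access_high = compressed_bytes(access_raw, 10)  # 10:1, largest file

print(f"master TIFF: about {master / 1e6:.1f} MB")
print(f"access JPEG: about {access_low / 1e3:.0f} to {access_high / 1e3:.0f} KB")
```

Under these assumptions the master image is roughly 86 MB, while the access image falls between roughly 270 and 540 KB, which illustrates why the two are stored and delivered separately.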


MAPS

Master Image

Tonal depth: 8-bit grayscale/24-bit or higher color
Format: TIFF
Compression: Uncompressed
Spatial Resolution: at least 300 dpi optical resolution depending on scale

Access Image

Tonal depth: 8-bit grayscale/24-bit color
Format: JPEG
Compression: Depends; 20:1
Spatial Resolution: 150 dpi


GRAPHIC MATERIALS

Master Image

Tonal depth: 8-bit grayscale/24-bit color or greater
Format: TIFF
Compression: Uncompressed
Spatial Resolution: at least 300 dpi depending on scale

Access Image

Tonal depth: 8-bit grayscale/24-bit color
Format: JPEG
Compression: Depends; 7:1 - 10:1 grayscale/10:1 - 20:1 color
Spatial Resolution: 150 dpi
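
As part of project quality control, the master image minimums above could be checked with a short script. The following is a minimal sketch, not an official ATLA tool: `ImageInfo`, `MASTER_MIN_DPI`, and `master_problems` are hypothetical names, and the image properties are assumed to have been read already by whatever scanning or inspection software the project uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageInfo:
    """Properties of a scanned file (hypothetical record, filled in by the caller)."""
    file_format: str      # e.g. "TIFF" or "JPEG"
    bits_per_pixel: int   # 8 for grayscale, 24 or more for color
    compressed: bool
    dpi: int


# Minimum master resolution per material type, taken from the standards above.
# For photographs this is the figure for prints and large format negatives;
# slides and small format negatives would need 2700 dpi instead.
MASTER_MIN_DPI = {"photograph": 600, "map": 300, "graphic": 300}


def master_problems(info: ImageInfo, material: str) -> list[str]:
    """Return a list of ways the file falls short of the master image minimums."""
    problems = []
    if info.file_format.upper() != "TIFF":
        problems.append("format must be TIFF")
    if info.compressed:
        problems.append("master must be uncompressed")
    if info.bits_per_pixel != 8 and info.bits_per_pixel < 24:
        problems.append("tonal depth must be 8-bit grayscale or 24-bit or higher color")
    if info.dpi < MASTER_MIN_DPI[material]:
        problems.append(f"resolution below {MASTER_MIN_DPI[material]} dpi")
    return problems


scan = ImageInfo("TIFF", 24, compressed=False, dpi=400)
print(master_problems(scan, "photograph"))  # flags the resolution only
print(master_problems(scan, "map"))         # empty list: no problems found
```

Returning a list of problems rather than a single pass/fail value lets an inspector see every shortfall in one pass through the files.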

3. Phase Two Supplementary Standards

Photographs: Digital Originals ("Born Digital" Images)

Digital cameras vary considerably in their capabilities and even the most technically advanced lag behind film cameras in their ability to capture detail. There is no consensus on just how many "pixels" are typically captured in a single frame of 35mm film, but CDRI participants should consider that film cameras in the hands of competent photographers will produce images that are on average 2-10 times better in quality than current digital cameras. Therefore, choosing to collect images of some archival value using digital photography requires deliberation.

There are situations where digital photography will make possible projects that otherwise would have been unfeasible. Non-professional photographers may work unobtrusively to capture useful images in places where the paraphernalia and time investment for professional techniques are impractical. The photographer can usually see the results of her work instantly and re-shoot unsatisfactory material immediately. An academic may pick and choose the subjects that best suit his instructional or research purposes without having to rely on the understanding and interpretation of another.

For these reasons and many more, "born digital" photographic images will inevitably and legitimately become candidates for addition to digital archives. To the extent possible, the same principles are followed as with scanned images.

Master Image

Tonal Depth: 8-bit grayscale/24-bit color (Note: Color is expected unless there is a specific use-related reason for utilizing grayscale.)
Format: TIFF*
Compression: Uncompressed
Spatial Resolution: Maximum supported by the capture device, but not less than 1600 x 1200 pixels (i.e. approximately two megapixels minimum).

Access Image

Tonal Depth: 8-bit grayscale/24-bit color (same as Master Image)
Format: JPEG
Compression: Depends; on average 10:1 - 20:1
Spatial Resolution: Not less than 800 x 600 pixels.

* Participants may choose to collect the RAW CCD data for their own uses. However, the variations in that data make it unsuitable for a shared archive.
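
The pixel minimums for born digital images can be checked with simple arithmetic. This is a minimal sketch with hypothetical function names; treating a portrait frame (for example 1200 x 1600) as meeting the minimum is an assumption, since the standard does not address orientation.

```python
MASTER_MIN = (1600, 1200)  # pixels, from the Master Image standard
ACCESS_MIN = (800, 600)    # pixels, from the Access Image standard


def megapixels(width, height):
    """Pixel count in millions."""
    return width * height / 1_000_000


def meets_minimum(width, height, minimum):
    """True if the frame covers the minimum in either orientation (assumption)."""
    long_side, short_side = max(width, height), min(width, height)
    return long_side >= max(minimum) and short_side >= min(minimum)


print(megapixels(*MASTER_MIN))                # 1.92
print(meets_minimum(1200, 1600, MASTER_MIN))  # True (portrait orientation)
print(meets_minimum(1280, 960, MASTER_MIN))   # False
```

The first line confirms that 1600 x 1200 pixels is about 1.92 megapixels, which is the "approximately two megapixels" referred to in the standard.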

Encoded Text Files: HTML Encoding

Simple documents may be encoded using the Hypertext Markup Language (HTML) 4.01 specification (http://www.w3c.org/tr/html4/). Generally, this option is suitable for documents with a simple, non-hierarchical structure and little need to specifically identify internal components. A good test of the advisability of using HTML is whether simple Boolean keyword searching would prove a suitable means of retrieving information from them.

Encoded Text Files: XML Encoding

More complex documents or large collections of similar documents that require retrieval based on internal identification of specific data elements require encoding according to a suitable Extensible Markup Language (XML) Document Type Definition (DTD). The Text Encoding Initiative (TEI) Lite DTD (http://www.tei-c.org/Lite/) provides an excellent general purpose framework for narrative text. Specific genres, however, may benefit from the use of DTDs designed specifically for them. For example, archival finding aids should be encoded according to the Encoded Archival Description (EAD) specification (http://www.loc.gov/ead/). In general, the DTD chosen for a particular project should be congruent with the nature and application of the text being encoded.
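
The kind of retrieval that XML encoding makes possible can be illustrated with a short Python sketch. The markup below is a hypothetical, simplified sample loosely modeled on TEI-style tagging; it has not been validated against the TEI Lite DTD or any other DTD, and the `names_of_type` helper is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup (not validated against any DTD).
SAMPLE = """
<text>
  <body>
    <div type="letter">
      <p>Written by <name type="person">Jane Doe</name> from
         <name type="place">Chicago</name>.</p>
    </div>
    <div type="sermon">
      <p>Delivered in <name type="place">Boston</name>.</p>
    </div>
  </body>
</text>
"""


def names_of_type(xml_text, name_type):
    """Collect the text of every <name> element with the given type attribute."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("name") if el.get("type") == name_type]


print(names_of_type(SAMPLE, "place"))   # ['Chicago', 'Boston']
```

A plain keyword search could find the word "Chicago" but could not tell whether it is tagged as a place or a person; the tagged elements are what make that distinction retrievable.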

Portable Document Format (PDF)

The Portable Document Format (PDF) specification has become a de facto international standard for electronic document interchange despite the fact that it is proprietary. PDF has many advantages as a means to consolidate and organize printed documents into a useable and portable digital format, particularly if the converted material consists primarily of page images from a multipage document. Therefore, PDF is an acceptable access format for converted data when it is congruent with the anticipated use of the material. PDF should not be utilized as the archival form of the converted data, and the limitations of a PDF file should be taken into consideration when evaluating its appropriateness for anticipated use. For example, a PDF file consisting exclusively of images of printed pages will not provide the ability to search the contents of the document, only the ability to read and print it.



4. Quality Control and Naming Conventions

Quality Control

A quality control program should be conducted throughout all phases of the digital conversion process. Inspection of final digital image files should be incorporated into your project workflow.

File Naming

Click on Institution Name Code, Collection Name Code, and File Name or Range on the forms below to see instructions on file naming conventions.



5. Metadata Requirements


Additional Background Material and Technical Guidelines

The library community has developed considerable common experience with digitization in the past few years. Model projects including American Memory at the Library of Congress (memory.loc.gov), the Making of America (moa.umdl.umich.edu/), and the Colorado Digital Project (http://coloradodigital.coalliance.org/) have provided the basis for establishing "best practice" doctrines for digitizing a wide variety of manuscript, print, and graphic originals. Organizations such as the Association of Research Libraries (arl.cni.org), the Digital Library Federation (www.clir.org/diglib/), and the Northeast Document Conservation Center (www.nedcc.org) have sponsored innumerable opportunities for librarians to obtain information and training in digitization techniques and managing digital collections.

Despite these developments, we are not yet at a point where we can point to established "standards" for the full spectrum of potential digitization projects. We can point to established process models and procedures for designing projects and to examples of "best practice" that fit the most common materials and circumstances. NEDCC's Handbook for Digital Projects: A Management Tool for Preservation and Access (www.nedcc.org/digital/dighome.htm) provides an extremely valuable reference detailing both the current state of the art and the requirements for planning and managing a digitization project. This work is highly recommended to all who are preparing a proposal to submit for funding under this program. The root decision-making process is well summarized in a decision tree diagram created by the Harvard University Library (http://preserve.harvard.edu/bibliographies/matrix.pdf).

Note, however, that even these documents focus on two-dimensional graphic material. Digital audio and full-motion video, which are well within the scope of this program, are not included. It remains difficult to cover the technical requirements or even rules of thumb for a broad range of potential digital media. Therefore, rather than pre-defining basic technical specifications, applicants are required to detail their anticipated specifications in their proposal and explain why their choices are appropriate to the project proposed. Members of the ATLA/ATS Digital Projects Committee will be available to discuss questions regarding technical specifications, and grant awards may include advisory notation regarding technical specifications.

Applicants must address all the following points that are applicable in describing the technical design of their project:

  • ORIGINALS: Provide a full description of the originals to be digitized, including their current format, condition, and use. Discuss any special considerations relevant to the digitization process and their impact on the result. For example, if the originals are too fragile to survive scanning, they may have to be filmed and scanned from the microfilm.
  • CONVERSION PROCESS: Describe the anticipated conversion process and the resulting data files. Discuss the chosen format for the converted files and its suitability to the project. It may be necessary to specify why a given format is chosen over an alternative.
  • RESOLUTION: Discuss the "resolution" (see below) of the digital surrogates in relation to the original and explain why this choice is appropriate for the project. Identify any potential consequences of the choice. Discuss how the capture resolution will relate to the delivery resolution.
  • DATA MANAGEMENT: Describe the physical storage facility envisioned, including provisions for securing and preserving the data, and for migrating to enhanced storage media or converting to revised data formats in the future.
  • DELIVERY MECHANISMS: Describe how digitized material will be delivered to end-users. Address issues of interface design and implementation.
  • METADATA: Identify the metadata elements that will be recorded for each entity and describe how they will be stored and used. Include descriptions of authority control, use of standard thesauri, and classification schemes.
  • COST: In addition to the budget for the proposed project, discuss any other costs the institution will incur as a result of implementing the project. Discuss the benefit of the project to the institution in relation to the projected costs.

Issues of "Conventional Wisdom"

  1. "Resolution" is perhaps the thorniest issue for digitization projects. The parameters used to measure resolution vary by media, by the nature and state of the original, and by the method of capture. The natural tendency is to seek to capture the highest resolution digital copy of the original that is technically possible. That tendency is constrained by the expense and technical problems of capturing, storing, and delivering digital surrogates at maximum resolution.

    Therefore, most projects seek to capture material within a range of highest useful resolution. For example, for printed documents, 300 dpi is considered a reasonable low threshold for capture for purposes of display and printing. Six hundred dpi is the common midrange and is used when better quality output is required or OCR processing is planned. Twelve hundred dpi is the functional maximum for printed documents as long as the focus is exclusively on the content and not the artifactual properties of the original.

    Other properties of the original may enter into issues of resolution. Color is the chief example. Even printed documents, which are logically bitonal (that is, each dot is either black or white), are usually found to scan more successfully for display and OCR if they are treated as grayscale images. Similar issues arise with audio files and full-motion video. However, the more the nuances of the original are captured, the larger the resulting datafiles become. Sometimes, pilot testing is required to determine the most successful balance for the anticipated use.

  2. Generally, open source or public domain file formats are preferred over proprietary formats. This is because the specifications for these formats are published and software developers do not need to pay license and royalty fees in order to use the file format. However, some proprietary formats become de facto standards because the format owners provide reader software at no charge and because the format has significant advantages. The best known format of this type is Portable Document Format (PDF) from Adobe Systems. PDF documents are popular because they offer a high level of control over text formatting, help secure documents against piracy and tampering, and produce excellent printed copies.

    Choosing a proprietary file format requires careful consideration, especially about the long term viability of the digital files. If applicants choose proprietary formats for their documents, they must include a detailed explanation of the advantages of those formats for the project.
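
The trade-off between resolution, tonal depth, and file size described in item 1 can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 8.5 x 11 inch printed page and ignores file header overhead and row padding; the function name is illustrative. The 300, 600, and 1200 dpi figures are the thresholds named above.

```python
def raster_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size in bytes for a page scanned at the given dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8


# Hypothetical original: an 8.5 x 11 inch printed page.
for dpi in (300, 600, 1200):
    bitonal = raster_bytes(8.5, 11, dpi, 1)   # 1 bit per pixel
    gray = raster_bytes(8.5, 11, dpi, 8)      # 8-bit grayscale
    print(f"{dpi:>4} dpi: bitonal {bitonal / 1e6:6.1f} MB, grayscale {gray / 1e6:6.1f} MB")
```

Each doubling of the dpi quadruples the pixel count, and an 8-bit grayscale scan takes eight times the space of a bitonal scan at the same resolution, which is why pilot testing to find the lowest adequate setting can be worthwhile.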



American Theological Library Association
http://www.atla.com/cdri/standards.html
Last Updated: July 26, 2002