...

  1. While periods are permissible in "base" filenames, it is highly recommended that they be avoided.
    Rationale: Some programs assume that there is only a single period in a filename, and will behave strangely if multiple periods are present.
  2. It is preferable that all letters in a filename be lowercase. If a filename includes consecutive human-readable words, they may be denoted by CamelCase (e.g., wnp-04-RoyalSociety-ncn-t123.tif). This is expected to be relatively rare, though.
    Rationale: Lowercase letters aid human readability and make it easier to type the filename. In collections where filenames contain many human-readable words, CamelCase aids readability.
  3. Portions of the filename should indicate more specific detail as they are read from left to right. That is, the far left portion of the filename should indicate the collection name or home library of the item, the next portion should indicate the subcollection or aggregation, followed by a piece (page/section) number, and ending with the indication of derivative size. (Any of these portions that do not apply to the current file may be omitted.)  What if there is more than one home library? Or more than one home organization, e.g. some Sloan Working Papers are also part of the SCISR publication series?
    Rationale: Alphabetical listings of files make more sense with this organization.
  4. Distinct portions of the filename should be separated by underscores.
    Rationale: Separating the portions makes the filename both easier to read and easier to process automatically. Note that it is reasonable for the "identifier" portion of the filename to retain hyphens in identifiers from external sources, as in ihs-SHMU_01_13-01-05.tif. This reduces confusion when locating items provided by other institutions.
  5. File names should be limited to 31 characters or fewer (including the period and file extension). Total path length (directories + file name) should not exceed 256 characters. (BCR-CDP; controlledvocabulary.com)
  6. While it is permissible for two different collections to contain files with identical names, this should be avoided.
    Rationale: It will not be possible to know of all filenames in use. Nonetheless, identical names can be confusing, and care should be taken to reduce the probability of identical names.
  7. Page numbers should be padded with leading zeros so that all filenames in a collection have the same number of characters for the page number portion. In most cases, this will be two or three digits.  When determining the number of characters, consider how the collection might grow and the number of loose pieces, foldouts, or other physical elements that may amount to more than one image per page.
    Rationale: This forces pages to display in the correct order when listed alphabetically, and provides more visual consistency when scanning a long list of files.
  8. When creating filename standards for a new collection, the standards should be based on existing collections/objects with similar characteristics.
    Rationale: Minimizing the variability in filename standards eases both automatic and manual processing.
  9. Whenever possible, the digital object's "primary" identifier (the identifier appearing in the filenames) should correspond to an identifier in use for the original (physical) object, such as the official or unofficial collection name or Archives collection number. If the format of the primary identifier conflicts with the absolute filename requirements, appropriate changes should be made. If the format of the primary identifier conforms to the absolute filename requirements but violates best practices, it may be left intact. 
    Rationale: It should be easy to determine the relationship between digital files and physical objects. This is easier if the identifier in the filename is as similar as possible to the identifier associated with the physical object.
  10. For derivative files intended primarily for Web display, one consideration for naming is that images may need to be cited by users in order to retrieve other higher-quality versions.  If so, the derivative file name should contain enough descriptive or numerical meaning to allow for easy retrieval of the original or other digital versions. (NARA)
  11. Derivative names are based on Stellar: cp = "classroom projection" (ideal for Flickr), sv = "screen view" (viewing and printing), tm = "thumbnail".
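
The measurable guidelines above (a single period, the 31-character limit, and the 256-character path limit) can be checked automatically. The following is a minimal sketch, not an established tool: the function name `check_filename` and the warning wording are assumptions made for illustration, and the remaining guidelines still require human judgment.

```python
MAX_NAME_LENGTH = 31    # guideline 5: counts the period and the extension
MAX_PATH_LENGTH = 256   # guideline 5: directories plus file name


def check_filename(name, directory=""):
    """Return a list of warnings; an empty list means no problems were found."""
    warnings = []
    if name.count(".") > 1:
        warnings.append("more than one period (guideline 1)")
    if len(name) > MAX_NAME_LENGTH:
        warnings.append("name longer than 31 characters (guideline 5)")
    full_path = f"{directory}/{name}" if directory else name
    if len(full_path) > MAX_PATH_LENGTH:
        warnings.append("path longer than 256 characters (guideline 5)")
    return warnings


# A name taken from the sample filenames, then a hypothetical name with extra periods.
print(check_filename("MC025_nb41_017.tif"))
print(check_filename("wnp.04.RoyalSociety.tif"))
```

The function returns warnings rather than raising errors because most of these guidelines are recommendations, not absolute requirements.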

Naming scheme

...

Collection

...

Aggregate

...

 

...

Piece

...

 

...

1st Part

...

ex.

...

2nd Part

...

ex.

...

3rd Part

...

ex.

...

RVC

...

Collection

...

Image #

...

001801

...

Archives

...

MC0025

...

Reels

...

Roll2

...

Sequential #

...

 

...

Collection Name

...

PFC (Perceptual Form of the City)

...

Notebooks

...

nb09

...

 

...

 

...

 

...

 

...

Report years

...

 

...

Report #

...

 

...

 

...

 

...

Report section

...

 

...

 

...

 

Sample filenames

Collection

Filename

Notes

Edgerton Collection

MC025_nb41_017.tif

MC025 is the collection number for the Edgerton Collection in the Institute Archives.  nb stands for notebook; nb41 stands for notebook #41.  017 stands for the 17th sequential image of notebook #41 (which may or may not be exactly page #17).  In this case, the source for the digital image is the notebook itself.

Edgerton Collection

MC025_nb41-mf_017.tif

As above, except that the source for the digital image is the microfilm of the notebook.  nb41-mf stands for the microfilm (mf) of notebook #41.  017 stands for the image of the 17th whole frame on the microfilm.

Edgerton Collection

MC025_nb41-mf-split_017.tif

As above, except that the digital images have been split and cropped so that they no longer represent a whole frame from the microfilm.  nb41-mf-split stands for split images made from the microfilm of notebook #41.  017 is the 17th sequential image.

Edgerton Collection

MC025_nb41-mf-split_017-tm.jpg

As above, except this is a derivative file.  017-tm stands for the thumbnail (tm) of the 17th image made from splitting digital images of the microfilm frames.  Rotch has used tm for thumbnail.

Edgerton Collection

MC025_nb41_017-tm.jpg

A derivative file based on Example 1, at the top of the chart.  017-tm stands for the thumbnail (tm) of the 17th image.  The source for the digital image is the original notebook.
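
Because the Edgerton Collection names follow a fixed pattern (collection, notebook with optional source qualifiers, image number with optional derivative code), they can be split back into their parts automatically. The sketch below is illustrative only: the function name `parse_edgerton` and the dictionary keys are assumptions, and it handles only names shaped like the Edgerton examples.

```python
def parse_edgerton(name):
    """Split an Edgerton-style filename into its labelled portions."""
    stem, _, extension = name.rpartition(".")
    collection, aggregate, piece = stem.split("_")
    # The aggregate is 'nb' plus the notebook number, then optional
    # hyphen-separated qualifiers such as 'mf' (microfilm) and 'split'.
    notebook, *qualifiers = aggregate.split("-")
    # The piece is the image number, optionally followed by a derivative code.
    image, _, derivative = piece.partition("-")
    return {
        "collection": collection,
        "notebook": int(notebook[2:]),  # drop the 'nb' prefix
        "source": "microfilm" if "mf" in qualifiers else "notebook",
        "split": "split" in qualifiers,
        "image": int(image),
        "derivative": derivative or None,
        "extension": extension,
    }


print(parse_edgerton("MC025_nb41-mf_017.tif"))
```

Keeping underscores for the major portions and hyphens for qualifiers is what makes this two-stage split possible.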

Archives collection using box and folder numbers

MC###_b06_f021_003_003.tif

MC### stands for the collection number.  b06 stands for box #6.  f021 stands for folder #21.  003 stands for the third item in folder #21.  The last 003 stands for the page/image number from the third item in folder #21.
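
A name of this shape can be assembled from its numeric parts, with zero padding applied so that alphabetical order matches page order. This is a rough sketch: the function name `build_name` is hypothetical, and the padding widths (two digits for the box, three for the folder, item, and page) are simply copied from the example above and would need to be chosen per collection.

```python
def build_name(collection, box, folder, item, page, ext="tif"):
    """Join zero-padded portions with underscores, most general portion first."""
    parts = [
        collection,
        f"b{box:02d}",      # box number, padded to two digits
        f"f{folder:03d}",   # folder number, padded to three digits
        f"{item:03d}",      # item within the folder
        f"{page:03d}",      # page/image within the item
    ]
    return "_".join(parts) + "." + ext


print(build_name("MC###", 6, 21, 3, 3))

# With padding, an alphabetical sort lists pages 1, 2, 10 in that order.
for name in sorted(build_name("MC###", 6, 21, 3, p) for p in (10, 2, 1)):
    print(name)
```

Without the padding, a page numbered 10 would sort ahead of a page numbered 2 in an alphabetical listing.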

Multi-volume item from general collection

barker_TrAmSocStTr_v001_0001.tif

barker is the home library collection for this item.  TrAmSocStTr is an abbreviation of the title: Transactions of the American Society for Steel Treating.  v001 stands for volume 1, and 0001 stands for the first image of this volume.  The Aleph system number (00291693) would be in the metadata. 

Book (Off Campus Collection)

science_PriOfRel_0064.tif

science is (sort-of) the home library collection for this item stored in LSA.  PriOfRel represents the title: The Principle of Relativity.  0064 stands for the 64th image of the book.  The source for the digital image is the book itself. The Aleph system number (001020855) would be in the metadata. 

Book (Off Campus Collection)

science_PriOfRel_0064-tm.jpg

As above, except this is a derivative file.  0064-tm stands for the thumbnail (tm) of image 0064.
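
The relationship between a master file and its derivative can be expressed in both directions, which supports the guideline that a derivative name should lead a user back to the original. The following is a sketch under assumptions: the function names are hypothetical, the derivative is assumed to be a JPEG by default, and the code table contains only the codes defined in this document.

```python
DERIVATIVE_CODES = {
    "sv": "screen view",
    "tm": "thumbnail",
    "cp": "classroom projection",
    "tr": "transcript",
}


def derivative_name(master, code, ext="jpg"):
    """Build a derivative filename by appending '-code' to the master's base name."""
    if code not in DERIVATIVE_CODES:
        raise ValueError(f"unknown derivative code: {code}")
    base, dot, _old_ext = master.rpartition(".")
    if not dot:
        raise ValueError("master filename has no extension")
    return f"{base}-{code}.{ext}"


def master_base(filename):
    """Return the base name of the master that a filename refers to."""
    stem = filename.rpartition(".")[0]
    base, hyphen, code = stem.rpartition("-")
    # Only strip the suffix when it is a known derivative code.
    if hyphen and code in DERIVATIVE_CODES:
        return base
    return stem


print(derivative_name("science_PriOfRel_0064.tif", "tm"))
print(master_base("science_PriOfRel_0064-tm.jpg"))
```

Checking the suffix against the known codes keeps hyphenated qualifiers inside a name, such as nb41-mf, from being mistaken for a derivative code.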

What is Engineering? Freshman Lecture Series

WIE77_n02_pt01.mj2
WIE77_n02_pt02.mj2

WIE77 represents the home library collection for the What is Engineering? Freshman Lecture Series from 1977.  n02 stands for the lecture number.  pt01 and pt02 stand for two Motion JPEG 2000 files, one for each part of the lecture.  (The original lecture was recorded on two tapes, parts 1 and 2, and this division has been maintained.)  [Hypothetical example; these mj2 files do not exist.]

What is Engineering? Freshman Lecture Series

WIE78_n02_pt01.mj2
WIE78_n02_pt02.mj2

WIE78 represents the home library collection for the What is Engineering? Freshman Lecture Series from 1978.  n02 stands for the lecture number.  pt01 and pt02 stand for two Motion JPEG 2000 files, one for each part of the lecture.  [Hypothetical example; these mj2 files do not exist.]

What is Engineering? Freshman Lecture Series

WIE78_n03_pt01.mj2
WIE78_n03_pt02.mj2

WIE78 represents the home library collection for the What is Engineering? Freshman Lecture Series from 1978.  n03 stands for the lecture number.  pt01 and pt02 stand for two Motion JPEG 2000 files, one for each part of the lecture.  [Hypothetical example; these mj2 files do not exist.]

Perceptual Form of the City - actual file name used

KL_000123_02_sv.tif

KL stands for Kepes-Lynch, the unofficial name of this collection.  000123 is a sequential id number for the piece.  02 represents the second image from this item.  sv stands for "screen view," the type of derivative.

  • sv = screen view
  • tr = transcript
  • tm = thumbnail
  • cp = classroom projection

Perceptual Form of the City - proposed file name if we could do it over

PFC_Boston_123456.tif

PFC stands for Perceptual Form of the City (collection name).  Boston differentiates this subcollection from images from New York City and other locations.  123456 is the IRIS image number.

Project Whirlwind

MC1234_reel#_report#.pdf

MC1234 represents the Archives' collection number.  The smallest unit is pdf because the images are already bundled into pdfs by report number.

Sloan Working Papers

SWP_WP#_report#_page#
SWP_oclc#_report#_page#

Sloan_WP#_report#+version_page#
Sloan_year_report#+version_page#

Prefer "Sloan" to SWP because "Sloan" is the name of the DOME community in which the pdfs would reside?
Using OCLC number does not allow for chronological sorting. Four-digit calendar year would work.

CEEPR

CEEPR_0090_paper#

 

RLE

RLE_report#_section#_page#

 




CSAIL

CSAIL_wps_WP#_page#

wps = working paper series?
CSAIL is a community in DOME.

Barker books

BarkerBks_title_page#
Barker_fund/collection_title_page#

No DOME community yet for general library collections.  Use LIBRARIES?  Use library name?  (But what if BARKER folds into SCIENCE?)

 

 

 

...