Research Forum Presentation: Automatic Metadata Extraction for Archival Description and Access

Authors
William Underwood, Georgia Tech Research Institute, Georgia Institute of Technology

SAA Presentation
SAA 2008 Presentation

Abstract
The objective of the research reported here is to develop techniques for automatically extracting the metadata from electronic records that is needed to describe items, file units, and record series automatically and to support access to these records. Archival metadata and elements of description include document type, date, author, addressee, and topic. The elements of documentary form are those elements of documents of the same type that do not change, or vary only slightly, from document to document. These include not just keywords or captions such as “MEMORANDUM FOR”, “FROM:”, and “SUBJECT:”, but also semantic categories such as dates and person names, for example, “MEMORANDUM FOR ”. The methods developed are described by example. They include methods for: (1) annotating dates, person names, location names, organization names, postal addresses, and other semantically relevant categories that appear in e-records; (2) recognizing the intellectual and physical elements of documentary form; (3) recognizing the documentary form of a record and extracting metadata by using a parser with grammars for documentary forms; (4) automatically generating item titles and scope and content notes from the metadata; and (5) automatically populating access point attributes such as personal name, geographic name, and topic. The presentation illustrates how these results could support improved access to electronic records.
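
To make the idea of recognizing a documentary form and deriving a title from its metadata concrete, the following is a minimal sketch in Python. It is not the parser or the grammars used in the research: it substitutes simple regular expressions for a grammar-based parser, and the names MEMO_FORM, DATE, extract_memo_metadata, item_title, and the sample memorandum are all hypothetical, introduced only for illustration. Only the captions “MEMORANDUM FOR”, “FROM:”, and “SUBJECT:” come from the abstract.

```python
import re

# Hypothetical description of a memorandum's documentary form: each caption
# from the abstract is paired with the metadata element it introduces.
# Regular expressions stand in here for a real grammar-based parser.
MEMO_FORM = [
    ("addressee", re.compile(r"^MEMORANDUM FOR:?[ \t]*(.+)$", re.MULTILINE)),
    ("author", re.compile(r"^FROM:[ \t]*(.+)$", re.MULTILINE)),
    ("topic", re.compile(r"^SUBJECT:[ \t]*(.+)$", re.MULTILINE)),
]

# Assumed date style ("March 3, 1990"); real records use many more formats.
DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)


def extract_memo_metadata(text):
    """Return a metadata dict if the text has the memo form, else None."""
    metadata = {}
    for element, pattern in MEMO_FORM:
        match = pattern.search(text)
        if match is None:
            return None  # a required element is missing: not this form
        metadata[element] = match.group(1).strip()
    metadata["document_type"] = "memorandum"
    date = DATE.search(text)
    if date is not None:
        metadata["date"] = date.group(0)
    return metadata


def item_title(metadata):
    """Build a simple item title from the extracted metadata."""
    title = (
        f"Memorandum from {metadata['author']} to {metadata['addressee']} "
        f"regarding {metadata['topic']}"
    )
    if "date" in metadata:
        title += f", {metadata['date']}"
    return title


# Invented sample record, used only to exercise the sketch.
SAMPLE = """March 3, 1990

MEMORANDUM FOR JANE DOE
FROM: JOHN ROE
SUBJECT: Budget Review

Please see the attached.
"""

if __name__ == "__main__":
    print(item_title(extract_memo_metadata(SAMPLE)))
```

The same extracted values (author, addressee, topic) could in principle be reused to populate access point attributes; a grammar-based approach would replace the flat pattern list with rules that also constrain the order and layout of the elements.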