Information in historical datasets comes in many forms. We are working with a set of World War I era postcards that contain handwritten text, some preprinted text, postage stamps, and postmark/cancellation stamps. The postmarks are of considerable interest to collectors looking for images of samples they have not previously seen. The postmarks also provide information on the originating location of the card, which complements the information in the address block.
The postmarks vary considerably, not only in town and date but also in style. The styles can be grouped into categories. We describe a method for automatically extracting a template for each category of postmark stamp. The problem is complicated by the high levels of degradation present in the cards. The approach uses a cascade of unsupervised learning steps separated by image cleaning. The cascade introduces averaging steps, which reduce noise, and it reduces the number of comparisons between samples. Although merges occur at every stage, fewer merges are needed within each stage. Once extracted, the templates can be used to group the postmarks. They also contribute information about the postmark content, which helps separate the postmark from the paper and other interfering marks so that further information about the postmarks and postcards can be extracted.
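To make the cascade idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes images are flattened grayscale vectors with values in [0, 1], and it swaps in a greedy leader clustering and a fixed-threshold binarization for the unsupervised learning and image cleaning steps, which the abstract does not specify. All names (`distance`, `average`, `clean`, `cluster`, `cascade`) and the parameters `batch_size` and `max_dist` are hypothetical.

```python
from typing import List

Image = List[float]  # assumption: flattened grayscale image, values in [0, 1]


def distance(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def average(images: List[Image]) -> Image:
    """Pixel-wise mean of a group; averaging damps sample-specific noise."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def clean(image: Image, threshold: float = 0.5) -> Image:
    """Stand-in cleaning step (an assumption): binarize at a fixed threshold."""
    return [1.0 if p >= threshold else 0.0 for p in image]


def cluster(images: List[Image], max_dist: float) -> List[List[Image]]:
    """Greedy leader clustering: join the first group whose leader is close."""
    groups: List[List[Image]] = []
    for img in images:
        for group in groups:
            if distance(img, group[0]) <= max_dist:
                group.append(img)
                break
        else:
            groups.append([img])
    return groups


def cascade(samples: List[Image], batch_size: int, max_dist: float) -> List[Image]:
    """Two-stage cascade: cluster within batches, clean, then cluster the results."""
    stage1: List[Image] = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        for group in cluster(batch, max_dist):
            stage1.append(clean(average(group)))
    # Stage 2 compares only the cleaned batch-level averages, not raw samples.
    return [clean(average(group)) for group in cluster(stage1, max_dist)]


# Toy data: two "styles" (left half inked, right half inked) with small noise.
samples = [
    [0.9, 0.8, 0.1, 0.2],  # style A
    [0.8, 1.0, 0.2, 0.0],  # style A
    [0.1, 0.2, 0.9, 0.7],  # style B
    [1.0, 0.7, 0.0, 0.1],  # style A
    [0.0, 0.3, 0.8, 1.0],  # style B
    [0.2, 0.1, 1.0, 0.9],  # style B
]
templates = cascade(samples, batch_size=3, max_dist=0.3)
print(templates)  # [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
```

In this toy run, each batch of three yields two cleaned averages, and the second stage merges the four batch-level averages into two templates. Because the second stage operates on a handful of cleaned averages rather than on every raw sample, it needs far fewer comparisons than an all-pairs pass over the full set, which for N samples requires N(N-1)/2 comparisons.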
This is an author-produced, peer-reviewed version of this article. The final, definitive version of this document can be found online at HIP '15: Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing, published by the Association for Computing Machinery. Copyright restrictions may apply. doi: 10.1145/2809544.2809555
Barney Smith, Elisa H. and Fink, Gernot. (2015). "Template Generation from Postmarks Using Cascaded Unsupervised Learning". HIP '15: Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing, 92-98. http://dx.doi.org/10.1145/2809544.2809555