Document Analysis Issues in Reading Optical Scan Ballots

Document Type

Conference Proceeding

Publication Date





Optical scan voting is considered by many to be the most trustworthy option for conducting elections because it provides an independently verifiable record of each voter’s intent. While op-scan technology has been in use for decades, attempts to improve the machine-reading of ballots raise a range of interesting issues in document image analysis. Work thus far has been hindered by a lack of real-world data, however, since ballots associated with actual elections are kept secure from the public and normally destroyed after a period of time. Fortunately, as a result of a recently contested federal election in Minnesota, a large number of op-scan ballot images were made available for public inspection on the Web.

In this paper, we present the Minnesota op-scan ballot collection as a unique resource to the document analysis community. We discuss important considerations regarding the definitions of a legal vote and a valid ballot that cannot be ignored for the sake of technical expediency. We also describe our efforts to annotate the collection, including the development of a graphical tool for collecting ground-truth interpretations and the protocol now being employed. The collection, consisting of ballot images, file formats, and associated truth data for part of the set, is being made openly available to facilitate research in this important area.