Electrical and Computer Engineering Faculty Publications and Presentations

Text Degradations and OCR Training

Document Type

Conference Proceeding

Publication Date

2005

DOI

https://doi.org/10.1109/ICDAR.2005.226

Abstract

Printing and scanning of text documents introduce degradations to the characters that can be modeled. Interestingly, certain combinations of the parameters that govern the degradations introduced by the printing and scanning process affect characters in such a way that the degraded characters have a similar appearance, while other degradations leave the characters with an appearance that is very different. It is well known that (generally speaking) a test set that more closely matches a training set will be recognized with higher accuracy than one that matches the training set less well. Likewise, classifiers tend to perform better on data sets that have lower variance. This paper explores an analytical method that uses a formal printer/scanner degradation model to identify the similarity between groups of degraded characters. This similarity is shown to improve the recognition accuracy of a classifier through model-directed choice of training set data.

Copyright Statement

This document was originally published by IEEE in Eighth International Conference on Document Analysis and Recognition, 2005. Proceedings. Copyright restrictions may apply. DOI: 10.1109/ICDAR.2005.226

Publication Information

Barney Smith, Elisa H. and Andersen, Tim. (2005). "Text Degradations and OCR Training". Eighth International Conference on Document Analysis and Recognition, 2005. Proceedings, 2, 834-838.

Included in

Computer Engineering Commons, Electrical and Computer Engineering Commons
