Principal Component Analysis and Bayesian Classifier Based Character Recognition

Publication Date


Type of Culminating Activity


Degree Title

Master of Science in Engineering, Electrical Engineering


Electrical and Computer Engineering

Major Advisor

Dr. Gary Erickson


Extensive research has been done on character recognition using the Bayesian Classifier. This paper discusses another approach to character recognition that combines Principal Component Analysis (PCA) with the Bayesian Classifier. PCA extracts the non-redundant information from the characters' feature set and condenses it into a smaller number of features called "principal components." Test results show that PCA not only reduces the feature set but also improves the classification accuracy of the Bayesian Classifier.

Character recognition is performed on many types of documents, such as postal documents, office documents, and newspapers. In this study, I apply PCA techniques to "real world" newspaper characters. The goal is to improve the classification accuracy on 378,000 newspaper characters by using PCA in addition to the Bayesian Classifier.

Classifying newspaper characters is quite a challenge: the characters come in many different fonts, sizes, and shapes. Nevertheless, a classification accuracy of 60%-70% was attained. With Principal Component Analysis, the dataset size was reduced by 25% while the accuracy remained the same. For a given number of features, Principal Component Analysis improved the classification accuracy by nearly 10%.
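
As a rough illustration of the pipeline the abstract describes (PCA to reduce the feature set, then a Bayesian classifier on the principal components), the following minimal sketch may help. It is not the thesis implementation. The synthetic four-feature data, the function names, the power-iteration eigen-solver, and the choice of a Gaussian class model with diagonal covariance are all assumptions made for brevity, and the numbers it yields say nothing about the newspaper results above.

```python
import math
import random


def mean_vector(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix of the rows around the mean mu."""
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        c = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def top_components(cov, k, iters=200, seed=0):
    """Leading k eigenvectors of cov via power iteration with deflation."""
    d = len(cov)
    a = [row[:] for row in cov]  # working copy that gets deflated
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(a, v)))  # eigenvalue
        comps.append(v)
        for i in range(d):  # remove the found direction from the matrix
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return comps


def project(row, mu, comps):
    """Coordinates of a centered feature vector on each principal component."""
    return [sum((x - m) * c for x, m, c in zip(row, mu, comp)) for comp in comps]


def fit_gaussian_bayes(rows, labels):
    """Assumed class model: independent Gaussians per feature, plus a prior."""
    model = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        mu = mean_vector(members)
        var = [max(sum((r[i] - mu[i]) ** 2 for r in members) / len(members), 1e-9)
               for i in range(len(mu))]
        model[label] = (math.log(len(members) / len(rows)), mu, var)
    return model


def classify(model, x):
    """Return the label with the largest log posterior for feature vector x."""
    def log_post(params):
        log_prior, mu, var = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mu, var))
    return max(model, key=lambda label: log_post(model[label]))


def run_demo(k=2, seed=1):
    """Synthetic two-class data: 4 features reduced to k principal components."""
    rng = random.Random(seed)
    centers = {0: [0.0, 0.0, 0.0, 0.0], 1: [3.0, 3.0, 0.0, 0.0]}
    rows, labels = [], []
    for label, center in centers.items():
        for _ in range(100):
            rows.append([c + rng.gauss(0.0, 1.0) for c in center])
            labels.append(label)
    mu = mean_vector(rows)
    comps = top_components(covariance(rows, mu), k)
    reduced = [project(r, mu, comps) for r in rows]
    model = fit_gaussian_bayes(reduced, labels)
    # Accuracy is measured on the training samples, purely to keep this short.
    hits = sum(classify(model, x) == y for x, y in zip(reduced, labels))
    return hits / len(rows)


if __name__ == "__main__":
    print("training accuracy on synthetic data:", run_demo())
```

Calling `run_demo(k=1)` versus `run_demo(k=4)` gives a quick feel for how the number of retained components affects a classifier of this kind. A real study would hold out separate test characters and would likely use a library eigen-solver instead of power iteration.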
