Principal Component Analysis and Bayesian Classifier Based Character Recognition
Type of Culminating Activity
Master of Science in Engineering, Electrical Engineering
Electrical and Computer Engineering
Dr. Gary Erickson
Extensive research has been done on character recognition using the Bayesian Classifier. This paper discusses another approach to character recognition, one that combines Principal Component Analysis (PCA) with the Bayesian Classifier. PCA extracts the unique information from the feature set of the characters and condenses it into a smaller number of features called "principal components." Test results show that PCA not only reduces the feature set but also improves the classification accuracy of the Bayesian Classifier.
Character recognition is performed on many types of documents, such as postal documents, office documents, and newspapers. In this study, I apply PCA techniques to "real world" newspaper characters. The goal is to improve the classification accuracy on 378,000 newspaper characters by using PCA in addition to the Bayesian Classifier.
Classifying newspaper characters is quite a challenge because the characters come in many different fonts, sizes, and shapes. Nevertheless, a classification accuracy of 60%-70% is attained. With Principal Component Analysis, the dataset size is reduced by 25% while the accuracy remains the same. For a given number of features, Principal Component Analysis improves the classification accuracy by nearly 10%.
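The general idea of pairing PCA with a Bayesian classifier can be illustrated with a minimal sketch. It is not the implementation used in this thesis. Several things here are assumptions made for illustration: the use of NumPy, the function names (fit_pca, project, fit_gaussian_bayes, predict), the choice of a Gaussian class model with diagonal covariance, and the synthetic two-class data that stands in for character features.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)        # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]              # columns are principal directions


def project(X, mean, components):
    """Map the original features onto the principal components."""
    return (X - mean) @ components


def fit_gaussian_bayes(Z, y):
    """Estimate a per-class mean, variance, and log prior.

    Assumption: each class is Gaussian with a diagonal covariance matrix.
    """
    params = {}
    for label in np.unique(y):
        Zc = Z[y == label]
        params[label] = (
            Zc.mean(axis=0),
            Zc.var(axis=0) + 1e-6,              # small floor avoids division by zero
            np.log(len(Zc) / len(Z)),
        )
    return params


def predict(Z, params):
    """Pick the class with the largest log posterior for each row of Z."""
    labels = list(params)
    scores = []
    for label in labels:
        mu, var, log_prior = params[label]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (Z - mu) ** 2 / var, axis=1)
        scores.append(log_lik + log_prior)
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]


# Synthetic stand-in for character features: two classes, 8 features each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 8)),
    rng.normal(3.0, 1.0, size=(100, 8)),
])
y = np.array([0] * 100 + [1] * 100)

mean, components = fit_pca(X, n_components=2)   # 8 features reduced to 2
Z = project(X, mean, components)
params = fit_gaussian_bayes(Z, y)
accuracy = np.mean(predict(Z, params) == y)     # accuracy on the training data
print(Z.shape, accuracy)
```

In this sketch the classifier sees only the two principal components rather than all eight original features, which mirrors the feature reduction described above. The accuracy it prints is measured on the same synthetic data used for fitting, so it says nothing about the newspaper-character results reported here.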
Gupta, Gopal Krishna, "Principal Component Analysis and Bayesian Classifier Based Character Recognition" (2003). Boise State University Theses and Dissertations. 493.